DPDK patches and discussions
* [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Burakov, Anatoly @ 2014-05-01 11:05 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel 3.6+ driver that allows secure DMA from userspace
by using the IOMMU, instead of working directly with physical
memory as igb_uio does.
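For readers unfamiliar with VFIO, the sketch below walks through the container/group/device sequence described in the kernel's VFIO documentation. It is not the code from `eal_pci_vfio.c`. The function name `vfio_open_device` is hypothetical, the IOMMU group number and PCI address are assumed to be known already (both can be read from sysfs), and a real run needs root on a host with an IOMMU enabled. BAR mapping (`VFIO_DEVICE_GET_REGION_INFO` plus `mmap()`) and DMA setup (`VFIO_IOMMU_MAP_DMA`) are left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: open a VFIO container and group, attach them, and
 * return a device fd for pci_addr (e.g. "0000:01:00.0"), or -1 on error.
 * On success the container and group fds are returned through the
 * out-parameters so the caller can close them when done with the device.
 */
static int vfio_open_device(int group_no, const char *pci_addr,
                            int *container_out, int *group_out)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    char path[64];
    int container, group, device;

    /* The container holds one IOMMU context (one set of DMA mappings). */
    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
        ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
        goto close_container;

    /* The group is the smallest set of devices the IOMMU can isolate. */
    snprintf(path, sizeof(path), "/dev/vfio/%d", group_no);
    group = open(path, O_RDWR);
    if (group < 0)
        goto close_container;

    /* Viable: no device in the group is still owned by an ordinary host driver. */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto close_group;

    /* Attach the group to the container, then pick the IOMMU model. */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
        goto close_group;

    /* The device fd is what BAR regions and interrupts are accessed through. */
    device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
    if (device < 0)
        goto close_group;

    *container_out = container;
    *group_out = group;
    return device;

close_group:
    close(group);
close_container:
    close(container);
    return -1;
}
```

On a machine without VFIO the first `open()` fails and the helper returns -1 without touching the out-parameters.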

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional
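The new interrupt-type parameter essentially maps a string to one of the interrupt types a PCI device can use through VFIO (legacy INTx, MSI or MSI-X). A minimal sketch of such a parser follows. The accepted spellings and all identifiers are assumptions for illustration, not the option or enum defined by the patch.

```c
#include <string.h>

/* Hypothetical enum of the interrupt types VFIO can deliver for a PCI device. */
enum intr_mode {
    INTR_MODE_INVALID = -1,
    INTR_MODE_LEGACY,   /* INTx */
    INTR_MODE_MSI,
    INTR_MODE_MSIX,
};

/* Map a command-line value such as "msix" to an interrupt mode. */
static enum intr_mode parse_intr_mode(const char *arg)
{
    static const struct {
        const char *name;
        enum intr_mode mode;
    } modes[] = {
        { "legacy", INTR_MODE_LEGACY },
        { "msi",    INTR_MODE_MSI },
        { "msix",   INTR_MODE_MSIX },
    };
    size_t i;

    if (arg == NULL)
        return INTR_MODE_INVALID;
    for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++)
        if (strcmp(arg, modes[i].name) == 0)
            return modes[i].mode;
    return INTR_MODE_INVALID;  /* matching is case-sensitive */
}
```

A caller would reject the EAL arguments when `parse_intr_mode()` returns `INTR_MODE_INVALID`.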

I'm submitting this as an RFC because this patchset is based on the
current dpdk.org branch plus David Marchand's RTE_EAL_UNBIND_PORTS
patchset. In other words, it will *not* apply to the dpdk.org tree
*unless* you apply David's patches first.

Signed-off by: Anatoly Burakov <anatoly.burakov@intel.com>

Anatoly Burakov (16):
  Separate igb_uio mapping into a separate file
  Distinguish between legitimate failures and non-fatal errors
  Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  Make igb_uio compilation optional
  Moved interrupt type out of igb_uio
  Add support for VFIO in Linuxapp targets
  Add support for VFIO interrupts, add VFIO header
  Add support for mapping devices through VFIO.
  Enable VFIO device binding
  Added support for selecting VFIO interrupt type from EAL command-line
  Make --no-huge use mmap instead of malloc.
  Adding unit tests for VFIO EAL command-line parameter
  Removed PCI ID table from igb_uio
  Renamed igb_uio_bind to dpdk_nic_bind
  Added support for VFIO drivers in dpdk_nic_bind.py
  Adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  24 +
 app/test/test_pci.c                                |   4 +-
 config/defconfig_i686-default-linuxapp-gcc         |   2 +
 config/defconfig_i686-default-linuxapp-icc         |   2 +
 config/defconfig_x86_64-default-linuxapp-gcc       |   2 +
 config/defconfig_x86_64-default-linuxapp-icc       |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |   2 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  17 +-
 lib/librte_eal/common/include/rte_pci.h            |   7 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  42 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   6 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  35 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 203 +++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 480 ++------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 416 ++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 709 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c  | 367 +++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 120 ++++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   7 +-
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  70 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/dpdk_nic_bind.py                             | 500 +++++++++++++++
 tools/igb_uio_bind.py                              | 485 --------------
 tools/setup.sh                                     | 168 ++++-
 33 files changed, 2797 insertions(+), 1000 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 create mode 100755 tools/dpdk_nic_bind.py
 delete mode 100755 tools/igb_uio_bind.py

-- 
1.8.1.4


* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Stephen Hemminger @ 2014-05-01 16:12 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Thu, 1 May 2014 11:05:38 +0000
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> This patchset adds support for using VFIO instead of IGB_UIO to
> map the device BARs.
> 


Will this work in a guest, or only on bare metal?


* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Chris Wright @ 2014-05-01 17:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

* Stephen Hemminger (stephen@networkplumber.org) wrote:
> On Thu, 1 May 2014 11:05:38 +0000
> "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:
> 
> > This patchset adds support for using VFIO instead of IGB_UIO to
> > map the device BARs.
> 
> Will this work in guest? or only on bare metal?

Hmm, VFIO requires IOMMU support. What about the virtio PMD?


* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Burakov, Anatoly @ 2014-05-02  8:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

> Will this work in guest? or only on bare metal?

VFIO is Linux-only. In theory it can work in a guest, but not at the moment, since it requires an IOMMU. There was a GSoC proposal to implement an emulated IOMMU in KVM, and a few AMD IOMMU emulation patches have been floating around the KVM lists for some time, but nothing has made it into a release yet.
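Whether a device can be used with VFIO depends on whether the kernel has placed it in an IOMMU group. On a host with an IOMMU enabled, each PCI device directory in sysfs has an `iommu_group` symlink whose target ends in `kernel/iommu_groups/<N>`; without an IOMMU (as in a guest without an emulated one) the link is missing. The sketch below reads that link; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Parse the group number from a link target ending in ".../iommu_groups/<N>".
 * Returns N, or -1 if the last path component is not a plain number. */
static int group_from_link_target(const char *target)
{
    const char *base = strrchr(target, '/');
    char *end;
    long group;

    base = base ? base + 1 : target;
    group = strtol(base, &end, 10);
    if (end == base || *end != '\0' || group < 0 || group > INT_MAX)
        return -1;
    return (int)group;
}

/* Return the IOMMU group of a PCI device, given its sysfs directory
 * (e.g. /sys/bus/pci/devices/0000:01:00.0), or -1 if it has none. */
static int pci_iommu_group(const char *sysfs_dev)
{
    char link[PATH_MAX], target[PATH_MAX];
    ssize_t len;

    snprintf(link, sizeof(link), "%s/iommu_group", sysfs_dev);
    len = readlink(link, target, sizeof(target) - 1);
    if (len < 0)
        return -1;        /* no link: no IOMMU, or IOMMU disabled */
    target[len] = '\0';   /* readlink() does not NUL-terminate */
    return group_from_link_target(target);
}
```

The group number found this way is what names the `/dev/vfio/<N>` character device.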

Best regards,
Anatoly Burakov
DPDK SW Engineer

--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare


* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Burakov, Anatoly @ 2014-05-02  9:00 UTC (permalink / raw)
  To: Chris Wright, Stephen Hemminger; +Cc: dev

Hi Chris,
 
> hmm, vfio requires iommu support, however virtio pmd?

That's correct: virtio will not work with VFIO as it stands. However, that is not the fault of this patch but rather of the lack of an emulated IOMMU in the guest :-)

Best regards,
Anatoly Burakov
DPDK SW Engineer



* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Vincent JARDIN @ 2014-05-05 14:44 UTC (permalink / raw)
  To: dev

On 02/05/2014 11:00, Burakov, Anatoly wrote:
> Hi Chris,
>
>> hmm, vfio requires iommu support, however virtio pmd?
>
> That's correct, virtio will not work with VFIO as it stands. However it's not the fault of this patch but rather lack of emulated IOMMU on the guest :-)

My 2 cents:
   http://dpdk.org/browse/virtio-net-pmd/tree/README.rst
it works without UIO.

Best regards,
   Vincent


* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
From: Burakov, Anatoly @ 2014-05-06  8:41 UTC (permalink / raw)
  To: Vincent JARDIN, dev

Hi Vincent,

> My 2 cents:
>    http://dpdk.org/browse/virtio-net-pmd/tree/README.rst
> it works without UIO.
> 

I meant the in-tree virtio driver. Obviously, anything that doesn't depend on UIO will work with VFIO as well (or even with neither of them, since compiling both will be optional).

Best regards,
Anatoly Burakov
DPDK SW Engineer


* [dpdk-dev] [PATCH v2 00/16] Add VFIO support to DPDK
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel 3.6+ driver that allows secure DMA from userspace
by using the IOMMU, instead of working directly with physical
memory as igb_uio does.

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 changes: fixed a couple of resource leaks

Based on commit 356cb732d5381140f42ef8b55492339579854986

Anatoly Burakov (16):
  Separate igb_uio mapping into a separate file
  Distinguish between legitimate failures and non-fatal errors
  Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  Make igb_uio compilation optional
  Moved interrupt type out of igb_uio
  Add support for VFIO in Linuxapp targets
  Add support for VFIO interrupts, add VFIO header
  Add support for mapping devices through VFIO.
  Enable VFIO device binding
  Added support for selecting VFIO interrupt type from EAL command-line
  Make --no-huge use mmap instead of malloc
  Adding unit tests for VFIO EAL command-line parameter
  Removed PCI ID table from igb_uio
  Renamed igb_uio_bind to dpdk_nic_bind
  Added support for VFIO drivers in dpdk_nic_bind.py
  Adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |   24 +
 app/test/test_pci.c                                |    4 +-
 config/defconfig_i686-default-linuxapp-gcc         |    2 +
 config/defconfig_i686-default-linuxapp-icc         |    2 +
 config/defconfig_x86_64-default-linuxapp-gcc       |    2 +
 config/defconfig_x86_64-default-linuxapp-icc       |    2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |    2 +-
 lib/librte_eal/common/Makefile                     |    1 +
 lib/librte_eal/common/eal_common_pci.c             |   16 +-
 lib/librte_eal/common/include/rte_pci.h            |    7 +-
 .../common/include/rte_pci_dev_feature_defs.h      |   46 ++
 .../common/include/rte_pci_dev_features.h          |   40 ++
 lib/librte_eal/linuxapp/Makefile                   |    2 +
 lib/librte_eal/linuxapp/eal/Makefile               |    6 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |   33 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  203 ++++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |    8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              |  486 ++------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  403 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  719 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c  |  367 ++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |    3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  120 ++++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |    3 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   70 +--
 lib/librte_pmd_e1000/em_ethdev.c                   |    2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |    4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |    4 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |    2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        |  163 +++--
 tools/setup.sh                                     |  168 ++++-
 32 files changed, 2380 insertions(+), 589 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (82%)


* [dpdk-dev] [PATCH v2 01/16] Separate igb_uio mapping into a separate file
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

To keep the code clean when multiple drivers are supported, the
IGB_UIO mapping code has been moved into its own file.
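Once each driver's mapping code lives in its own file, probing has to decide which one applies to a device. One plausible approach, sketched below, is to look at the kernel driver the device is bound to via the `driver` symlink in its sysfs directory. This is an assumption for illustration (the selection logic in the series may differ), and all names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mapping backends, one per supported kernel driver. */
enum map_backend { BACKEND_NONE, BACKEND_IGB_UIO, BACKEND_VFIO };

/* Choose a backend from a "driver" link target such as
 * "../../../bus/pci/drivers/vfio-pci"; only the last component matters. */
static enum map_backend backend_from_link_target(const char *target)
{
    const char *drv = strrchr(target, '/');

    drv = drv ? drv + 1 : target;
    if (strcmp(drv, "vfio-pci") == 0)
        return BACKEND_VFIO;
    if (strcmp(drv, "igb_uio") == 0)
        return BACKEND_IGB_UIO;
    return BACKEND_NONE;   /* e.g. still owned by a kernel NIC driver */
}

/* Look up the driver bound to the PCI device in sysfs_dev
 * (e.g. /sys/bus/pci/devices/0000:01:00.0). */
static enum map_backend pci_pick_backend(const char *sysfs_dev)
{
    char link[PATH_MAX], target[PATH_MAX];
    ssize_t len;

    snprintf(link, sizeof(link), "%s/driver", sysfs_dev);
    len = readlink(link, target, sizeof(target) - 1);
    if (len < 0)
        return BACKEND_NONE;   /* not bound to any driver */
    target[len] = '\0';
    return backend_from_link_target(target);
}
```

A device that maps to `BACKEND_NONE` would simply be skipped rather than initialized.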

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |    1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              |  424 +-------------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  403 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |   65 +++
 4 files changed, 478 insertions(+), 415 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
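This patch changes `pci_map_resource()` to take an already-open file descriptor instead of a device path, which moves `open()`/`close()` to the caller and presumably lets VFIO pass its device fd with a region offset. The sketch below shows that split with POSIX calls only; `map_resource_fd()` and `map_resource_path()` are hypothetical names. It relies on the fact that an `mmap()` mapping stays valid after its fd is closed, and it also unmaps a mapping that did not land at the requested address.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes at `offset` of an already-open fd. A non-NULL
 * requested_addr must be honoured exactly (e.g. so a secondary process
 * sees a BAR at the same virtual address as the primary). */
static void *map_resource_fd(void *requested_addr, int fd, off_t offset,
                             size_t size)
{
    void *addr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, offset);

    if (addr == MAP_FAILED)
        return NULL;
    if (requested_addr != NULL && addr != requested_addr) {
        munmap(addr, size);   /* mapped, but not where it was required */
        return NULL;
    }
    return addr;
}

/* Path-based caller (UIO style): open, map, close. The mapping stays
 * valid after close(), so the fd does not have to be kept. */
static void *map_resource_path(void *requested_addr, const char *path,
                               off_t offset, size_t size)
{
    void *addr;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return NULL;
    addr = map_resource_fd(requested_addr, fd, offset, size);
    close(fd);
    return addr;
}
```

Because `requested_addr` is only a hint to `mmap()` (no `MAP_FIXED`), asking for an address that is already occupied yields a different address, which the helper treats as failure.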

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b00e3ec..527fa2a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
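The UIO helpers moved by this patch read small text files from sysfs: each `maps/mapN` directory contains `offset`, `size` and `addr`, and the UIO device's `dev` file contains `major:minor`. A minimal sketch of two such readers follows (hypothetical names, not `pci_parse_sysfs_value()` itself). Note that the moved `pci_mknod_uio_dev()` scans `%d:%d` into `unsigned` variables and, after `mknod()`, checks `f == NULL` rather than `ret`, so its error message is never printed; the sketch uses `%u` to match the destination types.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one number (decimal, or hex with a 0x prefix) from a sysfs file
 * such as .../maps/map0/size. Returns 0 on success, -1 on error. */
static int parse_sysfs_u64(const char *filename, uint64_t *val)
{
    char buf[64];
    char *end;
    FILE *f = fopen(filename, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    *val = strtoull(buf, &end, 0);    /* base 0 accepts "4096" and "0x1000" */
    if (end == buf || *end != '\0')
        return -1;
    return 0;
}

/* Read "major:minor" from a sysfs "dev" file; %u matches the unsigned targets. */
static int parse_sysfs_devnum(const char *filename, unsigned *major,
                              unsigned *minor)
{
    FILE *f = fopen(filename, "r");
    int n;

    if (f == NULL)
        return -1;
    n = fscanf(f, "%u:%u", major, minor);
    fclose(f);
    return n == 2 ? 0 : -1;
}
```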
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..cd5b797 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,82 +31,31 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <ctype.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <string.h>
-#include <stdarg.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <stdarg.h>
-#include <errno.h>
 #include <dirent.h>
-#include <limits.h>
-#include <sys/queue.h>
 #include <sys/mman.h>
-#include <sys/ioctl.h>
 
-#include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
-#include <rte_common.h>
-#include <rte_launch.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_tailq.h>
-#include <rte_eal.h>
 #include <rte_eal_memconfig.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
-#include <rte_malloc.h>
-#include <rte_string_fns.h>
-#include <rte_debug.h>
 #include <rte_devargs.h>
 
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"
 
 /**
  * @file
  * PCI probing under linux
  *
  * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * sysfs. When a registered device matches a driver, it is then initialized
+ * with either VFIO or IGB_UIO driver (or doesn't initialize), whichever
+ * driver the device is bound to.
  */
 
-struct uio_map {
-	void *addr;
-	uint64_t offset;
-	uint64_t size;
-	uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct uio_resource {
-	TAILQ_ENTRY(uio_resource) next;
-
-	struct rte_pci_addr pci_addr;
-	char path[PATH_MAX];
-	size_t nb_maps;
-	struct uio_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(uio_res_list, uio_resource);
-
-static struct uio_res_list *uio_res_list = NULL;
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
-
 /* unbind kernel driver for this device */
 static int
 pci_unbind_kernel_driver(struct rte_pci_device *dev)
@@ -147,31 +96,19 @@ error:
 }
 
 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
+void *
+pci_map_resource(void *requested_addr, int fd, off_t offset,
 		 size_t size)
 {
-	int fd;
 	void *mapaddr;
 
-	/*
-	 * open devname, to mmap it
-	 */
-	fd = open(devname, O_RDWR);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		goto fail;
-	}
-
 	/* Map the PCI memory resource of device */
 	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
-	close(fd);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
-		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-			" %s (%p)\n", __func__, devname, fd, requested_addr,
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx):"
+			" %s (%p)\n", __func__, fd, requested_addr,
 			(unsigned long)size, (unsigned long)offset,
 			strerror(errno), mapaddr);
 		goto fail;
@@ -185,314 +122,6 @@ fail:
 	return NULL;
 }
 
-#define OFF_MAX              ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
-{
-	size_t i;
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	uint64_t offset, size;
-
-	for (i = 0; i != nb_maps; i++) {
- 
-		/* check if map directory exists */
-		rte_snprintf(dirname, sizeof(dirname), 
-			"%s/maps/map%u", devname, i);
- 
-		if (access(dirname, F_OK) != 0)
-			break;
- 
-		/* get mapping offset */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/offset", dirname);
-		if (pci_parse_sysfs_value(filename, &offset) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse offset of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping size */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/size", dirname);
-		if (pci_parse_sysfs_value(filename, &size) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse size of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping physical address */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/addr", dirname);
-		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse addr of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-			RTE_LOG(ERR, EAL,
-				"%s(): offset/size exceed system max value\n",
-				__func__); 
-			return (-1);
-		}
-
-		maps[i].offset = offset;
-		maps[i].size = size;
-        }
-	return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-        size_t i;
-        struct uio_resource *uio_res;
-
-	TAILQ_FOREACH(uio_res, uio_res_list, next) {
-
-		/* skip this element if it doesn't match our PCI address */
-		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
-			continue;
-
-		for (i = 0; i != uio_res->nb_maps; i++) {
-			if (pci_map_resource(uio_res->maps[i].addr,
-					     uio_res->path,
-					     (off_t)uio_res->maps[i].offset,
-					     (size_t)uio_res->maps[i].size)
-			    != uio_res->maps[i].addr) {
-				RTE_LOG(ERR, EAL,
-					"Cannot mmap device resource\n");
-				return (-1);
-			}
-		}
-		return (0);
-	}
-
-	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
-}
-
-static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
-{
-	FILE *f;
-	char filename[PATH_MAX];
-	int ret;
-	unsigned major, minor;
-	dev_t dev;
-
-	/* get the name of the sysfs file that contains the major and minor
-	 * of the uio device and read its content */
-	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
-			__func__);
-		return -1;
-	}
-
-	ret = fscanf(f, "%d:%d", &major, &minor);
-	if (ret != 2) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
-			__func__);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-
-	/* create the char device "mknod /dev/uioX c major minor" */
-	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
-	dev = makedev(major, minor);
-	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
-			__func__, strerror(errno));
-		return -1;
-	}
-
-	return ret;
-}
-
-/*
- * Return the uioX char device used for a pci device. On success, return
- * the UIO number and fill dstbuf string with the path of the device in
- * sysfs. On error, return a negative value. In this case dstbuf is
- * invalid.
- */
-static int pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
-			   unsigned int buflen)
-{
-	struct rte_pci_addr *loc = &dev->addr;
-	unsigned int uio_num;
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-
-	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
-
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1; 
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL)
-		return -1;
-
-	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
-		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
-
-	return uio_num;
-}
-
-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-	int i, j;
-	char dirname[PATH_MAX];
-	char devname[PATH_MAX]; /* contains the /dev/uioX */
-	void *mapaddr;
-	int uio_num;
-	uint64_t phaddr;
-	uint64_t offset;
-	uint64_t pagesz;
-	ssize_t nb_maps;
-	struct rte_pci_addr *loc = &dev->addr;
-	struct uio_resource *uio_res;
-	struct uio_map *maps;
-
-	dev->intr_handle.fd = -1;
-	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-	/* secondary processes - use already recorded details */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
-
-	/* find uio resource */
-	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
-	if (uio_num < 0) {
-		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
-				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
-	}
-	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
-
-	/* save fd if in primary process */
-	dev->intr_handle.fd = open(devname, O_RDWR);
-	if (dev->intr_handle.fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		return -1;
-	}
-	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-
-	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
-		RTE_LOG(ERR, EAL,
-			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
-	}
-
-	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
-
-	/* collect info about device mappings */
-	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-				       RTE_DIM(uio_res->maps));
-	if (nb_maps < 0) {
-		rte_free(uio_res);
-		return (nb_maps);
-	}
-
-	uio_res->nb_maps = nb_maps;
-
-	/* Map all BARs */
-	pagesz = sysconf(_SC_PAGESIZE);
-
-	maps = uio_res->maps;
-	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
-
-		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
-			continue;
-
-		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
-				dev->mem_resource[i].len != maps[j].size);
-				j++)
-			;
-
-		/* if matching map is found, then use it */
-		if (j != nb_maps) {
-			offset = j * pagesz;
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, devname,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
-				rte_free(uio_res);
-				return (-1);
-			}
-
-			maps[j].addr = mapaddr;
-			maps[j].offset = offset;
-			dev->mem_resource[i].addr = mapaddr;
-		}
-	}
-
-	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
-
-	return (0);
-}
-
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x00000200
 
@@ -556,41 +185,6 @@ error:
 	return -1;
 }
 
-/* 
- * parse a sysfs file containing one integer value 
- * different to the eal version, as it needs to work with 64-bit values
- */ 
-static int 
-pci_parse_sysfs_value(const char *filename, uint64_t *val) 
-{
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
- 
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
- 
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
-}
-
 /* Compare two PCI device addresses. */
 static int
 pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
@@ -866,7 +460,7 @@ rte_eal_pci_init(void)
 {
 	TAILQ_INIT(&pci_driver_list);
 	TAILQ_INIT(&pci_device_list);
-	uio_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, uio_res_list);
+	pci_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, mapped_pci_res_list);
 
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
new file mode 100644
index 0000000..f29fee5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -0,0 +1,403 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <sys/stat.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+#include "rte_pci_dev_ids.h"
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static ssize_t
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], size_t nb_maps) {
+	size_t i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		rte_snprintf(dirname, sizeof(dirname), "%s/maps/map%zu", devname, i);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		rte_snprintf(filename, sizeof(filename), "%s/offset", dirname);
+		if (pci_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse offset of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping size */
+		rte_snprintf(filename, sizeof(filename), "%s/size", dirname);
+		if (pci_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse size of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping physical address */
+		rte_snprintf(filename, sizeof(filename), "%s/addr", dirname);
+		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse addr of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+					"%s(): offset/size exceed system max value\n", __func__);
+			return (-1);
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+	}
+
+	return (i);
+}
+
+static int
+pci_uio_map_secondary(struct rte_pci_device *dev) {
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
+
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+		/* skip this element if it doesn't match our PCI address */
+		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+			continue;
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL,
+						"Cannot open %s: %s\n", uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
+					(off_t) uio_res->maps[i].offset,
+					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
+				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
+				close(fd);
+				return (-1);
+			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
+		}
+		return (0);
+	}
+
+	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
+	return -1;
+}
+
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num) {
+	FILE *f;
+	char filename[PATH_MAX];
+	int ret;
+	unsigned major, minor;
+	dev_t dev;
+
+	/* get the name of the sysfs file that contains the major and minor
+	 * of the uio device and read its content */
+	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs to get major:minor\n", __func__);
+		return -1;
+	}
+
+	ret = fscanf(f, "%u:%u", &major, &minor);
+	if (ret != 2) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs to get major:minor\n", __func__);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* create the char device "mknod /dev/uioX c major minor" */
+	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
+	dev = makedev(major, minor);
+	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+				"%s(): mknod() failed %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Return the uioX char device used for a pci device. On success, return
+ * the UIO number and fill dstbuf string with the path of the device in
+ * sysfs. On error, return a negative value. In this case dstbuf is
+ * invalid.
+ */
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+		unsigned int buflen) {
+	struct rte_pci_addr *loc = &dev->addr;
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+
+	rte_snprintf(dirname, sizeof(dirname),
+			SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio", loc->domain, loc->bus,
+			loc->devid, loc->function);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		rte_snprintf(dirname, sizeof(dirname),
+				SYSFS_PCI_DEVICES "/" PCI_PRI_FMT, loc->domain, loc->bus,
+				loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		errno = 0; /* first try uio%d */
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		errno = 0; /* then try uio:uio%d */
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL)
+		return -1;
+
+	/* create uio device if we've been asked to */
+	if (internal_config.create_uio_dev
+			&& pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
+
+	return uio_num;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+int
+pci_uio_map_resource(struct rte_pci_device *dev) {
+	int i, j;
+	char dirname[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	int uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	ssize_t nb_maps;
+	struct rte_pci_addr *loc = &dev->addr;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return (pci_uio_map_secondary(dev));
+
+	/* find uio resource */
+	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
+	if (uio_num < 0) {
+		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
+		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
+		return -1;
+	}
+	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+
+	/* save fd if in primary process */
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", devname, strerror(errno));
+		return -1;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
+	/* allocate the mapping details for secondary processes */
+	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
+		return (-1);
+	}
+
+	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+
+	/* collect info about device mappings */
+	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+			RTE_DIM(uio_res->maps));
+	if (nb_maps < 0) {
+		rte_free(uio_res);
+		return (nb_maps);
+	}
+
+	uio_res->nb_maps = nb_maps;
+
+	/* Map all BARs */
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
+
+		/* skip empty BAR */
+		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+			continue;
+
+		for (j = 0;
+				j != nb_maps &&
+				(phaddr != maps[j].phaddr ||
+				 dev->mem_resource[i].len != maps[j].size);
+				j++)
+			;
+
+		/* if matching map is found, then use it */
+		if (j != nb_maps) {
+			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				rte_free(uio_res);
+				return -1;
+			}
+
+			if (maps[j].addr != NULL
+					|| (mapaddr = pci_map_resource(NULL, fd,
+							(off_t) offset, (size_t) maps[j].size)) == NULL) {
+				rte_free(uio_res);
+				close(fd);
+				return (-1);
+			}
+			close(fd);
+
+			maps[j].addr = mapaddr;
+			maps[j].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
+
+	return (0);
+}
+
+/*
+ * parse a sysfs file containing one integer value;
+ * unlike the EAL version, this one must handle 64-bit values
+ */
+static int
+pci_parse_sysfs_value(const char *filename, uint64_t *val) {
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs value %s\n", __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot read sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
new file mode 100644
index 0000000..699e80d
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -0,0 +1,65 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_PCI_INIT_H_
+#define EAL_PCI_INIT_H_
+
+struct pci_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
+
+	struct rte_pci_addr pci_addr;
+	char path[PATH_MAX];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+struct mapped_pci_res_list *pci_res_list;
+
+void * pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size);
+
+/* map IGB_UIO resource prototype */
+int pci_uio_map_resource(struct rte_pci_device *dev);
+
+#endif /* EAL_PCI_INIT_H_ */
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 02/16] Distinguish between legitimate failures and non-fatal errors
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (2 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 01/16] Separate igb_uio mapping into a separate file Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Currently, EAL does not distinguish between actual failures and
expected initialization errors. For example, a driver may fail to
initialize simply because it was never meant to be initialized in
the first place, such as when the device is not managed by that
driver.

This patch makes EAL fail on actual initialization errors while
still skipping over expected ones. The probe and mapping functions
now return a negative value on a real failure, a positive value when
the device should be skipped, and 0 on success.
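
A minimal sketch of the convention this patch introduces (negative:
real failure, positive: device should be skipped, zero: success),
assuming a hypothetical integer-indexed driver model. struct dev,
probe_one_driver() and probe_all_drivers() are illustrative
stand-ins for rte_eal_pci_probe_one_driver() and
pci_probe_all_drivers(), not EAL code:

```c
#include <stdbool.h>

/* Hypothetical device model for this sketch only (not an EAL type). */
struct dev {
	int owner;       /* index of the driver that manages it, or -1 */
	bool init_fails; /* simulate a real initialization error */
};

/*
 * Hypothetical stand-in for rte_eal_pci_probe_one_driver():
 * returns <0 on a real failure, >0 if this driver does not manage
 * the device (non-fatal), and 0 on success.
 */
static int
probe_one_driver(int drv, const struct dev *dev)
{
	if (dev->owner != drv)
		return 1;
	return dev->init_fails ? -1 : 0;
}

/*
 * Hypothetical stand-in for pci_probe_all_drivers(): propagate the
 * first real error, skip drivers that do not manage the device, and
 * return 1 if no driver claimed it at all.
 */
static int
probe_all_drivers(const struct dev *dev, int nb_drivers)
{
	int drv, rc;

	for (drv = 0; drv < nb_drivers; drv++) {
		rc = probe_one_driver(drv, dev);
		if (rc < 0)
			return -1;	/* fatal: stop probing */
		if (rc > 0)
			continue;	/* not ours: try the next driver */
		return 0;		/* device initialized */
	}
	return 1;			/* no driver found: skip the device */
}
```

With this convention a caller only has to abort when the result is
negative, which is what rte_eal_pci_probe() does in the hunk below.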

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_pci.c    |   16 +++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci.c     |    7 ++++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |    4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 
 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered drivers for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		rc = rte_eal_pci_probe_one_driver(dr, dev);
 		if (rc < 0)
 			/* negative value is an error */
-			break;
+			return -1;
 		if (rc > 0)
 			/* positive value means driver not found */
 			continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 				;
 		return 0;
 	}
-	return -1;
+	return 1;
 }
 
 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
 		probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			pci_probe_all_drivers(dev);
+			ret = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
-			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-			pci_probe_all_drivers(dev) < 0)
+			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+			ret = pci_probe_all_drivers(dev);
+		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index cd5b797..de1b0a0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -400,6 +400,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -430,13 +431,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		if (dev->devargs != NULL &&
 			dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
 			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-			return 0;
+			return 1;
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			if ((ret = pci_uio_map_resource(dev)) != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index f29fee5..2d5e75d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
+	return 1;
 }
 
 static int
@@ -284,7 +284,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	if (uio_num < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (3 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 02/16] Distinguish between legitimate failures and non-fatal errors Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-21 12:55   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 04/16] Make igb_uio compilation optional Anatoly Burakov
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Rename RTE_PCI_DRV_NEED_IGB_UIO to the more generic
RTE_PCI_DRV_NEED_MAPPING, and retain the old macro as an alias for
backwards compatibility. The alias should probably be removed in a
future release.
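
A minimal sketch of why the alias preserves backwards compatibility:
a driver that still sets the old flag is indistinguishable from one
that sets the new flag. needs_mapping() is a hypothetical helper, not
an EAL function; the flag values mirror the rte_pci.h hunk in this
patch:

```c
#include <stdint.h>

/* Flag values as in the rte_pci.h hunk of this patch. */
#define RTE_PCI_DRV_NEED_MAPPING 0x0001
#define RTE_PCI_DRV_MULTIPLE     0x0002
/* Old name kept as an alias, so drivers that still use it build unchanged. */
#define RTE_PCI_DRV_NEED_IGB_UIO RTE_PCI_DRV_NEED_MAPPING

/* Hypothetical helper: does a driver with these flags need its BARs mapped? */
static int
needs_mapping(uint32_t drv_flags)
{
	return (drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}
```

Because the alias expands to the same value, out-of-tree drivers keep
working until they switch to the new name.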

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_pci.c                     |    4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c     |    2 +-
 lib/librte_eal/common/include/rte_pci.h |    6 ++++--
 lib/librte_pmd_e1000/em_ethdev.c        |    2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       |    4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     |    4 ++--
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c |    2 +-
 7 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
 			  struct rte_pci_device *dev);
 
 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */
 
@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
 	.name = "test_driver",
 	.devinit = my_driver_init,
 	.id_table = my_driver_id,
-	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };
 
 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 0;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if (pci_uio_map_resource(dev) < 0)
 				return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..84d7b42 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,12 +190,14 @@ struct rte_pci_driver {
 	uint32_t drv_flags;                     /**< Flags controlling handling of device. */
 };
 
-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
 #define RTE_PCI_DRV_FORCE_UNBIND 0x0004
+/** Retain the old name for backwards-compatibility */
+#define RTE_PCI_DRV_NEED_IGB_UIO RTE_PCI_DRV_NEED_MAPPING
 
 /**< Internal use only - Macro used by pci addr parsing functions **/
 #define GET_PCIADDR_FIELD(in, fd, lim, dlm)                   \
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 755e474..f3575d5 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -279,7 +279,7 @@ static struct eth_driver rte_em_pmd = {
 	{
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_em_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index c7b3926..b49db52 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -600,7 +600,7 @@ static struct eth_driver rte_igb_pmd = {
 	{
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igb_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
@@ -613,7 +613,7 @@ static struct eth_driver rte_igbvf_pmd = {
 	{
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igbvf_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index e78c208..5354a3f 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -931,7 +931,7 @@ static struct eth_driver rte_ixgbe_pmd = {
 	{
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbe_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
@@ -944,7 +944,7 @@ static struct eth_driver rte_ixgbevf_pmd = {
 	{
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbevf_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 8259cfe..a08c2bf 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -267,7 +267,7 @@ static struct eth_driver rte_vmxnet3_pmd = {
 	{
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_vmxnet3_dev_init,
 	.dev_private_size = sizeof(struct vmxnet3_adapter),
-- 
1.7.0.7
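
Because rte_pci.h keeps the old flag name as an alias, a driver still written against RTE_PCI_DRV_NEED_IGB_UIO keeps compiling and gets its BARs mapped exactly like a driver using the new name. Below is a minimal self-contained sketch of that. The flag values are copied from the patch. `struct demo_pci_driver` and `demo_needs_mapping()` are hypothetical stand-ins for `struct rte_pci_driver` and the flag test in `rte_eal_pci_probe_one_driver()`. They are not DPDK API.

```c
#include <stdint.h>

/* Flag values as defined in rte_pci.h by this patch */
#define RTE_PCI_DRV_NEED_MAPPING 0x0001
#define RTE_PCI_DRV_MULTIPLE     0x0002
#define RTE_PCI_DRV_FORCE_UNBIND 0x0004
/* Old name kept as an alias, so existing drivers still build */
#define RTE_PCI_DRV_NEED_IGB_UIO RTE_PCI_DRV_NEED_MAPPING

/* Hypothetical, trimmed-down stand-in for struct rte_pci_driver */
struct demo_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* Mirrors the probe-time test: does this driver need its BARs mapped
 * (by igb_uio today, or by VFIO later in the series)? */
static int
demo_needs_mapping(const struct demo_pci_driver *dr)
{
	return (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}

/* A driver converted to the new flag name... */
static const struct demo_pci_driver new_style = {
	.name = "new_style_pmd",
	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
};

/* ...an out-of-tree driver still using the old name... */
static const struct demo_pci_driver old_style = {
	.name = "old_style_pmd",
	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO | RTE_PCI_DRV_MULTIPLE,
};

/* ...and a driver that does not ask for BAR mapping at all */
static const struct demo_pci_driver no_map = {
	.name = "no_map_pmd",
	.drv_flags = RTE_PCI_DRV_FORCE_UNBIND,
};
```

Since the alias expands to the same bit, both spellings reach the same mapping path.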

* [dpdk-dev] [PATCH v2 04/16] Make igb_uio compilation optional
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (4 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio Anatoly Burakov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC
  To: dev

Currently, igb_uio is always compiled. Some Linux distributions may
not want to ship igb_uio by default, so igb_uio compilation needs to
be optional. This patch adds a CONFIG_RTE_EAL_IGB_UIO option (enabled
by default) that controls whether the module is built.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/defconfig_i686-default-linuxapp-gcc   |    1 +
 config/defconfig_i686-default-linuxapp-icc   |    1 +
 config/defconfig_x86_64-default-linuxapp-gcc |    1 +
 config/defconfig_x86_64-default-linuxapp-icc |    1 +
 lib/librte_eal/linuxapp/Makefile             |    2 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c        |    2 +-
 6 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index 14bd3d1..ea90f12 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ec3386e..ecfbf28 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index f11ffbf..fc69b80 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4eaca4c..4ab45b3 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index de1b0a0..7256406 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -434,7 +434,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 1;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if ((ret = pci_uio_map_resource(dev)) != 0)
 				return ret;
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (5 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 04/16] Make igb_uio compilation optional Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-21 13:38   ` Thomas Monjalon
  2014-05-21 13:46   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 06/16] Add support for VFIO in Linuxapp targets Anatoly Burakov
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC
  To: dev

Move the interrupt type enum out of igb_uio and give it a more generic
name. The somewhat unusual header naming and split are mostly meant to
make the upcoming virtio patches easier to port to the dpdk.org tree.
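
With the enum and its string names in shared headers, other code can turn an intr_mode-style string into `enum rte_intr_mode` the same way igb_uio's `igbuio_config_intr_mode()` does. The userspace sketch below copies the enum and name macros from the new headers. It also includes MSI, which igb_uio itself does not accept yet. `parse_intr_mode()` is a hypothetical helper and is not part of the patch. Defaulting to MSI-X on a NULL string is an assumption that mirrors igb_uio's `igbuio_intr_mode_preferred`.

```c
#include <string.h>

/* Copied from rte_pci_dev_feature_defs.h as added by this patch */
enum rte_intr_mode {
	RTE_INTR_MODE_NONE = 0,
	RTE_INTR_MODE_LEGACY,
	RTE_INTR_MODE_MSI,
	RTE_INTR_MODE_MSIX,
	RTE_INTR_MODE_MAX
};

/* Copied from rte_pci_dev_features.h */
#define RTE_INTR_MODE_NONE_NAME   "none"
#define RTE_INTR_MODE_LEGACY_NAME "legacy"
#define RTE_INTR_MODE_MSI_NAME    "msi"
#define RTE_INTR_MODE_MSIX_NAME   "msix"

/*
 * Hypothetical helper: map a mode string to the enum.
 * A NULL string selects the default (MSI-X, as in igb_uio);
 * an unrecognised string yields RTE_INTR_MODE_MAX as an error marker.
 */
static enum rte_intr_mode
parse_intr_mode(const char *str)
{
	static const struct {
		const char *name;
		enum rte_intr_mode mode;
	} table[] = {
		{ RTE_INTR_MODE_NONE_NAME,   RTE_INTR_MODE_NONE },
		{ RTE_INTR_MODE_LEGACY_NAME, RTE_INTR_MODE_LEGACY },
		{ RTE_INTR_MODE_MSI_NAME,    RTE_INTR_MODE_MSI },
		{ RTE_INTR_MODE_MSIX_NAME,   RTE_INTR_MODE_MSIX },
	};
	size_t i;

	if (str == NULL)
		return RTE_INTR_MODE_MSIX;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(str, table[i].name) == 0)
			return table[i].mode;

	return RTE_INTR_MODE_MAX;
}
```

In the new enum, RTE_INTR_MODE_NONE takes the value 0, which used to mean legacy mode in the old igbuio enum. This is why the probe code now initializes the mode to RTE_INTR_MODE_LEGACY by name instead of assigning a literal 0.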

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/Makefile                     |    1 +
 lib/librte_eal/common/include/rte_pci.h            |    1 +
 .../common/include/rte_pci_dev_feature_defs.h      |   46 ++++++++++++++++++
 .../common/include/rte_pci_dev_features.h          |   40 ++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |   49 ++++++++-----------
 5 files changed, 109 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 2f99bf4..7daf38c 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_vdev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 84d7b42..d364cee 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+
 #include <rte_interrupts.h>
 
 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 0000000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+	RTE_INTR_MODE_NONE = 0,
+	RTE_INTR_MODE_LEGACY,
+	RTE_INTR_MODE_MSI,
+	RTE_INTR_MODE_MSIX,
+	RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 0000000..61f271a
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,40 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_pci_dev_feature_defs.h>
+
+#define RTE_INTR_MODE_NONE_NAME "none"
+#define RTE_INTR_MODE_LEGACY_NAME "legacy"
+#define RTE_INTR_MODE_MSI_NAME "msi"
+#define RTE_INTR_MODE_MSIX_NAME "msix"
+#define RTE_INTR_MODE_MAX_NAME "max"
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 09c40bf..043c0f6 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -33,6 +33,7 @@
 #ifdef CONFIG_XEN_DOM0 
 #include <xen/xen.h>
 #endif
+#include <rte_pci_dev_features.h>
 
 /**
  * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
@@ -49,14 +50,6 @@
 
 #define IGBUIO_NUM_MSI_VECTORS 1
 
-/* interrupt mode */
-enum igbuio_intr_mode {
-	IGBUIO_LEGACY_INTR_MODE = 0,
-	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
-};
-
 /**
  * A structure describing the private information for a uio device.
  */
@@ -64,13 +57,13 @@ struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
-	enum igbuio_intr_mode mode;
+	enum rte_intr_mode mode;
 	struct msix_entry \
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode = NULL;
-static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
 /* PCI device id table */
 static struct pci_device_id igbuio_pci_ids[] = {
@@ -222,14 +215,14 @@ igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
 {
 	struct pci_dev *pdev = udev->pdev;
 
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_MSIX) {
 		struct msi_desc *desc;
 
 		list_for_each_entry(desc, &pdev->msi_list, list) {
 			igbuio_msix_mask_irq(desc, state);
 		}
 	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	else if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		uint32_t status;
 		uint16_t old, new;
 
@@ -301,7 +294,7 @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
 		goto spin_unlock;
 
 	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
 		status = cmd_status_dword >> 16;
 		/* interrupt is not ours, goes to out */
@@ -520,18 +513,18 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
+	udev->mode = RTE_INTR_MODE_LEGACY;
 	spin_lock_init(&udev->lock);
 
 	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
+	if (igbuio_intr_mode_preferred == RTE_INTR_MODE_MSIX) {
 		int vector;
 
 		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
 			udev->msix_entries[vector].entry = vector;
 
 		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
-			udev->mode = IGBUIO_MSIX_INTR_MODE;
+			udev->mode = RTE_INTR_MODE_MSIX;
 		}
 		else {
 			pci_disable_msix(udev->pdev);
@@ -539,13 +532,13 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	}
 	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
+	case RTE_INTR_MODE_MSIX:
 		udev->info.irq_flags = 0;
 		udev->info.irq = udev->msix_entries[0].vector;
 		break;
-	case IGBUIO_MSI_INTR_MODE:
+	case RTE_INTR_MODE_MSI:
 		break;
-	case IGBUIO_LEGACY_INTR_MODE:
+	case RTE_INTR_MODE_LEGACY:
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
 		break;
@@ -570,7 +563,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 fail_release_iomem:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
+	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
@@ -595,8 +588,8 @@ igbuio_pci_remove(struct pci_dev *dev)
 	uio_unregister_device(info);
 	igbuio_pci_release_iomem(info);
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
-					IGBUIO_MSIX_INTR_MODE)
+					RTE_INTR_MODE_MSIX)
 		pci_disable_msix(dev);
 	pci_release_regions(dev);
 	pci_disable_device(dev);
 	pci_set_drvdata(dev, NULL);
@@ -611,11 +604,11 @@ igbuio_config_intr_mode(char *intr_str)
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+	if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 		printk(KERN_INFO "Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
+	} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
 		printk(KERN_INFO "Use legacy interrupt\n");
 	} else {
 		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
@@ -656,8 +649,8 @@ module_exit(igbuio_pci_exit_module);
 module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
-"    msix       Use MSIX interrupt\n"
-"    legacy     Use Legacy interrupt\n"
+"    " RTE_INTR_MODE_MSIX_NAME "       Use MSIX interrupt\n"
+"    " RTE_INTR_MODE_LEGACY_NAME "     Use Legacy interrupt\n"
 "\n");
 
 MODULE_DESCRIPTION("UIO driver for Intel IGB PCI cards");
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 06/16] Add support for VFIO in Linuxapp targets
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (6 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header Anatoly Burakov
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC
  To: dev

Add a CONFIG_RTE_EAL_VFIO option (enabled by default) to all Linuxapp
configs, so that VFIO compilation can be turned off.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/defconfig_i686-default-linuxapp-gcc   |    1 +
 config/defconfig_i686-default-linuxapp-icc   |    1 +
 config/defconfig_x86_64-default-linuxapp-gcc |    1 +
 config/defconfig_x86_64-default-linuxapp-icc |    1 +
 4 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index ea90f12..5410f57 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ecfbf28..1c0000c 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index fc69b80..5c682a5 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4ab45b3..b9bb7f6 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (7 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 06/16] Add support for VFIO in Linuxapp targets Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-21 16:07   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO Anatoly Burakov
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC
  To: dev

Add code to handle VFIO interrupts in the EAL interrupt subsystem, and
add a new header, eal_vfio.h.

This header checks two things:
* whether CONFIG_RTE_EAL_VFIO was enabled at build time
* whether the kernel version is 3.6+, so that DPDK still compiles on
  older kernels even though VFIO compilation is enabled by default.

This header also defines a VFIO_PRESENT macro, which should be used to
conditionally compile all VFIO code. Enabling CONFIG_RTE_EAL_VFIO does
not by itself guarantee that VFIO support is compiled in, since that
still depends on the kernel version.
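
Each of the new interrupt helpers programs the device through the VFIO_DEVICE_SET_IRQS ioctl on `vfio_dev_fd`. The argument is a `struct vfio_irq_set`, and for an eventfd trigger it is followed by one `int` file descriptor. The sketch below builds that payload the way `vfio_enable_intx()` does. It stops short of the ioctl, so it runs without a VFIO-bound device. It assumes Linux userspace headers that provide `<linux/vfio.h>`. `build_intx_trigger()` and `union irq_set_buf` are hypothetical names. In the patch, the buffer is a plain `char` array cast to the struct. The sketch uses a union instead so the header is guaranteed to be suitably aligned. In real use the descriptor would come from eventfd(2).

```c
#include <linux/vfio.h>
#include <string.h>

/* Header plus room for one eventfd, like IRQ_SET_BUF_LEN in the patch */
#define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int))

/* Union keeps the vfio_irq_set header aligned inside the byte buffer */
union irq_set_buf {
	struct vfio_irq_set hdr;
	char raw[IRQ_SET_BUF_LEN];
};

/*
 * Hypothetical helper: fill buf with a request that routes INTx
 * (start 0, one vector) to the eventfd efd. The result would then be
 * passed as ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, &buf->hdr).
 */
static void
build_intx_trigger(union irq_set_buf *buf, int efd)
{
	memset(buf, 0, sizeof(*buf));
	buf->hdr.argsz = sizeof(buf->raw);
	buf->hdr.count = 1;
	buf->hdr.flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	buf->hdr.index = VFIO_PCI_INTX_IRQ_INDEX;
	buf->hdr.start = 0;
	/* the eventfd goes into the trailing data[] area */
	memcpy(buf->hdr.data, &efd, sizeof(efd));
}
```

The patch then reuses the same buffer for a second request without data (VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK). For that request, argsz is only sizeof(struct vfio_irq_set).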

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       |  203 +++++++++++++++++++-
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   49 +++++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |    3 +
 3 files changed, 250 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..cb95e2a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include <stdlib.h>
 #include <pthread.h>
 #include <sys/queue.h>
-#include <malloc.h>
 #include <stdarg.h>
 #include <unistd.h>
 #include <string.h>
@@ -44,6 +43,7 @@
 #include <inttypes.h>
 #include <sys/epoll.h>
 #include <sys/signalfd.h>
+#include <sys/ioctl.h>
 
 #include <rte_common.h>
 #include <rte_interrupts.h>
@@ -66,6 +66,7 @@
 #include <rte_spinlock.h>
 
 #include "eal_private.h"
+#include "eal_vfio.h"
 
 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
 
@@ -87,6 +88,7 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
 	int uio_intr_count;              /* for uio device */
+	uint64_t vfio_intr_count;        /* for vfio device */
 	uint64_t timerfd_num;            /* for timerfd */
 	char charbuf[16];                /* for others */
 };
@@ -119,6 +121,173 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;
 
+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set * irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+	int * fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	/* enable INTx */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int*) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* unmask INTx after enabling */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set * irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	/* mask interrupts before disabling */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* disable INTx */
+	memset(irq_set, 0, len);
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msix(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set * irq_set;
+	int * fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int*) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI-X interrupts */
+static int
+vfio_disable_msix(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set * irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI-X interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+#endif
+
 int
 rte_intr_callback_register(struct rte_intr_handle *intr_handle,
 			rte_intr_callback_fn cb, void *cb_arg)
@@ -276,6 +445,16 @@ rte_intr_enable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_enable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_enable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -300,7 +479,7 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	case RTE_INTR_HANDLE_UIO:
 		if (write(intr_handle->fd, &value, sizeof(value)) < 0){
 			RTE_LOG(ERR, EAL,
-				"Error enabling interrupts for fd %d\n",
+				"Error disabling interrupts for fd %d\n",
 							intr_handle->fd);
 			return -1;
 		}
@@ -308,6 +487,16 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_disable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_disable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -357,10 +546,14 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		/* set the length to be read dor different handle type */
 		switch (src->intr_handle.type) {
 		case RTE_INTR_HANDLE_UIO:
-			bytes_read = 4;
+			bytes_read = sizeof(buf.uio_intr_count);
 			break;
 		case RTE_INTR_HANDLE_ALARM:
-			bytes_read = sizeof(uint64_t);
+			bytes_read = sizeof(buf.timerfd_num);
+			break;
+		case RTE_INTR_HANDLE_VFIO_MSIX:
+		case RTE_INTR_HANDLE_VFIO_LEGACY:
+			bytes_read = sizeof(buf.vfio_intr_count);
 			break;
 		default:
 			bytes_read = 1;
@@ -397,7 +590,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				active_cb.cb_fn(&src->intr_handle,
 					active_cb.cb_arg);
 
-				/*get the lcok back. */
+				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
 		}
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 0000000..ca4982b
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,6,0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6733948..b160efe 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -41,12 +41,15 @@
 enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_UNKNOWN = 0,
 	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
+	RTE_INTR_HANDLE_VFIO_LEGACY,  /**< vfio device handle (legacy) */
+	RTE_INTR_HANDLE_VFIO_MSIX,    /**< vfio device handle (MSI-X) */
 	RTE_INTR_HANDLE_ALARM,    /**< alarm handle */
 	RTE_INTR_HANDLE_MAX
 };
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	int vfio_dev_fd;                 /**< VFIO device file descriptor */
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 };
-- 
1.7.0.7
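In eal_intr_process_interrupts() above, each read is now sized by the counter field for its handle type. For the new VFIO handle types, the fd being read is an eventfd (see pci_vfio_setup_interrupts() in patch 08). read() on an eventfd only succeeds with a buffer of at least 8 bytes; it returns the 64-bit counter and resets it to zero. The following is a minimal sketch, assuming buf.vfio_intr_count is a uint64_t (the definition of buf is not shown in this hunk). drain_intr_eventfd() is a hypothetical name, not an EAL function.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read an eventfd-backed interrupt source.
 * Returns the number of interrupts signalled since the last read
 * (the kernel resets the counter to zero on read), or -errno on
 * failure, e.g. -EAGAIN on a non-blocking fd with nothing pending. */
static int64_t drain_intr_eventfd(int fd)
{
	uint64_t count; /* an eventfd always transfers an 8-byte counter */
	ssize_t n = read(fd, &count, sizeof(count));

	if (n < 0)
		return -errno;
	if (n != (ssize_t) sizeof(count))
		return -EIO; /* not expected for an eventfd */
	return (int64_t) count;
}
```

On a non-blocking eventfd with nothing pending, the helper returns -EAGAIN. A 4-byte read, which is the size a UIO handle uses, is rejected on an eventfd with EINVAL.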


* [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (8 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 11:53   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding Anatoly Burakov
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

VFIO is only available on kernel 3.6 and later, so the VFIO code is only
compiled when the DPDK config option CONFIG_RTE_EAL_VFIO is enabled and
kernel 3.6 or higher is detected. This prevents compile failures on older
kernels when VFIO is enabled in the config (which it is by default).
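The gating above is done at compile time in eal_vfio.h, which compares LINUX_VERSION_CODE against KERNEL_VERSION(3,6,0). The headers used for the build need not match the kernel that is running. The sketch below applies the same 3.6.0 threshold to the release string reported by uname(2), purely to illustrate the version encoding. release_has_vfio() and running_kernel_has_vfio() are hypothetical helpers and are not part of this patch. The macro mirrors the kernel's (major << 16) + (minor << 8) + sublevel encoding.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro; newer kernels
 * clamp the sublevel (c) at 255, so this sketch does the same. */
#define SKETCH_KERNEL_VERSION(a, b, c) \
	(((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Hypothetical helper: 1 if a release string such as "3.6.0-rc1" denotes
 * kernel 3.6 or newer, 0 if older, -1 if it cannot be parsed. */
static int release_has_vfio(const char *release)
{
	int major = 0, minor = 0, sub = 0;

	/* "3.6" (no sublevel) is accepted; sub then stays 0 */
	if (sscanf(release, "%d.%d.%d", &major, &minor, &sub) < 2)
		return -1;

	return SKETCH_KERNEL_VERSION(major, minor, sub) >=
	       SKETCH_KERNEL_VERSION(3, 6, 0);
}

/* Hypothetical helper: apply the same check to the running kernel. */
static int running_kernel_has_vfio(void)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return release_has_vfio(u.release);
}
```

A kernel that is new enough only means that VFIO can exist. The vfio and vfio-pci modules still have to be loaded before the device nodes appear.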

Since VFIO cannot be used to map the same device twice, secondary
processes receive the container and group fds from the primary process
over a local socket. Only group and container fds need to be sent, as
device fds can be obtained via ioctl() calls on the group fd.
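Passing an open descriptor between processes over a UNIX domain socket relies on SCM_RIGHTS ancillary data. The receiving process gets a new descriptor that refers to the same open file, which here is the same VFIO container or group. The sketch below shows the mechanism with a status word in the regular payload, similar to the "SOCKET_OK followed by fd" message. The helper names are hypothetical. Unlike a plain char buffer, the control buffer is sized with CMSG_SPACE() inside a union so that it is correctly aligned for struct cmsghdr.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send an int status word plus one file descriptor
 * as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
static int sketch_send_fd(int sock, int status, int fd)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces cmsghdr alignment */
	} ctrl;
	struct iovec iov = { .iov_base = &status, .iov_len = sizeof(status) };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Hypothetical helper: receive the status word and the passed descriptor.
 * Returns the new fd (valid in this process), or -1 if none arrived. */
static int sketch_recv_fd(int sock, int *status)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct iovec iov = { .iov_base = status, .iov_len = sizeof(*status) };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0 || (msg.msg_flags & MSG_CTRUNC))
		return -1;

	/* check that exactly one SCM_RIGHTS fd was attached */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

For a quick local check, socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) gives a connected pair on which both helpers can be exercised without a separate primary process.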

For multiprocess support, VFIO distinguishes between groups that exist
but are unused (i.e. groups that aren't bound to the VFIO driver) and
groups that don't exist at all. This lets the primary process tell
whether a secondary process is requesting a valid group or something
that doesn't exist.
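This distinction can be expressed as a three-way result from opening the group's character device:

- Success means the group is bound to VFIO.
- ENOENT means the group node is absent, so the device is not bound to VFIO. This is not an error.
- Any other failure is a real error.

A minimal sketch of that logic is shown below. It is modelled on the primary-process branch of pci_vfio_get_group_fd() and on the SOCKET_OK / SOCKET_NO_FD / SOCKET_ERR replies. sketch_open_group() and its dir parameter are hypothetical; dir would normally be /dev/vfio.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: open the VFIO group node "<dir>/<group_no>".
 * Returns:
 *   > 0  an open O_RDWR fd: the group exists and is bound to VFIO
 *     0  no such node (ENOENT): not bound to VFIO, not an error
 *    -1  any other failure (permissions, bad path, truncation)
 * Like the patch, 0 means "no fd", which assumes fd 0 (stdin) is open. */
static int sketch_open_group(const char *dir, int group_no)
{
	char path[256];
	int fd;

	if (snprintf(path, sizeof(path), "%s/%d", dir, group_no) >=
			(int) sizeof(path))
		return -1; /* path would be truncated */

	fd = open(path, O_RDWR);
	if (fd < 0)
		return errno == ENOENT ? 0 : -1;
	return fd;
}
```

In a secondary process, the same three outcomes arrive over the socket instead: SOCKET_OK followed by the fd, SOCKET_NO_FD, or SOCKET_ERR.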

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |    5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |    1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  719 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c  |  367 ++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |    3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |   55 ++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |    6 +
 7 files changed, 1155 insertions(+), 1 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 527fa2a..3a39cca 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_socket.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -88,12 +90,13 @@ CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
+CFLAGS_eal_pci_vfio_socket.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index de182e1..01bfd6c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,7 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..0a6f95c
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,719 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <linux/pci_regs.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * Its contents are only compiled if VFIO_PRESENT is defined (eal_vfio.h).
+ */
+
+#ifdef VFIO_PRESENT
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int * msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset){
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg * ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but do not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd,
+		int num_irqs)
+{
+	int i, ret, intr_idx;
+	enum rte_intr_handle_type handle_type;
+
+	/* get interrupt type from internal config (MSI-X by default, can be
+	 * overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		handle_type = RTE_INTR_HANDLE_VFIO_MSIX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		handle_type = RTE_INTR_HANDLE_VFIO_LEGACY;
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "  unknown default interrupt type!\n");
+		return -1;
+	}
+
+	for (i = 0; i < num_irqs; i++) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* fail if this vector cannot be used with eventfd */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			RTE_LOG(ERR, EAL, "  interrupt vector does not support eventfd!\n");
+			return -1;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.type = handle_type;
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+	/* if we're in a secondary process, request container fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd;
+		if ((socket_fd = vfio_socket_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_socket_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				 VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+		if ((socket_fd = vfio_socket_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_socket_receive_request(socket_fd);
+		switch(ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_socket_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, "  cannot get group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char * pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char * tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			 SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename) - 1);
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if (end == group_tok || *end != '\0' || errno != 0) {
+		RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status =
+					{ .argsz = sizeof(group_status) };
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0) {
+		return -1;
+	}
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+	else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container (needs to be done only once, only when
+	 * at least one group is assigned to a container and only in primary process)
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof (*vfio_res), 0)) == NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store VFIO mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of regions (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+	/* if we're in a secondary process, just find our tailq entry and use that */
+	else {
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, "  %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void * bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmappable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* only the primary process sets up interrupts and bus mastering */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd,
+				(int) device_info.num_irqs) != 0) {
+			RTE_LOG(ERR, EAL, "  %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, "  %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return (0);
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
new file mode 100644
index 0000000..1605fce
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
@@ -0,0 +1,367 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe and should not be used. */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include <sys/un.h>
+/* make sure we redefine __USE_MISC only if it was previously defined */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * Its contents are only compiled if VFIO_PRESENT is defined (eal_vfio.h).
+ */
+
+#ifdef VFIO_PRESENT
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd,chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy(CMSG_DATA(&(chdr)), &(fd), sizeof(fd));\
+		} while(0)
+#define CMSGHDR_TO_FD(chdr,fd) \
+		do {\
+			memcpy(&(fd), CMSG_DATA(&(chdr)), sizeof(fd));\
+		} while (0)
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char * buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_socket_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_socket_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_socket_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr * chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_socket_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr * chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_socket_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr*) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+__attribute__((noreturn)) void *
+pci_vfio_socket_thread(void *arg)
+{
+	int ret, i, vfio_group_no;
+	int socket_fd = *(int*) arg;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(socket_fd, (struct sockaddr*) &addr, &sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_socket_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			vfio_socket_send_fd(conn_sock, vfio_cfg.vfio_container_fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_socket_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+			for (i = 0; i < vfio_cfg.vfio_group_idx; i++) {
+				if (vfio_cfg.vfio_groups[i].group_no == vfio_group_no)
+					break;
+			}
+			/* if we reached end of the list, the group doesn't exist */
+			if (i == vfio_cfg.vfio_group_idx)
+				vfio_socket_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (vfio_cfg.vfio_groups[i].fd == 0)
+				vfio_socket_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_socket_send_request(conn_sock, SOCKET_OK);
+				vfio_socket_send_fd(conn_sock, vfio_cfg.vfio_groups[i].fd);
+			}
+			break;
+		default:
+			vfio_socket_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr*) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	return socket_fd;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index 92e3065..5468b0a 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include <rte_eal.h>
+#include <rte_pci_dev_feature_defs.h>
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6 +77,8 @@ struct internal_config {
 	volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
 	uintptr_t base_virtaddr;          /**< base address to try and reserve memory from */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
+	/** default interrupt mode for VFIO */
+	volatile enum rte_intr_mode vfio_intr_mode;
 	const char *hugefile_prefix;      /**< the base filename of hugetlbfs files */
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 699e80d..b163ab5 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -34,6 +34,8 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include "eal_vfio.h"
+
 struct pci_map {
 	void *addr;
 	uint64_t offset;
@@ -62,4 +64,57 @@ void * pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
 
+#ifdef VFIO_PRESENT
+
+#define VFIO_MAX_GROUPS 64
+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
+
+/* map VFIO resource prototype */
+int pci_vfio_map_resource(struct rte_pci_device *dev);
+
+/*
+ * Function prototypes for VFIO socket functions
+ */
+int vfio_socket_send_request(int socket, int req);
+int vfio_socket_receive_request(int socket);
+int vfio_socket_send_fd(int socket, int fd);
+int vfio_socket_receive_fd(int socket);
+int vfio_socket_connect_to_primary(void);
+int pci_vfio_socket_setup(void);
+void * pci_vfio_socket_thread(void *arg);
+
+/* socket comm protocol definitions */
+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+	int group_no;
+	int fd;
+};
+
+struct vfio_config {
+	int vfio_enabled;
+	int vfio_container_fd;
+	int vfio_container_has_dma;
+	int vfio_group_idx;
+	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
+/* per-process VFIO config */
+struct vfio_config vfio_cfg;
+
+pthread_t socket_thread;
+
+#endif
+
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
index ca4982b..32953c0 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -42,6 +42,12 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,6,0)
 #include <linux/vfio.h>
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0)
+#define RTE_PCI_MSIX_TABLE_BIR 0x7
+#else
+#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR
+#endif
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
1.7.0.7
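The socket protocol above hands the primary process's container and group descriptors to secondary processes as SCM_RIGHTS ancillary data. The FD_TO_CMSGHDR/CMSGHDR_TO_FD macros visible in this excerpt copy the descriptor through __cmsg_data, which is a glibc-specific member name. The portable route is CMSG_FIRSTHDR()/CMSG_DATA(), with cmsg_level, cmsg_type and cmsg_len filled in. Below is a minimal sketch of one "status word plus descriptor" exchange under those assumptions. send_status_and_fd() and recv_status_and_fd() are hypothetical names; only the SOCKET_OK/SOCKET_ERR values come from eal_pci_init.h.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* status codes, values copied from eal_pci_init.h */
#define SOCKET_OK  0x0
#define SOCKET_ERR 0xFF

/* control buffer sized and aligned for exactly one descriptor */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send a status word with fd attached as SCM_RIGHTS ancillary data. */
static int send_status_and_fd(int sock, int status, int fd)
{
	struct msghdr hdr;
	struct iovec iov;
	union fd_ctrl ctrl;
	struct cmsghdr *chdr;

	memset(&hdr, 0, sizeof(hdr));
	memset(&ctrl, 0, sizeof(ctrl));

	iov.iov_base = &status;
	iov.iov_len = sizeof(status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	/* level, type and length must all be set, not only the payload */
	chdr = CMSG_FIRSTHDR(&hdr);
	chdr->cmsg_level = SOL_SOCKET;
	chdr->cmsg_type = SCM_RIGHTS;
	chdr->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(chdr), &fd, sizeof(int));

	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* Receive a status word; return the attached fd if status is SOCKET_OK. */
static int recv_status_and_fd(int sock)
{
	struct msghdr hdr;
	struct iovec iov;
	union fd_ctrl ctrl;
	struct cmsghdr *chdr;
	int status = SOCKET_ERR;
	int fd;

	memset(&hdr, 0, sizeof(hdr));
	iov.iov_base = &status;
	iov.iov_len = sizeof(status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &hdr, 0) < 0 || (hdr.msg_flags & MSG_CTRUNC))
		return -1;
	if (status != SOCKET_OK)
		return -1;

	chdr = CMSG_FIRSTHDR(&hdr);
	if (chdr == NULL || chdr->cmsg_level != SOL_SOCKET ||
	    chdr->cmsg_type != SCM_RIGHTS ||
	    chdr->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(chdr), sizeof(int));
	return fd;
}
```

For a quick try, a socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) can stand in for the primary/secondary connection. The value returned by recv_status_and_fd(sv[1]) is a new descriptor number that refers to the same open file as the one passed to send_status_and_fd(sv[0], SOCKET_OK, fd).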

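The eal_vfio.h hunk hard-codes 0x7 for RTE_PCI_MSIX_TABLE_BIR on kernels older than 3.10 and otherwise uses the kernel's PCI_MSIX_TABLE_BIR. Per the PCI specification, this mask applies to the MSI-X capability's Table Offset/BIR dword. The low three bits (BIR) select which BAR holds the MSI-X table, and the remaining bits give the table's QWORD-aligned offset inside that BAR. Below is a small decoding sketch. The helper names are hypothetical, and this excerpt does not show how the VFIO mapping code uses the result.

```c
#include <assert.h>
#include <stdint.h>

/* value the patch falls back to on kernels older than 3.10 */
#define RTE_PCI_MSIX_TABLE_BIR 0x7

/* Low 3 bits: BAR Indicator Register, i.e. which BAR holds the table. */
static unsigned msix_table_bar(uint32_t table_reg)
{
	return table_reg & RTE_PCI_MSIX_TABLE_BIR;
}

/* Remaining bits: QWORD-aligned offset of the table inside that BAR. */
static uint32_t msix_table_offset(uint32_t table_reg)
{
	return table_reg & ~(uint32_t)RTE_PCI_MSIX_TABLE_BIR;
}
```

For example, a Table Offset/BIR value of 0x00002003 decodes to BAR 3, with the table at offset 0x2000.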
^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (9 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 12:03   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING
is set for the driver. VFIO is tried first. If the device is not
mapped through VFIO, IGB_UIO is tried as a fallback.

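This mapping order depends on pci_vfio_map_resource() distinguishing three outcomes. The earlier patch "Distinguish between legitimate failures and non-fatal errors" appears to set that up: 0 means mapped, a negative value is a hard error that aborts probing, and a positive value means VFIO does not manage the device, so igb_uio is tried next. Below is a sketch of that decision. map_device() and the stub mappers are hypothetical stand-ins for pci_vfio_map_resource() and pci_uio_map_resource().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapper signature: 0 = mapped, > 0 = device not handled by
 * this mechanism (try the next one), < 0 = hard error. */
typedef int (*map_fn)(void *dev);

static int map_device(void *dev, int vfio_enabled,
		      map_fn vfio_map, map_fn uio_map)
{
	int ret;

	if (vfio_enabled) {
		ret = vfio_map(dev);
		if (ret == 0)
			return 0;   /* mapped through VFIO, done */
		if (ret < 0)
			return ret; /* real failure: do not fall back */
		/* ret > 0: not a VFIO device, fall through to igb_uio */
	}
	return uio_map(dev);
}

/* stub mappers for trying the logic out */
static int map_ok(void *dev)   { (void)dev; return 0; }
static int map_skip(void *dev) { (void)dev; return 1; }
static int map_fail(void *dev) { (void)dev; return -1; }
```

For instance, map_device(dev, 1, map_skip, map_ok) returns 0 after falling back to the second mapper. In contrast, map_device(dev, 1, map_fail, map_ok) returns -1 without trying the second mapper.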
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c |   55 ++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 7256406..953abe6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 {
 	struct rte_pci_id *id_table;
 	int ret = 0;
+	int mapped = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -435,8 +436,17 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
+			/* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+			if (vfio_cfg.vfio_enabled) {
+				if ((ret = pci_vfio_map_resource(dev)) == 0)
+					mapped = 1;
+				else if (ret < 0)
+					return ret;
+			}
+#endif
 			/* map resources for devices that use igb_uio */
-			if ((ret = pci_uio_map_resource(dev)) != 0)
+			if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
 				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -471,5 +481,48 @@ rte_eal_pci_init(void)
 		RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
 		return -1;
 	}
+#ifdef VFIO_PRESENT
+	memset(&vfio_cfg, 0, sizeof(vfio_cfg));
+
+	/* initialize group list */
+	int i, ret;
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		vfio_cfg.vfio_groups[i].fd = -1;
+		vfio_cfg.vfio_groups[i].group_no = -1;
+	}
+	vfio_cfg.vfio_container_fd = -1;
+
+	/* check if we have VFIO driver enabled */
+	if (access(VFIO_DIR, F_OK) == 0) {
+		static int socket_fd;
+
+		vfio_cfg.vfio_enabled = 1;
+
+		/* if we are primary process, create a thread to communicate with
+		 * secondary processes. the thread will use a socket to wait for
+		 * requests from secondary process to send open file descriptors,
+		 * because VFIO does not allow multiple open descriptors on a group or
+		 * VFIO container.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY) {
+			/* set up local socket */
+			if ((socket_fd = pci_vfio_socket_setup()) < 0) {
+				RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
+				return -1;
+			}
+			ret = pthread_create(&socket_thread, NULL,
+					pci_vfio_socket_thread, (void*) &socket_fd);
+			if (ret) {
+				RTE_LOG(ERR, EAL,
+					"Failed to create thread for communication with secondary "
+					"processes!\n");
+				return -1;
+			}
+		}
+	}
+	else
+		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
+#endif
 	return 0;
 }
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (10 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-20  7:40   ` Stephen Hemminger
  2014-05-22 12:34   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc Anatoly Burakov
                   ` (5 subsequent siblings)
  17 siblings, 2 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Unlike with igb_uio, the VFIO interrupt type is not set through kernel
module parameters but through ioctl() calls at runtime. This warrants
a new EAL command-line parameter, --vfio-intr. It has no effect if
VFIO support is not compiled in; otherwise it sets the VFIO interrupt
type to either "legacy" or "msix". Note that VFIO initialization will
fail if the selected interrupt type is not supported by the system.

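The following self-contained sketch shows how the option is parsed. getopt_long() passes the --vfio-intr argument to a table lookup that accepts only "legacy" and "msix". The enum, the global and both function names are hypothetical stand-ins for enum rte_intr_mode, internal_config.vfio_intr_mode, eal_parse_vfio_intr() and eal_parse_args().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* hypothetical subset of enum rte_intr_mode */
enum intr_mode { INTR_MODE_NONE, INTR_MODE_LEGACY, INTR_MODE_MSIX };

/* stands in for internal_config.vfio_intr_mode */
static enum intr_mode vfio_intr_mode = INTR_MODE_NONE;

/* Table-driven lookup, same shape as eal_parse_vfio_intr(). */
static int parse_vfio_intr(const char *mode)
{
	static const struct {
		const char *name;
		enum intr_mode value;
	} map[] = {
		{ "legacy", INTR_MODE_LEGACY },
		{ "msix",   INTR_MODE_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(mode, map[i].name) == 0) {
			vfio_intr_mode = map[i].value;
			return 0;
		}
	}
	return -1;
}

/* Minimal long-option loop: 0 on success, -1 on a bad or unknown option. */
static int parse_args(int argc, char **argv)
{
	static const struct option lgopts[] = {
		{ "vfio-intr", required_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 },
	};
	int opt, option_index;

	optind = 0;   /* glibc and musl: fully restart the scan */
	opterr = 0;   /* report errors through the return value only */
	while ((opt = getopt_long(argc, argv, "", lgopts, &option_index)) != -1) {
		if (opt != 0)
			return -1;
		if (strcmp(lgopts[option_index].name, "vfio-intr") == 0 &&
		    parse_vfio_intr(optarg) < 0)
			return -1;
	}
	return 0;
}
```

With this sketch, --vfio-intr=msix is accepted. --vfio-intr=intx makes parse_args() return -1, as does any value other than "legacy" or "msix", matching eal_parse_vfio_intr().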
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 01bfd6c..bae1078 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0    "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR    "vfio-intr"
 
 #define RTE_EAL_BLACKLIST_SIZE	0x100
 
@@ -361,6 +362,7 @@ eal_usage(const char *prgname)
 	       "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
 	    		   "native RDTSC\n"
 	       "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+	       "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO (legacy|msix)\n"
 	       "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +581,27 @@ eal_parse_base_virtaddr(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_vfio_intr(const char *mode)
+{
+	unsigned i;
+	static struct {
+		const char *name;
+		enum rte_intr_mode value;
+	} map[] = {
+		{ "legacy", RTE_INTR_MODE_LEGACY },
+		{ "msix", RTE_INTR_MODE_MSIX },
+	};
+
+	for (i = 0; i < RTE_DIM(map); i++) {
+		if (!strcmp(mode, map[i].name)) {
+			internal_config.vfio_intr_mode = map[i].value;
+			return 0;
+		}
+	}
+	return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +656,7 @@ eal_parse_args(int argc, char **argv)
 		{OPT_PCI_BLACKLIST, 1, 0, 0},
 		{OPT_VDEV, 1, 0, 0},
 		{OPT_SYSLOG, 1, NULL, 0},
+		{OPT_VFIO_INTR, 1, NULL, 0},
 		{OPT_BASE_VIRTADDR, 1, 0, 0},
 		{OPT_XEN_DOM0, 0, 0, 0},
 		{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -828,6 +852,14 @@ eal_parse_args(int argc, char **argv)
 					return -1;
 				}
 			}
+			else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+				if (eal_parse_vfio_intr(optarg) < 0) {
+					RTE_LOG(ERR, EAL, "invalid parameters for --"
+							OPT_VFIO_INTR "\n");
+					eal_usage(prgname);
+					return -1;
+				}
+			}
 			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
 				internal_config.create_uio_dev = 1;
 			}
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (11 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 13:04   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 12/16] Adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

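The patch replaces malloc() with an anonymous mmap() for the --no-huge memory segment. A likely additional reason, not stated in the commit message, is that mmap() always returns page-aligned, zero-filled memory. malloc() makes no page-alignment guarantee, and VFIO DMA mappings are set up in whole pages. Below is a minimal sketch of that allocation path. alloc_no_huge() and is_page_aligned() are hypothetical names, and passing -1 as the descriptor is the conventional choice for MAP_ANONYMOUS (Linux ignores the value).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve len bytes of private anonymous memory for the --no-huge case.
 * Returns NULL on failure. */
static void *alloc_no_huge(size_t len)
{
	/* the fd argument is ignored for MAP_ANONYMOUS; -1 is conventional */
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED) {
		fprintf(stderr, "%s: mmap() failed: %s\n", __func__,
			strerror(errno));
		return NULL;
	}
	return addr;
}

/* mmap() results are always page aligned; malloc() gives no such promise */
static int is_page_aligned(const void *p)
{
	return ((uintptr_t)p % (uintptr_t)sysconf(_SC_PAGESIZE)) == 0;
}
```

Memory obtained this way must be released with munmap(addr, len) rather than free().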
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5a10a80..3fc0d28 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)
 
 	/* hugetlbfs can be disabled */
 	if (internal_config.no_hugetlbfs) {
-		addr = malloc(internal_config.memory);
+		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (addr == MAP_FAILED) {
+			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+					strerror(errno));
+			return -1;
+		}
 		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 12/16] Adding unit tests for VFIO EAL command-line parameter
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (12 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio Anatoly Burakov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Add unit tests for the VFIO interrupt type command-line parameter.
The test cannot tell whether VFIO support is compiled in (the
eal_vfio.h header is internal to the Linuxapp EAL), so it checks the
flag regardless.

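The three new checks follow the same expect-success/expect-failure pattern as the existing ones. Below is a table-driven sketch of the same checks. fake_launch() is a hypothetical in-process stand-in for the test suite's launch_proc(), which starts a separate process. It validates only the --vfio-intr value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for launch_proc(): returns 0 if an EAL with this
 * command line would start. Only the --vfio-intr value is validated. */
static int fake_launch(const char *const *argv, int argc)
{
	const char *prefix = "--vfio-intr=";
	size_t plen = strlen(prefix);
	int i;

	for (i = 1; i < argc; i++) {
		if (strncmp(argv[i], prefix, plen) != 0)
			continue;
		if (strcmp(argv[i] + plen, "legacy") != 0 &&
		    strcmp(argv[i] + plen, "msix") != 0)
			return 1;
	}
	return 0;
}

/* Each case: the flag under test and whether the process should start. */
struct flag_case {
	const char *arg;
	int expect_ok;
};

/* Returns the number of cases whose outcome did not match. */
static int run_vfio_intr_cases(const char *prgname)
{
	static const struct flag_case cases[] = {
		{ "--vfio-intr=legacy",  1 },
		{ "--vfio-intr=msix",    1 },
		{ "--vfio-intr=invalid", 0 },
	};
	int failures = 0;
	size_t i;

	for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
		const char *argv[] = { prgname, "--file-prefix=intr",
				       "-c", "1", "-n", "2", cases[i].arg };
		int ok = (fake_launch(argv, 7) == 0);

		if (ok != cases[i].expect_ok) {
			printf("Error - unexpected result with %s\n", cases[i].arg);
			failures++;
		}
	}
	return failures;
}
```

With this layout, adding another interrupt mode requires only one more table row.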
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_eal_flags.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..081b47f 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,18 @@ test_misc_flags(void)
 	const char *argv11[] = {prgname, "--file-prefix=virtaddr",
 			"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};
 
+	/* try running with --vfio-intr INTx flag */
+	const char *argv12[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+	/* try running with --vfio-intr MSI-X flag */
+	const char *argv13[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+	/* try running with --vfio-intr invalid flag */
+	const char *argv14[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=invalid"};
+
 
 	if (launch_proc(argv0) == 0) {
 		printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +832,18 @@ test_misc_flags(void)
 		printf("Error - process did not run ok with --base-virtaddr parameter\n");
 		return -1;
 	}
+	if (launch_proc(argv12) != 0) {
+		printf("Error - process did not run ok with --vfio-intr INTx parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv13) != 0) {
+		printf("Error - process did not run ok with --vfio-intr MSI-X parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv14) == 0) {
+		printf("Error - process ran ok with --vfio-intr invalid parameter\n");
+		return -1;
+	}
 	return 0;
 }
 #endif
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (13 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 12/16] Adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 13:13   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 14/16] Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to pci-stub, vfio-pci and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user now has to first
write the device's vendor and device IDs to the "new_id" file inside
the igb_uio driver directory, and only then write its PCI address to
"bind". This will be reflected in later changes to the PCI binding
script as well.

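Below is a C sketch of the two-step procedure. The vendor and device IDs go to new_id as two hexadecimal numbers, matching the "%04x %04x" format that the dpdk_nic_bind.py change later in this series writes. The PCI address then goes to bind. bind_via_new_id() and write_attr() are hypothetical helpers. The drivers directory is a parameter only so the sketch can be tried outside /sys.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write one value to a sysfs-style attribute file. */
static int write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (f == NULL)
		return -1;
	ret = (fputs(val, f) < 0) ? -1 : 0;
	/* stdio buffers the write, so a rejected write may only fail here */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Hypothetical helper: teach `driver` the vendor/device pair through
 * new_id, then bind the device at pci_addr (domain:bus:slot.func).
 * drivers_dir is normally "/sys/bus/pci/drivers". */
static int bind_via_new_id(const char *drivers_dir, const char *driver,
			   unsigned vendor, unsigned device,
			   const char *pci_addr)
{
	char path[512], ids[24];
	struct stat st;

	/* the driver directory exists only while the module is loaded */
	snprintf(path, sizeof(path), "%s/%s", drivers_dir, driver);
	if (stat(path, &st) != 0) {
		fprintf(stderr, "%s: %s\n", path, strerror(errno));
		return -1;
	}

	/* step 1: "vvvv dddd" in hex goes to new_id */
	snprintf(ids, sizeof(ids), "%04x %04x", vendor, device);
	snprintf(path, sizeof(path), "%s/%s/new_id", drivers_dir, driver);
	if (write_attr(path, ids) < 0)
		return -1;

	/* step 2: the PCI address goes to bind */
	snprintf(path, sizeof(path), "%s/%s/bind", drivers_dir, driver);
	return write_attr(path, pci_addr);
}
```

On a real system this requires root. A call would look like bind_via_new_id("/sys/bus/pci/drivers", "igb_uio", 0x8086, 0x10fb, "0000:01:00.0"); the IDs and address here are illustrative only.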
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |   21 +--------------------
 1 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 043c0f6..d30c94a 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include <rte_pci_dev_ids.h>
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -620,7 +601,7 @@ igbuio_config_intr_mode(char *intr_str)
 
 static struct pci_driver igbuio_pci_driver = {
 	.name = "igb_uio",
-	.id_table = igbuio_pci_ids,
+	.id_table = NULL,
 	.probe = igbuio_pci_probe,
 	.remove = igbuio_pci_remove,
 };
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 14/16] Renamed igb_uio_bind to dpdk_nic_bind
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (14 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py Anatoly Burakov
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh Anatoly Burakov
  17 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Renaming the igb_uio_bind script to dpdk_nic_bind to have a generic
name since we're now supporting two drivers.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} |    0
 1 files changed, 0 insertions(+), 0 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (100%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 100%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
-- 
1.7.0.7

* [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (15 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 14/16] Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 13:23   ` Thomas Monjalon
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh Anatoly Burakov
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Since igb_uio no longer has a PCI ID list, the script no longer
distinguishes between supported and unsupported NICs.

Writing a new device ID to new_id triggers odd sysfs behaviour: a
subsequent write to "bind" results in an IOError when the file is
closed. The error is harmless but still raises an exception. To work
around it, the script checks whether the device was actually bound to
the driver before reporting an error.

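One plausible explanation for the IOError, which the patch does not state, is this: writing to new_id makes the kernel probe unbound devices that match the new ID, so the device may already be bound when its address is written to bind, and the kernel then rejects that write. Because Python buffers file writes, the failure would only surface when the file is closed. Either way, the workaround amounts to asking sysfs which driver owns the device. Below is a C sketch of that check. It reads the device's driver symlink instead of parsing lspci output as the script does. link_names_driver() and device_bound_to() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if the symlink at driver_link points at a directory named
 * `driver` (only the last path component is compared), 0 otherwise. */
static int link_names_driver(const char *driver_link, const char *driver)
{
	char target[PATH_MAX];
	const char *name;
	ssize_t n;

	n = readlink(driver_link, target, sizeof(target) - 1);
	if (n < 0)
		return 0;           /* no link: device has no driver */
	target[n] = '\0';        /* readlink() does not NUL-terminate */

	name = strrchr(target, '/');
	name = (name != NULL) ? name + 1 : target;
	return strcmp(name, driver) == 0;
}

/* Hypothetical check to run when writing to "bind" reports an error:
 * /sys/bus/pci/devices/<addr>/driver links to the bound driver. */
static int device_bound_to(const char *pci_addr, const char *driver)
{
	char link[PATH_MAX];

	snprintf(link, sizeof(link), "/sys/bus/pci/devices/%s/driver", pci_addr);
	return link_names_driver(link, driver);
}
```

The script itself gets the same answer by re-reading the device's Driver_str through get_pci_device_details().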
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/dpdk_nic_bind.py |  163 ++++++++++++++++++++++++++----------------------
 1 files changed, 89 insertions(+), 74 deletions(-)

diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py
index 824aa2b..06fb28a 100755
--- a/tools/dpdk_nic_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,8 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]
 
 def usage():
     '''Print usage information for the program'''
@@ -147,64 +147,70 @@ def find_module(mod):
                 return path
 
 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that the needed modules (igb_uio or vfio_pci) are loaded'''
+    global dpdk_drivers
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+    
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]
     
     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
-        sys.exit(1)
-    
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False
+        
+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
 
+def get_pci_device_details(dev_id):
+    '''This function gets additional details for a PCI device'''
+    device = {}
+    
+    extra_info = check_output(["lspci", "-vmmks", dev_id]).splitlines()
+    
+    # parse lspci details
+    for line in extra_info:
+        if len(line) == 0:
+            continue
+        name, value = line.split("\t", 1)
+        name = name.strip(":") + "_str"
+        device[name] = value
+    # check for a unix interface name
+    sys_path = "/sys/bus/pci/devices/%s/net/" % dev_id
+    if exists(sys_path):
+        device["Interface"] = ",".join(os.listdir(sys_path))
+    else:
+        device["Interface"] = ""
+    # check if a port is used for ssh connection
+    device["Ssh_if"] = False
+    device["Active"] = ""
+    
+    return device
+
 def get_nic_details():
     '''This function populates the "devices" dictionary. The keys used are
     the pci addresses (domain:bus:slot.func). The values are themselves
     dictionaries - one for each NIC.'''
     global devices
+    global dpdk_drivers
     
     # clear any old data
     devices = {} 
@@ -237,38 +243,23 @@ def get_nic_details():
 
     # based on the basic info, get extended text details            
     for d in devices.keys():
-        extra_info = check_output(["lspci", "-vmmks", d]).splitlines()
-        # parse lspci details
-        for line in extra_info:
-            if len(line) == 0:
-                continue
-            name, value = line.split("\t", 1)
-            name = name.strip(":") + "_str"
-            devices[d][name] = value
-        # check for a unix interface name
-        sys_path = "/sys/bus/pci/devices/%s/net/" % d
-        if exists(sys_path):
-            devices[d]["Interface"] = ",".join(os.listdir(sys_path))
-        else:
-            devices[d]["Interface"] = ""
-        # check if a port is used for ssh connection
-        devices[d]["Ssh_if"] = False
-        devices[d]["Active"] = ""
+        # get additional info and add it to existing data
+        devices[d] = dict(devices[d].items() + get_pci_device_details(d).items())
+        
         for _if in ssh_if: 
             if _if in devices[d]["Interface"].split(","):
                 devices[d]["Ssh_if"] = True
                 devices[d]["Active"] = "*Active*"
                 break;
 
-        # add igb_uio to list of supporting modules if needed
-        if is_supported_device(d):
-            if "Module_str" in devices[d]:
-                if "igb_uio" not in devices[d]["Module_str"]:
-                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
-            else:
-                devices[d]["Module_str"] = "igb_uio"
-        if "Module_str" not in devices[d]:
-            devices[d]["Module_str"] = "<none>"
+        # add DPDK drivers to list of supporting modules if needed
+        if "Module_str" in devices[d]:
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
+        else:
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)
+        
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
             modules = devices[d]["Module_str"].split(",")
@@ -298,7 +289,7 @@ def dev_id_from_dev_name(dev_name):
     sys.exit(1)
 
 def unbind_one(dev_id, force):
-    '''Unbind the device identified by "dev_id" from its current driver'''
+    '''Unbind the device identified by "dev_id" from its current driver'''    
     dev = devices[dev_id]
     if not has_driver(dev_id):
         print "%s %s %s is not currently managed by any driver\n" % \
@@ -329,8 +320,8 @@ def bind_one(dev_id, driver, force):
     
     # prevent disconnection of our ssh session
     if dev["Ssh_if"] and not force:
-        print "Routing table indicates that interface %s is active" \
-            ". Not modifying" % (dev_id)
+        print ("Routing table indicates that interface %s is active"
+               ". Not modifying" % (dev_id))
         return
 
     # unbind any existing drivers we don't want
@@ -343,6 +334,22 @@ def bind_one(dev_id, driver, force):
             unbind_one(dev_id, force)
             dev["Driver_str"] = "" # clear driver string
 
+    # if we are binding to one of DPDK drivers, add PCI id's to that driver
+    if driver in dpdk_drivers:
+        filename = "/sys/bus/pci/drivers/%s/new_id" % driver
+        try:
+            f = open(filename, "w")
+        except:
+            print "Error: bind failed for %s - Cannot open %s" % (dev_id, filename)
+            return
+        try:
+            f.write("%04x %04x" % (dev["Vendor"], dev["Device"]))
+            f.close()
+        except:
+            print "Error: bind failed for %s - Cannot write new PCI ID to " \
+                "driver %s" % (dev_id, driver)
+            return
+
     # do the bind by writing to /sys
     filename = "/sys/bus/pci/drivers/%s/bind" % driver
     try:
@@ -356,6 +363,12 @@ def bind_one(dev_id, driver, force):
         f.write(dev_id)
         f.close()
     except:
+        # for some reason, closing dev_id after adding a new PCI ID to new_id
+        # results in IOError. however, if the device was successfully bound,
+        # we don't care for any errors and can safely ignore IOError
+        tmp = get_pci_device_details(dev_id)
+        if "Driver_str" in tmp and tmp["Driver_str"] == driver:
+            return
         print "Error: bind failed for %s - Cannot bind to driver %s" % (dev_id, driver)
         if saved_driver is not None: # restore any previous driver
             bind_one(dev_id, saved_driver, force)
@@ -399,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+    
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])
 
     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using IGB_UIO driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
-- 
1.7.0.7

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh
  2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
                   ` (16 preceding siblings ...)
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py Anatoly Burakov
@ 2014-05-19 15:51 ` Anatoly Burakov
  2014-05-22 13:25   ` Thomas Monjalon
  17 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-19 15:51 UTC (permalink / raw)
  To: dev

Adds support for loading/unloading the VFIO modules, binding/unbinding
devices to/from VFIO, and setting up the correct userspace permissions.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/setup.sh |  168 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 145 insertions(+), 23 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..2ffa55a 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,52 @@ load_igb_uio_module()
 }
 
 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+	echo "Unloading any existing VFIO module"
+	/sbin/lsmod | grep -s vfio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod vfio-pci
+		sudo /sbin/rmmod vfio_iommu_type1
+		sudo /sbin/rmmod vfio
+	fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+	remove_vfio_module
+
+	echo "Loading VFIO module"
+	/sbin/lsmod | grep -s vfio_pci > /dev/null
+	if [ $? -ne 0 ] ; then
+		if [ -f /lib/modules/$(uname -r)/kernel/drivers/vfio/pci/vfio-pci.ko ] ; then
+			sudo /sbin/modprobe vfio-pci
+		fi
+	fi
+	
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+	
+	# check if /dev/vfio/vfio exists - that way we
+	# know we either loaded the module, or it was
+	# compiled into the kernel
+	if [ ! -e /dev/vfio/vfio ] ; then
+		echo "## ERROR: VFIO not found!"
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +269,53 @@ load_kni_module()
 }
 
 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+	
+	# make sure regular user can access everything inside /dev/vfio
+	echo "chmod /dev/vfio/*"
+	sudo /usr/bin/chmod 0666 /dev/vfio/*
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+	
+	# since permissions are only to be set when running as
+	# regular user, we only check ulimit here
+	#
+	# warn if regular user is only allowed
+	# to memlock <64M of memory
+	MEMLOCK_AMNT=`ulimit -l`
+	
+	if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+		MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+		echo ""
+		echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+		echo ""
+		echo "This is the maximum amount of memory you will be"
+		echo "able to use with DPDK and VFIO if run as current user."
+		echo "To change this, please adjust limits.conf memlock limit for current user."
+		
+		if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+			echo ""
+			echo "## WARNING: memlock limit is less than 64MB"
+			echo "## DPDK with VFIO may not be able to initialize if run as current user."
+		fi
+	fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -324,13 +417,13 @@ grep_meminfo()
 }
 
 #
-# Calls igb_uio_bind.py --status to show the NIC and what they
+# Calls dpdk_nic_bind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -338,16 +431,33 @@ show_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with igb_uio
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+	if  /sbin/lsmod  | grep -q vfio_pci ; then 
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to VFIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+	else 
+		echo "# Please load the 'vfio-pci' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/igb_uio_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -355,18 +465,18 @@ bind_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with kernel drivers again
+# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/igb_uio_bind.py --status
+	${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/igb_uio_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
@@ -396,21 +506,30 @@ step2_func()
 
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
+	
+	TEXT[2]="Insert VFIO module"
+	FUNC[2]="load_vfio_module"
 
-	TEXT[2]="Insert KNI module"
-	FUNC[2]="load_kni_module"
+	TEXT[3]="Insert KNI module"
+	FUNC[3]="load_kni_module"
 
-	TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-	FUNC[3]="set_non_numa_pages"
+	TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+	FUNC[4]="set_non_numa_pages"
 
-	TEXT[4]="Setup hugepage mappings for NUMA systems"
-	FUNC[4]="set_numa_pages"
+	TEXT[5]="Setup hugepage mappings for NUMA systems"
+	FUNC[5]="set_numa_pages"
 
-	TEXT[5]="Display current Ethernet device settings"
-	FUNC[5]="show_nics"
+	TEXT[6]="Display current Ethernet device settings"
+	FUNC[6]="show_nics"
 
-	TEXT[6]="Bind Ethernet device to IGB UIO module"
-	FUNC[6]="bind_nics"
+	TEXT[7]="Bind Ethernet device to IGB UIO module"
+	FUNC[7]="bind_nics_to_igb_uio"
+	
+	TEXT[8]="Bind Ethernet device to VFIO module"
+	FUNC[8]="bind_nics_to_vfio"
+
+	TEXT[9]="Setup VFIO permissions"
+	FUNC[9]="set_vfio_permissions"
 }
 
 #
@@ -455,11 +574,14 @@ step5_func()
 	TEXT[3]="Remove IGB UIO module"
 	FUNC[3]="remove_igb_uio_module"
 
-	TEXT[4]="Remove KNI module"
-	FUNC[4]="remove_kni_module"
+	TEXT[4]="Remove VFIO module"
+	FUNC[4]="remove_vfio_module"
+
+	TEXT[5]="Remove KNI module"
+	FUNC[5]="remove_kni_module"
 
-	TEXT[5]="Remove hugepage mappings"
-	FUNC[5]="clear_huge_pages"
+	TEXT[6]="Remove hugepage mappings"
+	FUNC[6]="clear_huge_pages"
 }
 
 STEPS[1]="step1_func"
-- 
1.7.0.7

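set_vfio_permissions() warns when `ulimit -l` is below 65536. Bash reports that limit in KiB, so the threshold is 64 MiB.

The same check can be done from C with getrlimit(RLIMIT_MEMLOCK), which reports bytes. This sketch assumes the reason for the check is that VFIO's type1 IOMMU backend pins DMA-mapped memory. As far as we know, that pinned memory is charged against the caller's RLIMIT_MEMLOCK unless the process has CAP_IPC_LOCK. The helper names are hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* setup.sh compares `ulimit -l` (KiB) against 65536, i.e. 64 MiB. */
#define MEMLOCK_WARN_BYTES (64ULL * 1024 * 1024)

/* Return 1 if a memlock limit of 'cur' bytes would trigger the script's warning. */
static int memlock_below_threshold(rlim_t cur)
{
	if (cur == RLIM_INFINITY)
		return 0;
	return (unsigned long long)cur < MEMLOCK_WARN_BYTES;
}

/* Print roughly what set_vfio_permissions() prints; returns -1 if getrlimit fails. */
static int report_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 0; /* "unlimited": the script prints nothing */
	printf("Current user memlock limit: %llu MB\n",
	       (unsigned long long)rl.rlim_cur / (1024 * 1024));
	if (memlock_below_threshold(rl.rlim_cur))
		printf("## WARNING: memlock limit is less than 64MB\n");
	return 0;
}
```

Doing this check inside the application, rather than only in setup.sh, would also cover users who never run the script.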

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-05-20  7:40   ` Stephen Hemminger
  2014-05-20  8:33     ` Burakov, Anatoly
  2014-05-22 12:34   ` Thomas Monjalon
  1 sibling, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2014-05-20  7:40 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

I really wish the code did an automatic fallback based on PCI config. It is
possible to know the right mode and do the right thing, rather than
punting the problem out to the command line, which is totally unusable
for hotplug and generic applications.


On Mon, May 19, 2014 at 8:51 AM, Anatoly Burakov
<anatoly.burakov@intel.com> wrote:

> Unlike igb_uio, VFIO interrupt type is not set by kernel module
> parameters but is set up via ioctl() calls at runtime. This warrants
> a new EAL command-line parameter. It will have no effect if VFIO is
> not compiled, but will set VFIO interrupt type to either "legacy" or
> "msix" if VFIO support is compiled. Note that VFIO initialization will
> fail if the interrupt type selected is not supported by the system.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/linuxapp/eal/eal.c |   32 ++++++++++++++++++++++++++++++++
>  1 files changed, 32 insertions(+), 0 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
> index 01bfd6c..bae1078 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -99,6 +99,7 @@
>  #define OPT_BASE_VIRTADDR   "base-virtaddr"
>  #define OPT_XEN_DOM0    "xen-dom0"
>  #define OPT_CREATE_UIO_DEV "create-uio-dev"
> +#define OPT_VFIO_INTR    "vfio-intr"
>
>  #define RTE_EAL_BLACKLIST_SIZE 0x100
>
> @@ -361,6 +362,7 @@ eal_usage(const char *prgname)
>                "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
>                            "native RDTSC\n"
>                "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
> +              "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO (intx|msix)\n"
>                "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
>                "\nEAL options for DEBUG use only:\n"
>                "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
> @@ -579,6 +581,27 @@ eal_parse_base_virtaddr(const char *arg)
>         return 0;
>  }
>
> +static int
> +eal_parse_vfio_intr(const char *mode)
> +{
> +       unsigned i;
> +       static struct {
> +               const char *name;
> +               enum rte_intr_mode value;
> +       } map[] = {
> +               { "legacy", RTE_INTR_MODE_LEGACY },
> +               { "msix", RTE_INTR_MODE_MSIX },
> +       };
> +
> +       for (i = 0; i < RTE_DIM(map); i++) {
> +               if (!strcmp(mode, map[i].name)) {
> +                       internal_config.vfio_intr_mode = map[i].value;
> +                       return 0;
> +               }
> +       }
> +       return -1;
> +}
> +
>  static inline size_t
>  eal_get_hugepage_mem_size(void)
>  {
> @@ -633,6 +656,7 @@ eal_parse_args(int argc, char **argv)
>                 {OPT_PCI_BLACKLIST, 1, 0, 0},
>                 {OPT_VDEV, 1, 0, 0},
>                 {OPT_SYSLOG, 1, NULL, 0},
> +               {OPT_VFIO_INTR, 1, NULL, 0},
>                 {OPT_BASE_VIRTADDR, 1, 0, 0},
>                 {OPT_XEN_DOM0, 0, 0, 0},
>                 {OPT_CREATE_UIO_DEV, 1, NULL, 0},
> @@ -828,6 +852,14 @@ eal_parse_args(int argc, char **argv)
>                                         return -1;
>                                 }
>                         }
> +                       else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
> +                               if (eal_parse_vfio_intr(optarg) < 0) {
> +                                       RTE_LOG(ERR, EAL, "invalid parameters for --"
> +                                                       OPT_VFIO_INTR "\n");
> +                                       eal_usage(prgname);
> +                                       return -1;
> +                               }
> +                       }
>                         else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
>                                 internal_config.create_uio_dev = 1;
>                         }
> --
> 1.7.0.7
>
>

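The quoted patch has a mismatch. The usage text advertises `(intx|msix)`, but the parser's map accepts "legacy" and "msix", which matches the commit message. As written, `--vfio-intr intx` would therefore be rejected.

Below is a minimal sketch of one way to keep the two in sync: derive the help string from the same table the parser uses. The enum and function names are hypothetical stand-ins, not the real `rte_intr_mode` values.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for enum rte_intr_mode; values are illustrative only. */
enum intr_mode { INTR_MODE_NONE = 0, INTR_MODE_LEGACY, INTR_MODE_MSIX };

/* Single source of truth for accepted names. */
static const struct {
	const char *name;
	enum intr_mode value;
} intr_map[] = {
	{ "legacy", INTR_MODE_LEGACY },
	{ "msix",   INTR_MODE_MSIX },
};

#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Return 0 and set *mode on success, -1 for an unknown name. */
static int parse_intr_mode(const char *arg, enum intr_mode *mode)
{
	for (size_t i = 0; i < DIM(intr_map); i++) {
		if (strcmp(arg, intr_map[i].name) == 0) {
			*mode = intr_map[i].value;
			return 0;
		}
	}
	return -1;
}

/* Build "legacy|msix" from the same table, so --help cannot drift from the parser. */
static int intr_mode_choices(char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (size_t i = 0; i < DIM(intr_map); i++) {
		int n = snprintf(buf + used, len - used, "%s%s",
				 i ? "|" : "", intr_map[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1; /* truncated */
		used += (size_t)n;
	}
	return 0;
}
```

With this approach, renaming an entry (say, to "intx") changes the parser and the usage text together.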

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-20  7:40   ` Stephen Hemminger
@ 2014-05-20  8:33     ` Burakov, Anatoly
  2014-05-20 11:23       ` Stephen Hemminger
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-20  8:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

> I really wish the code did automatic fall back based on PCI config. It is possible to know the right mode, and do the right thing.
> Rather than punting the problem out to command line which is totally unusable in hot plug and generic application.

You mean we should use whatever is available rather than default to MSI-X if nothing was explicitly specified? That could be done, I guess.

Best regards,
Anatoly Burakov
DPDK SW Engineer




* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-20  8:33     ` Burakov, Anatoly
@ 2014-05-20 11:23       ` Stephen Hemminger
  2014-05-20 11:26         ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2014-05-20 11:23 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Tue, 20 May 2014 08:33:43 +0000
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> Hi Stephen,
> 
> > I really wish the code did automatic fall back based on PCI config. It is possible to know the right mode, and do the right thing.
> > Rather than punting the problem out to command line which is totally unusable in hot plug and generic application.
> 
> You mean we should use whatever is available rather than default to MSI-X if nothing was explicitly specified? That could be done, I guess.
> 
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
> 
> 

I am not sure that MSI-X has any advantage with only one IRQ, so MSI would do.
Then have the code look at the PCI capabilities of the device and fall back to INTX
if needed. It should also check whether INTX works (see the kernel for details),
since some PCI emulation, like VMware's, is broken.

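Stephen's automatic fallback can be sketched by walking the standard PCI capability list in a copy of the device's config space. The walk has three parts:

- If the Capabilities List bit (0x10) is set in the Status register at offset 0x06, follow the capability pointer at 0x34.
- Each capability starts with an ID byte and a next-pointer byte. MSI has ID 0x05 and MSI-X has ID 0x11.
- If neither MSI nor MSI-X is present, use INTx when the Interrupt Pin register at 0x3d is non-zero.

This is a sketch under stated assumptions. The config bytes are assumed to come from somewhere like sysfs or the VFIO config region. The preference order (MSI-X, then MSI, then INTx) is one reasonable choice; Stephen suggests MSI alone may be enough for a single vector. The sketch does not test whether INTx masking actually works, which is the VMware concern above. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10
#define PCI_CAPABILITY_LIST 0x34
#define PCI_INTERRUPT_PIN   0x3d
#define PCI_CAP_ID_MSI      0x05
#define PCI_CAP_ID_MSIX     0x11

enum intr_choice { CHOICE_MSIX, CHOICE_MSI, CHOICE_INTX, CHOICE_NONE };

/* Return the config-space offset of capability 'id', or 0 if absent. */
static unsigned find_cap(const uint8_t *cfg, size_t len, uint8_t id)
{
	unsigned pos;
	int ttl = 48; /* bounds the walk in case the list loops */

	if (len < 256 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc; /* low 2 bits are reserved */
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/* Pick the "best" available interrupt type from the capabilities alone. */
static enum intr_choice choose_intr(const uint8_t *cfg, size_t len)
{
	if (find_cap(cfg, len, PCI_CAP_ID_MSIX))
		return CHOICE_MSIX;
	if (find_cap(cfg, len, PCI_CAP_ID_MSI))
		return CHOICE_MSI;
	if (len > PCI_INTERRUPT_PIN && cfg[PCI_INTERRUPT_PIN] != 0)
		return CHOICE_INTX;
	return CHOICE_NONE;
}
```

With a function like this, the command-line option could become an override, and the default would come from the device itself.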

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-20 11:23       ` Stephen Hemminger
@ 2014-05-20 11:26         ` Burakov, Anatoly
  2014-05-20 21:39           ` Stephen Hemminger
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-20 11:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

> I am not sure that MSI-X has any advantage with only one IRQ, so MSI would do.

Igb_uio doesn't support MSI, so I never included MSI support. It can be added though, but I don't see much point.

> Then have the code look at the PCI capability of device and fallback to INTX if
> needed. It should also check if INTX works, see kernel for details, since some
> PCI emulation like VMware is broken.

I believe VFIO itself does that already when it sets up interrupts.

Best regards,
Anatoly Burakov
DPDK SW Engineer


* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-20 11:26         ` Burakov, Anatoly
@ 2014-05-20 21:39           ` Stephen Hemminger
  0 siblings, 0 replies; 160+ messages in thread
From: Stephen Hemminger @ 2014-05-20 21:39 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

Originally igb_uio had code for MSI, but it was broken.
See my patches which fixed that and several other bugs.


On Tue, May 20, 2014 at 8:26 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:

> Hi Stephen,
>
> > I am not sure that MSI-X has any advantage with only one IRQ, so MSI would do.
>
> Igb_uio doesn't support MSI, so I never included MSI support. It can be
> added though, but I don't see much point.
>
> > Then have the code look at the PCI capability of device and fallback to INTX if
> > needed. It should also check if INTX works, see kernel for details, since some
> > PCI emulation like VMware is broken.
>
> I believe VFIO itself does that already when it sets up interrupts.
>
> Best regards,
> Anatoly Burakov
> DPDK SW Engineer
>
>
>
>


* Re: [dpdk-dev] [PATCH v2 01/16] Separate igb_uio mapping into a separate file
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 01/16] Separate igb_uio mapping into a separate file Anatoly Burakov
@ 2014-05-21 12:42   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-21 12:42 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

Hi Anatoly,

2014-05-19 16:51, Anatoly Burakov:
> In order to make the code a bit more clean while using multiple
> drivers, IGB_UIO mapping has been separated into its own file.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

[...]
>  /* map a particular resource from a file */
> -static void *
> -pci_map_resource(void *requested_addr, const char *devname, off_t offset,
> +void *
> +pci_map_resource(void *requested_addr, int fd, off_t offset,
>  		 size_t size)
>  {
> -	int fd;
>  	void *mapaddr;
> 
> -	/*
> -	 * open devname, to mmap it
> -	 */
> -	fd = open(devname, O_RDWR);
> -	if (fd < 0) {
> -		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> -			devname, strerror(errno));
> -		goto fail;
> -	}
> -
>  	/* Map the PCI memory resource of device */
>  	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
>  			MAP_SHARED, fd, offset);
> -	close(fd);
>  	if (mapaddr == MAP_FAILED ||
>  			(requested_addr != NULL && mapaddr != requested_addr)) {
> -		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
> -			" %s (%p)\n", __func__, devname, fd, requested_addr,
> +		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx):"
> +			" %s (%p)\n", __func__, fd, requested_addr,
>  			(unsigned long)size, (unsigned long)offset,
>  			strerror(errno), mapaddr);
>  		goto fail;
[...]
> -static int
> -pci_uio_map_resource(struct rte_pci_device *dev)
[...]
> -		/* if matching map is found, then use it */
> -		if (j != nb_maps) {
> -			offset = j * pagesz;
> -			if (maps[j].addr != NULL ||
> -			    (mapaddr = pci_map_resource(NULL, devname,
> -							(off_t)offset,
> -							(size_t)maps[j].size)
> -			    ) == NULL) {
> -				rte_free(uio_res);
> -				return (-1);
> -			}
> -
> -			maps[j].addr = mapaddr;
> -			maps[j].offset = offset;
> -			dev->mem_resource[i].addr = mapaddr;
> -		}
[...]
> +		/* if matching map is found, then use it */
> +		if (j != nb_maps) {
> +			offset = j * pagesz;
> +
> +			/*
> +			 * open devname, to mmap it
> +			 */
> +			fd = open(uio_res->path, O_RDWR);
> +			if (fd < 0) {
> +				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +					uio_res->path, strerror(errno));
> +				rte_free(uio_res);
> +				return -1;
> +			}
> +
> +			if (maps[j].addr != NULL
> +					|| (mapaddr = pci_map_resource(NULL, fd,
> +							(off_t) offset, (size_t) maps[j].size)) == NULL) {
> +				rte_free(uio_res);
> +				close(fd);
> +				return (-1);
> +			}
> +			close(fd);
> +
> +			maps[j].addr = mapaddr;
> +			maps[j].offset = offset;
> +			dev->mem_resource[i].addr = mapaddr;
> +		}

Looking at pci_map_resource(), it seems you are not only moving functions into
another file.

Please split this patch so that we can clearly see what is changing.

Thanks
-- 
Thomas

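The behavioural change Thomas spotted is this: pci_map_resource() used to take a device path and open and close the file itself. It now takes an already-open fd, and the caller handles open() and close().

Here is a minimal sketch of that shape, with hypothetical names. Closing the fd right after mmap() is safe, because POSIX specifies that the mapping holds its own reference to the file. Unlike the quoted hunk, where the `fail:` path is elided, this sketch also unmaps a mapping that landed at the wrong address.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map 'size' bytes at 'offset' from an already open fd (the new helper's shape). */
static void *map_resource(void *requested_addr, int fd, off_t offset, size_t size)
{
	void *addr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, offset);

	if (addr == MAP_FAILED) {
		fprintf(stderr, "cannot mmap(%d, %p, 0x%zx, 0x%lx): %s\n", fd,
			requested_addr, size, (unsigned long)offset, strerror(errno));
		return NULL;
	}
	if (requested_addr != NULL && addr != requested_addr) {
		munmap(addr, size); /* mapped, but not where the caller needs it */
		return NULL;
	}
	return addr;
}

/* Caller-side pattern: open, map, and close the fd on every path. */
static void *map_from_path(const char *path, off_t offset, size_t size)
{
	int fd = open(path, O_RDWR);
	void *addr;

	if (fd < 0) {
		fprintf(stderr, "Cannot open %s: %s\n", path, strerror(errno));
		return NULL;
	}
	addr = map_resource(NULL, fd, offset, size);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	return addr;
}
```

Passing an fd rather than a path is what lets a VFIO backend reuse the helper. In that case the fd comes from an ioctl() on the device rather than from open().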

* Re: [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-05-21 12:55   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-21 12:55 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic,
> retain old macro for backwards compatibility. Probably should
> be removed in one of the next releases.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

[...]
> +/** Retain the old name for backwards-compatibility */
> +#define RTE_PCI_DRV_NEED_IGB_UIO RTE_PCI_DRV_NEED_MAPPING

As we are breaking PMD API for other features, I think we don't need to keep 
the old name here.

-- 
Thomas


* Re: [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-05-21 13:38   ` Thomas Monjalon
  2014-05-21 13:44     ` Burakov, Anatoly
  2014-05-21 13:46   ` Thomas Monjalon
  1 sibling, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-21 13:38 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Moving interrupt type enum out of igb_uio and renaming it to be more
> generic. Such a strange header naming and separation is done mostly to
> make coming virtio patches easier to port to dpdk.org tree.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>


> +++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
[...]
> +++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
[...]
> +#include <rte_pci_dev_feature_defs.h>

Why are you splitting things in 2 files?

+#define RTE_INTR_MODE_MAX_MAX "max"

What is this constant for?

Thanks
-- 
Thomas


* Re: [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio
  2014-05-21 13:38   ` Thomas Monjalon
@ 2014-05-21 13:44     ` Burakov, Anatoly
  0 siblings, 0 replies; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-21 13:44 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> Why are you splitting things in 2 files?
> 

The commit message explains it. Initially this work was based off our internal tree, which had a few virtio-related changes, including these two files. I stripped out most of the virtio stuff and left only things relevant to VFIO, but kept the file structure to make things easier for porting virtio changes.

> +#define RTE_INTR_MODE_MAX_MAX "max"
> 
> What is this constant for?

That's a mistake, I'll remove it.

Best regards,
Anatoly Burakov
DPDK SW Engineer


* Re: [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio Anatoly Burakov
  2014-05-21 13:38   ` Thomas Monjalon
@ 2014-05-21 13:46   ` Thomas Monjalon
  1 sibling, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-21 13:46 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

A few more comments from checkpatch.pl on this patch:

ERROR: else should follow close brace '}'
#252: FILE: lib/librte_eal/linuxapp/igb_uio/igb_uio.c:225:
 	}
+	else if (udev->mode == RTE_INTR_MODE_LEGACY) {

WARNING: suspect code indent for conditional statements (8, 8)
#316: FILE: lib/librte_eal/linuxapp/igb_uio/igb_uio.c:590:
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
[...]
+	pci_disable_msix(dev);

-- 
Thomas


* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header Anatoly Burakov
@ 2014-05-21 16:07   ` Thomas Monjalon
  2014-05-22 12:45     ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-21 16:07 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Creating code to handle VFIO interrupts in EAL interrupts, and also
> adding a header eal_vfio.h.

Maybe it's better to have 2 patches here.

> This header checks two things:
> * checks if CONFIG_RTE_EAL_VFIO was enabled during build time
> * checks that kernel version is 3.6+ so that DPDK would still compile
>   on older kernels despite VFIO compilation being enabled by default.

In case VFIO is backported to an older kernel, it would be better to check a
VFIO-related macro instead of the Linux version.
 
> This header also defines a VFIO_PRESENT macro, which should be used to
> conditionally compile all the VFIO code. This is because having
> CONFIG_RTE_EAL_VFIO enabled doesn't guarantee that the VFIO support is
> compiled in, because we're still dependent on kernel version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

> +	struct vfio_irq_set * irq_set;

There are a few lines like this one where checkpatch.pl reports an error:
	ERROR: "foo * bar" should be "foo *bar"

Thanks
-- 
Thomas

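Thomas suggests checking for VFIO itself rather than a kernel version. Here is a minimal sketch of that idea:

- Include `<linux/vfio.h>` only if the compiler can find it, using `__has_include`, which GCC 5+ and Clang provide.
- Key VFIO_PRESENT on `VFIO_API_VERSION`, a macro that header defines.

`RTE_EAL_VFIO` stands in for the build-config switch. That name is an assumption, since the exact macro generated from CONFIG_RTE_EAL_VFIO is not shown in this thread. A kernel with VFIO backported then works without any version check.

```c
/* Include the VFIO UAPI header only if the toolchain actually has it. */
#if defined(__has_include)
#  if __has_include(<linux/vfio.h>)
#    include <linux/vfio.h>
#  endif
#endif

/*
 * RTE_EAL_VFIO: assumed name of the build-time switch.
 * VFIO_API_VERSION: defined by <linux/vfio.h>, so this is a feature test,
 * not a kernel-version test.
 */
#if defined(RTE_EAL_VFIO) && defined(VFIO_API_VERSION)
#  define VFIO_PRESENT
#endif

/* Report at runtime what was decided at compile time. */
static int vfio_compiled_in(void)
{
#ifdef VFIO_PRESENT
	return 1;
#else
	return 0;
#endif
}
```

The nested `#if` is deliberate. Compilers that lack `__has_include` cannot parse it in the same expression as the `defined()` test.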

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO Anatoly Burakov
@ 2014-05-22 11:53   ` Thomas Monjalon
  2014-05-22 12:06     ` Burakov, Anatoly
  2014-05-27 16:21     ` Burakov, Anatoly
  0 siblings, 2 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 11:53 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

Hi Anatoly,

It seems to be the main patch, so I have many comments.

2014-05-19 16:51, Anatoly Burakov:
> VFIO is kernel 3.6+ only, and so is only compiled when DPDK config
> option CONFIG_RTE_EAL_VFIO is enabled, and kernel 3.6 or higher is
> detected, thus preventing compile failures on older kernels if VFIO is
> enabled in config (and it is, by default).
> 
> Since VFIO cannot be used to map the same device twice, secondary
> processes receive the device/group fd's by means of communicating over a
> local socket. Only group and container fd's should be sent, as device
> fd's can be obtained via ioctl() calls on the group fd.
> 
> For multiprocess, VFIO distinguishes between existing but unused groups
> (e.g. groups that aren't bound to VFIO driver) and non-existing groups in
> order to know if the secondary process requests a valid group, or if
> secondary process requests something that doesn't exist.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

How did you test this feature?
Did you see some performance differences with igb_uio?

>  # workaround for a gcc bug with noreturn attribute
>  # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
>  ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>  CFLAGS_eal_thread.o += -Wno-return-type
> -CFLAGS_eal_hpet.o += -Wno-return-type

For history's sake, it's better to explain in a separate patch that eal_hpet has
been renamed to eal_timer and that the -Wno-return-type workaround is no longer needed for it.

> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
[...]
> + * This code tries to determine if the PCI device is bound to VFIO driver,

We should discuss a way to request igb_uio or VFIO binding of a device.

> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c

This whole socket communication deserves a separate patch with a protocol
description.
By the way, I'm not a big fan of the suffix "_socket", which can be misleading,
but I have no better naming idea.

> +/*
> + * socket listening thread for primary process
> + */
> +__attribute__((noreturn)) void *
> +pci_vfio_socket_thread(void *arg)

So we have another thread to manage.
I don't see where it is spawned?

> --- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
> +++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
[...]
> +struct vfio_config vfio_cfg;
> +
> +pthread_t socket_thread;

You are defining some variables in a .h file. I think it is a problem.


Here are some other relevant errors from checkpatch.pl:

ERROR: "foo * bar" should be "foo *bar"
#197: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:64:
+pci_vfio_get_msix_bar(int fd, int * msix_bar)

ERROR: space required before the open brace '{'
#216: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:83:
+	while (cap_offset){

ERROR: "foo * bar" should be "foo *bar"
#301: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:168:
+	const struct rte_memseg * ms = rte_eal_get_physmem_layout();

ERROR: space required before the open parenthesis '('
#517: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:384:
+		switch(ret) {

ERROR: "foo * bar" should be "foo *bar"
#541: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:408:
+pci_vfio_get_group_no(const char * pci_addr)

ERROR: "foo * bar" should be "foo *bar"
#545: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:412:
+	char * tok[16], *group_tok, *end;

ERROR: else should follow close brace '}'
#673: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:540:
+	}
+	else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {

WARNING: space prohibited between function name and open parenthesis '('
#751: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:618:
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof (*vfio_res), 0)) == NULL) {

ERROR: "foo * bar" should be "foo *bar"
#784: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:651:
+		void * bar_addr;

ERROR: return is not a function, parentheses are not required
#850: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio.c:717:
+	return (0);

ERROR: space required before the open parenthesis '('
#933: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c:75:
+		} while(0)

WARNING: Single statement macros should not use a do {} while (0) loop
#934: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c:76:
+#define CMSGHDR_TO_FD(chdr,fd) \
+		do {\
+			memcpy(&(fd), (chdr).__cmsg_data, sizeof(fd));\
+		} while (0)
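One way to address this warning (a sketch, not the patch's actual fix) is to drop the do/while wrapper and use the standard CMSG_DATA() accessor instead of glibc's internal __cmsg_data member. The calling convention is kept: chdr is still a struct cmsghdr lvalue, not a pointer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Single-statement macro: no do {} while (0) wrapper is needed.
 * CMSG_DATA() is the portable way to reach a control message payload. */
#define CMSGHDR_TO_FD(chdr, fd) \
	memcpy(&(fd), CMSG_DATA(&(chdr)), sizeof(fd))
```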

ERROR: "foo * bar" should be "foo *bar"
#942: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c:84:
+get_socket_path(char * buffer, int bufsz)

ERROR: "foo * bar" should be "foo *bar"
#1026: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c:168:
+	struct cmsghdr * chdr;

ERROR: "foo * bar" should be "foo *bar"
#1057: FILE: lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c:199:
+	struct cmsghdr * chdr;

ERROR: "foo * bar" should be "foo *bar"
#1284: FILE: lib/librte_eal/linuxapp/eal/include/eal_pci_init.h:87:
+void * pci_vfio_socket_thread(void *arg);
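For reference, this is a small function written the way checkpatch.pl expects: `type *name` rather than `type * name`, and `return x;` rather than `return (x);`. It is a hypothetical sketch that extracts an IOMMU group number from the target of a device's `iommu_group` sysfs link (for example "../../../../kernel/iommu_groups/15"). It is not the patch's pci_vfio_get_group_no(), whose body is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the group number from an iommu_group
 * link target. Returns the group number, or -1 if it cannot be parsed.
 */
static int
parse_iommu_group_no(const char *link_target)
{
	const char *group_tok;
	char *end;
	long group_no;

	if (link_target == NULL)
		return -1;

	/* the group number is the last path component */
	group_tok = strrchr(link_target, '/');
	if (group_tok == NULL)
		group_tok = link_target;
	else
		group_tok++;

	errno = 0;
	group_no = strtol(group_tok, &end, 10);
	if (errno != 0 || end == group_tok || *end != '\0' ||
			group_no < 0 || group_no > INT_MAX)
		return -1;

	return (int)group_no;
}
```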


Thanks
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding Anatoly Burakov
@ 2014-05-22 12:03   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 12:03 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Add support for binding VFIO devices if RTE_PCI_DRV_NEED_IGB_UIO
> is set for this driver. Try VFIO first, if not mapped then try
> IGB_UIO too.

You have renamed RTE_PCI_DRV_NEED_IGB_UIO. Please update this log :)

> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
> @@ -401,6 +401,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
> struct rte_pci_device *d {
>  	struct rte_pci_id *id_table;
>  	int ret = 0;
> +	int mapped = 0;
> 
>  	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
> 
> @@ -435,8 +436,17 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr,
> struct rte_pci_device *d }
> 
>  		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
> +			/* try mapping the NIC resources using VFIO if it exists */
> +#ifdef VFIO_PRESENT
> +			if (vfio_cfg.vfio_enabled) {
> +				if ((ret = pci_vfio_map_resource(dev)) == 0)
> +					mapped = 1;
> +				else if (ret < 0)
> +					return ret;
> +			}
> +#endif
>  			/* map resources for devices that use igb_uio */
> -			if ((ret = pci_uio_map_resource(dev)) != 0)
> +			if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
>  				return ret;

I think creating a function pci_map_resource() could be cleaner (you won't 
need variable mapped).
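A minimal sketch of the suggested refactoring. It assumes the convention visible in the quoted diff: pci_vfio_map_resource() returns 0 when the device was mapped, a positive value when it is not a VFIO device (so igb_uio should be tried), and a negative value on a fatal error. The struct and both map functions below are stand-ins so the example compiles on its own. Only the control flow of the hypothetical pci_map_resource() is the point.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for struct rte_pci_device: only what the sketch needs. */
struct rte_pci_device {
	int bound_to_vfio; /* 1 if bound to vfio-pci */
	int mapped_by;     /* 0, 'v' (VFIO) or 'u' (igb_uio) */
};

/* Stand-in for vfio_cfg.vfio_enabled. */
static bool vfio_enabled = true;

/* Stub: 0 = mapped, >0 = not a VFIO device, <0 = fatal error. */
static int
pci_vfio_map_resource(struct rte_pci_device *dev)
{
	if (!dev->bound_to_vfio)
		return 1;
	dev->mapped_by = 'v';
	return 0;
}

/* Stub: 0 = mapped, nonzero = failure. */
static int
pci_uio_map_resource(struct rte_pci_device *dev)
{
	dev->mapped_by = 'u';
	return 0;
}

/* Try VFIO first, then igb_uio; early returns replace the "mapped" flag. */
static int
pci_map_resource(struct rte_pci_device *dev)
{
	int ret;

	if (vfio_enabled) {
		ret = pci_vfio_map_resource(dev);
		if (ret <= 0)
			return ret; /* mapped (0) or fatal error (<0) */
	}
	/* not handled by VFIO: map resources for devices that use igb_uio */
	return pci_uio_map_resource(dev);
}
```

In the real code the VFIO branch would stay under `#ifdef VFIO_PRESENT`, and the probe function would reduce to `if ((ret = pci_map_resource(dev)) != 0) return ret;`.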

> +#ifdef VFIO_PRESENT
> +	memset(&vfio_cfg, 0, sizeof(vfio_cfg));
> +
> +	/* initialize group list */
> +	int i, ret;
> +
> +	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> +		vfio_cfg.vfio_groups[i].fd = -1;
> +		vfio_cfg.vfio_groups[i].group_no = -1;
> +	}
> +	vfio_cfg.vfio_container_fd = -1;
> +
> +	/* check if we have VFIO driver enabled */
> +	if (access(VFIO_DIR, F_OK) == 0) {
> +		static int socket_fd;
> +
> +		vfio_cfg.vfio_enabled = 1;
> +
> +		/* if we are primary process, create a thread to communicate with
> +		 * secondary processes. the thread will use a socket to wait for
> +		 * requests from secondary process to send open file descriptors,
> +		 * because VFIO does not allow multiple open descriptors on a group or
> +		 * VFIO container.
> +		 */
> +		if (internal_config.process_type == RTE_PROC_PRIMARY) {
> +			/* set up local socket */
> +			if ((socket_fd = pci_vfio_socket_setup()) < 0) {
> +				RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
> +				return -1;
> +			}
> +			ret = pthread_create(&socket_thread, NULL,
> +					pci_vfio_socket_thread, (void*) &socket_fd);
> +			if (ret) {
> +				RTE_LOG(ERR, EAL,
> +					"Failed to create thread for communication with secondary "
> +					"processes!\n");
> +				return -1;
> +			}
> +		}

Also here, it could help to have a dedicated function for vfio init.

> +	}
> +	else

checkpatch.pl reports an error: "else should follow close brace '}'"

> +		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
> +#endif
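A possible shape for the dedicated VFIO init function, sketched under assumptions. `struct vfio_config`, `VFIO_MAX_GROUPS` and the field names mirror the quoted diff, but the definitions here (including the value of VFIO_MAX_GROUPS and the name pci_vfio_enable()) are placeholders, and the primary-process socket thread setup is left out. Passing the directory to check as a parameter keeps the function testable, and the `} else {` placement is the form checkpatch.pl asks for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define VFIO_MAX_GROUPS 64 /* placeholder value */

struct vfio_group {
	int group_no;
	int fd;
};

struct vfio_config {
	int vfio_enabled;
	int vfio_container_fd;
	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
};

/*
 * Hypothetical helper: reset the VFIO configuration and enable VFIO if
 * vfio_dir exists. Always returns 0 here; a real version would also
 * spawn the primary-process socket thread and return -1 on failure.
 */
static int
pci_vfio_enable(struct vfio_config *cfg, const char *vfio_dir)
{
	int i;

	memset(cfg, 0, sizeof(*cfg));
	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
		cfg->vfio_groups[i].fd = -1;
		cfg->vfio_groups[i].group_no = -1;
	}
	cfg->vfio_container_fd = -1;

	if (access(vfio_dir, F_OK) == 0) {
		cfg->vfio_enabled = 1;
		/* the primary process would set up its socket thread here */
	} else {
		/* RTE_LOG(INFO, EAL, ...) in the real code */
		fprintf(stderr, "VFIO driver not loaded or wrong permissions\n");
	}
	return 0;
}
```

rte_eal_pci_init() would then only need `if (pci_vfio_enable(&vfio_cfg, VFIO_DIR) < 0) return -1;` inside the `#ifdef VFIO_PRESENT` block.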

Thanks
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 11:53   ` Thomas Monjalon
@ 2014-05-22 12:06     ` Burakov, Anatoly
  2014-05-22 12:28       ` Thomas Monjalon
  2014-05-27 16:21     ` Burakov, Anatoly
  1 sibling, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 12:06 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> How did you test this feature?
> Did you see some performance differences with igb_uio?

The same way everything else is tested - bind a NIC to the driver and see if it works :-)

As for performance differences, potentially it can be degraded a bit because of mandatory IOMMU involvement, but I did not see any performance impact during my tests.

> For historical reasons, it's better to explain in another patch that eal_hpet has
> been renamed to eal_timer and that this flag is no longer needed for that file.

Agreed.

> 
> We should discuss a way to request igb_uio or VFIO binding of a device.

Why? The device can be bound either to VFIO or to igb_uio. So unless you want binding code in the DPDK EAL (which is exactly what the pci_unbind/igb_uio_bind/dpdk_bind script was created to avoid in the first place), I see no point in that. The dpdk_bind script already does that (you bind either to igb_uio or to vfio-pci).

> This whole socket communication deserves a separate patch with a protocol
> description.

Agreed, I'll break it up and provide a more detailed explanation.

> By the way, I'm not a big fan of the suffix "_socket" which can be misleading.
> But I have no other good naming idea.

Would _mp_socket do?
 
> So we have another thread to manage.
> I don't see where it is spawned?

In rte_eal_pci_init().

> You are defining some variables in a .h file. I think it is a problem.

Well, they need to be shared between several .c files.
 
> Here are some other relevant errors from checkpatch.pl:

Thanks, I'll fix those.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 12:06     ` Burakov, Anatoly
@ 2014-05-22 12:28       ` Thomas Monjalon
  2014-05-22 12:37         ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 12:28 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-22 12:06, Burakov, Anatoly:
> > We should discuss a way to request igb_uio or VFIO binding of a device.
> 
> Why? The device can be bound either to VFIO or to igb_uio. So unless you want
> binding code in the DPDK EAL (which is exactly what the
> pci_unbind/igb_uio_bind/dpdk_bind script was created to avoid in the first
> place), I see no point in that. The dpdk_bind script already does that (you
> bind either to igb_uio or to vfio-pci).

Yes, in some environments, it could be easier to be able to configure devices 
directly on application command line instead of having to call a python 
script.
I think having a clear and extendable syntax to configure devices in command 
line could greatly improve usability. But it can be another step.

> > This whole socket communication deserves a separate patch with a protocol
> > description.
> 
> Agreed, I'll break it up and provide a more detailed explanation.

Thanks.

> > By the way, I'm not a big fan of the suffix "_socket" which can be
> > misleading. But I have no other good naming idea.
> 
> Would _mp_socket do?

What do you think of _mp_sync or _mp_conf?
Usage of the socket is to synchronize VFIO config between processes, right?

> > So we have another thread to manage.
> > I don't see where it is spawned?
> 
> In rte_eal_pci_init().

Oh yes. Do you think you could merge the thread spawning in the patch adding 
it?

> > You are defining some variables in a .h file. I think it is a problem.
> 
> Well, they need to be shared between several .c files.

So you should use an "extern" trick in order to have only one instance of the 
variables. But I think it's not a good practice.
You probably need to group functions using these variables in one .c file.
Or do I miss something?
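For reference, the usual form of the "extern" pattern: the header only declares the variable, and exactly one .c file defines it. It is shown as one file here so it compiles on its own; the intended file split is marked in comments, and `struct vfio_config` is a placeholder.

```c
#include <assert.h>

/* ---- eal_pci_init.h: declaration only, no storage allocated ---- */
struct vfio_config {
	int vfio_enabled;
	int vfio_container_fd;
};
extern struct vfio_config vfio_cfg;

/* ---- eal_pci_vfio.c: the one and only definition ---- */
struct vfio_config vfio_cfg = { .vfio_enabled = 0, .vfio_container_fd = -1 };

/* ---- any other .c file including the header sees the same object ---- */
static int
vfio_is_enabled(void)
{
	return vfio_cfg.vfio_enabled;
}
```

Without `extern`, every .c file that includes the header gets its own tentative definition. That only links because of "common" symbols, and it fails with multiple-definition errors under -fno-common (the GCC default since GCC 10).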

> > Here are some other relevant errors from checkpatch.pl:
> Thanks, I'll fix those.

Thank you
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
  2014-05-20  7:40   ` Stephen Hemminger
@ 2014-05-22 12:34   ` Thomas Monjalon
  2014-05-28 10:35     ` Burakov, Anatoly
  1 sibling, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 12:34 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Unlike igb_uio, VFIO interrupt type is not set by kernel module
> parameters but is set up via ioctl() calls at runtime. This warrants
> a new EAL command-line parameter. It will have no effect if VFIO is
> not compiled, but will set VFIO interrupt type to either "legacy" or
> "msix" if VFIO support is compiled. Note that VFIO initialization will
> fail if the interrupt type selected is not supported by the system.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

> +			}
>  			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) 

another code style issue reported by checkpatch.pl ;)

But it should be fixed by removing this code as Stephen suggests.

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 12:28       ` Thomas Monjalon
@ 2014-05-22 12:37         ` Burakov, Anatoly
  2014-05-22 12:46           ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 12:37 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> Yes, in some environments, it could be easier to be able to configure devices
> directly on application command line instead of having to call a python script.
> I think having a clear and extendable syntax to configure devices in command
> line could greatly improve usability. But it can be another step.

That's probably out of scope for this patch. We can discuss this later without stalling VFIO :)

> What do you think of _mp_sync or _mp_conf?
> Usage of the socket is to synchronize VFIO config between processes, right?

More or less, yes. However, the code inside that file is the communication mechanism. I.e. it's not actually synchronizing or configuring anything, it's simply providing means to do so for primary and secondary processes, so I don't think _mp_sync or _mp_conf is a good name for that. IMO something like _mp_socket or similar (_mp_comm?) would be more appropriate. 

> Oh yes. Do you think you could merge the thread spawning in the patch
> adding it?

Good point, I'll do that.

> So you should use an "extern" trick in order to have only one instance of the
> variables. But I think it's not a good practice.
> You probably need to group functions using these variables in one .c file.
> Or do I miss something?

I'll look into this.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-21 16:07   ` Thomas Monjalon
@ 2014-05-22 12:45     ` Burakov, Anatoly
  2014-05-22 12:49       ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 12:45 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> In case VFIO is backported on older kernel, it would be better to check a
> related macro instead of Linux version.

Not sure I follow. What is the "related macro" you're referring to?

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 12:37         ` Burakov, Anatoly
@ 2014-05-22 12:46           ` Thomas Monjalon
  2014-05-22 12:54             ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 12:46 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-22 12:37, Burakov, Anatoly:
> > Yes, in some environments, it could be easier to be able to configure
> > devices directly on application command line instead of having to call a
> > python script. I think having a clear and extendable syntax to configure
> > devices in command line could greatly improve usability. But it can be
> > another step.
> 
> That's probably out of scope for this patch. We can discuss this later
> without stalling VFIO :)

Yes, I agree to discuss it later.

> > What do you think of _mp_sync or _mp_conf?
> > Usage of the socket is to synchronize VFIO config between processes,
> > right?
> 
> More or less, yes. However, the code inside that file is the communication
> mechanism. I.e. it's not actually synchronizing or configuring anything,
> it's simply providing means to do so for primary and secondary processes,
> so I don't think _mp_sync or _mp_conf is a good name for that. IMO
> something like _mp_socket or similar (_mp_comm?) would be more appropriate.

Yes, I agree. But I paused on the name for another reason: the mechanism is not
really specific to VFIO. VFIO happens to use it for synchronization, but
couldn't it be made more generic?

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-22 12:45     ` Burakov, Anatoly
@ 2014-05-22 12:49       ` Thomas Monjalon
  2014-05-22 12:54         ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 12:49 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-22 12:45, Burakov, Anatoly:
> > In case VFIO is backported on older kernel, it would be better to check a
> > related macro instead of Linux version.
> 
> Not sure I follow. What is the "related macro" you're referring to?

I don't know if there is something defined in a Linux header which could help 
to check if VFIO is supported. But in general, it's better to check for a 
macro belonging to the feature instead of checking kernel version.

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-22 12:49       ` Thomas Monjalon
@ 2014-05-22 12:54         ` Burakov, Anatoly
  2014-05-27 14:29           ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 12:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> I don't know if there is something defined in a Linux header which could help
> to check if VFIO is supported. But in general, it's better to check for a macro
> belonging to the feature instead of checking kernel version.

Not sure VFIO defines any macros anywhere, but I'll look into it, thanks.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 12:46           ` Thomas Monjalon
@ 2014-05-22 12:54             ` Burakov, Anatoly
  0 siblings, 0 replies; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 12:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> Yes, I agree. But I paused on the name for another reason: the mechanism is
> not really specific to VFIO. VFIO happens to use it for synchronization, but
> couldn't it be made more generic?

OK, _mp_sync it is then.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-05-22 13:04   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 13:04 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> This makes it possible to run DPDK without hugepage memory when VFIO
> is used, as VFIO uses virtual addresses to set up DMA mappings.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

> -		addr = malloc(internal_config.memory);
> +		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> +				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

Please, could you add a comment to explain why using mmap helps?
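One possible comment, with the allocation written as a small helper (the name alloc_no_huge_memory() is hypothetical). The rationale in the comment is an inference from the commit message: VFIO maps DMA by virtual address at page granularity, and anonymous mmap() returns page-aligned memory while malloc() gives no such guarantee. The sketch also passes -1 as the descriptor, which portable code should do with MAP_ANONYMOUS (Linux ignores it), and checks for MAP_FAILED, because mmap() does not return NULL on failure.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate memory for --no-huge mode.
 *
 * mmap() is used instead of malloc() because VFIO sets up IOMMU
 * mappings by virtual address at page granularity: the region must be
 * page aligned, which anonymous mmap() guarantees and malloc() does
 * not. The pages are also zero-filled.
 */
static void *
alloc_no_huge_memory(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}
```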

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-05-22 13:13   ` Thomas Monjalon
  2014-05-22 13:24     ` Burakov, Anatoly
                       ` (2 more replies)
  0 siblings, 3 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 13:13 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Note that since igb_uio no longer has a PCI ID list, it can now be
> bound to any device, not just those explicitly supported by DPDK. In
> other words, it now behaves similar to PCI stub, VFIO and other generic
> PCI drivers.

I wonder if we could replace igb_uio by uio_pci_generic?

> Therefore to bind a new device to igb_uio, the user will now have to
> first write its PCI ID to "new_id" file inside the igb_uio driver
> directory, and only then write the PCI ID to "bind". This will be
> reflected in later changes to PCI binding script as well.
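The two-step binding described above can be sketched as follows. The function names are hypothetical, and the driver directory is a parameter (normally something like /sys/bus/pci/drivers/igb_uio) so the sketch can be run against an ordinary directory. "new_id" takes the vendor and device IDs in hex ("8086 10fb" is only an illustrative value), while "bind" takes the device's PCI address. The device must not already be bound to another driver. Writing new_id can itself trigger a probe of matching unbound devices, a case this sketch does not handle.

```c
#define _DEFAULT_SOURCE /* for mkdtemp() in the usage example */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write a string to an attribute file inside dir. */
static int
write_attr(const char *dir, const char *attr, const char *value)
{
	char path[512];
	FILE *f;
	int n, ret;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ret = fputs(value, f) < 0 ? -1 : 0;
	/* with stdio buffering, the write() and any sysfs error happen here */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Hypothetical helper: let the driver in drv_dir claim the given
 * "vendor device" ID pair, then bind the device at pci_addr to it. */
static int
bind_to_generic_driver(const char *drv_dir, const char *vendor_device,
		const char *pci_addr)
{
	if (write_attr(drv_dir, "new_id", vendor_device) < 0)
		return -1;
	return write_attr(drv_dir, "bind", pci_addr);
}
```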

Please explain in the commit log why you are removing PCI ids from igb_uio.

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py Anatoly Burakov
@ 2014-05-22 13:23   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 13:23 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

Please, could you remove trailing whitespaces from your patch?

Thanks
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-22 13:13   ` Thomas Monjalon
@ 2014-05-22 13:24     ` Burakov, Anatoly
  2014-05-22 13:28       ` Thomas Monjalon
  2014-05-22 23:11     ` Stephen Hemminger
  2014-05-23  0:10     ` Antti Kantee
  2 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-22 13:24 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> I wonder if we could replace igb_uio by uio_pci_generic?

Can it do DMA or IOMMU support? Not even VFIO does everything we need, you may have noticed that I have to go to PCI config space to enable bus mastering. I don't think that driver can do either of those things. Unless, of course, you meant simply not tying the binding script to specific driver, which in effect is already in place (although it does check whether vfio-pci or igb_uio are loaded).
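For context on the bus-mastering remark: enabling it means setting bit 2 (Bus Master Enable, 0x4) of the 16-bit PCI Command register at offset 0x04 of config space. Below is a minimal read-modify-write sketch. It assumes `fd` gives access to the device's config space and `cfg_offset` is where config space starts within that file: 0 for the sysfs `config` file, or with VFIO the config region offset reported by VFIO_DEVICE_GET_REGION_INFO. The function and macro names are hypothetical (the kernel's own names are PCI_COMMAND and PCI_COMMAND_MASTER), and this is not the patch's code.

```c
#define _DEFAULT_SOURCE /* for pread()/pwrite() and fileno() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CFG_COMMAND_OFFSET 0x04   /* PCI Command register */
#define CFG_COMMAND_MASTER 0x0004 /* Bus Master Enable */

static int
pci_enable_bus_master(int fd, off_t cfg_offset)
{
	uint8_t buf[2];
	uint16_t cmd;

	if (pread(fd, buf, sizeof(buf), cfg_offset + CFG_COMMAND_OFFSET) !=
			(ssize_t)sizeof(buf))
		return -1;
	/* config space is little-endian regardless of host byte order */
	cmd = (uint16_t)(buf[0] | (buf[1] << 8));
	if (cmd & CFG_COMMAND_MASTER)
		return 0; /* already enabled */

	cmd |= CFG_COMMAND_MASTER;
	buf[0] = cmd & 0xff;
	buf[1] = cmd >> 8;
	if (pwrite(fd, buf, sizeof(buf), cfg_offset + CFG_COMMAND_OFFSET) !=
			(ssize_t)sizeof(buf))
		return -1;
	return 0;
}
```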

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh Anatoly Burakov
@ 2014-05-22 13:25   ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 13:25 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

2014-05-19 16:51, Anatoly Burakov:
> Support for loading/unloading VFIO drivers, binding/unbinding
> devices to/from VFIO, also setting up correct userspace permissions.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

> -		${RTE_SDK}/tools/igb_uio_bind.py --status
> +		${RTE_SDK}/tools/dpdk_nic_bind.py --status

Please merge this kind of change in the patch renaming the script, in order to 
make it atomic.

Last comment: there are some trailing whitespaces to remove.

Thanks for this important patch series.
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-22 13:24     ` Burakov, Anatoly
@ 2014-05-22 13:28       ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-22 13:28 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-22 13:24, Burakov, Anatoly:
> Hi Thomas,
> 
> > I wonder if we could replace igb_uio by uio_pci_generic?
> 
> Can it do DMA or IOMMU support? Not even VFIO does everything we need, you
> may have noticed that I have to go to PCI config space to enable bus
> mastering. I don't think that driver can do either of those things. Unless,
> of course, you meant simply not tying the binding script to specific
> driver, which in effect is already in place (although it does check whether
> vfio-pci or igb_uio are loaded).

Actually my question was: is there something in igb_uio which is not already 
handled by uio_pci_generic?

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-22 13:13   ` Thomas Monjalon
  2014-05-22 13:24     ` Burakov, Anatoly
@ 2014-05-22 23:11     ` Stephen Hemminger
  2014-05-23  7:48       ` Thomas Monjalon
  2014-05-23  0:10     ` Antti Kantee
  2 siblings, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2014-05-22 23:11 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, 22 May 2014 15:13:49 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2014-05-19 16:51, Anatoly Burakov:
> > Note that since igb_uio no longer has a PCI ID list, it can now be
> > bound to any device, not just those explicitly supported by DPDK. In
> > other words, it now behaves similar to PCI stub, VFIO and other generic
> > PCI drivers.  
> 
> I wonder if we could replace igb_uio by uio_pci_generic?

Not as is. I am starting a new driver for the upstream kernel, based on
pci_generic plus igb_uio. After discussion with Greg KH, doing a new
driver seems like the best idea. 

PCI generic driver as is does not do interrupts, and does not
claim PCI resources from kernel.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-22 13:13   ` Thomas Monjalon
  2014-05-22 13:24     ` Burakov, Anatoly
  2014-05-22 23:11     ` Stephen Hemminger
@ 2014-05-23  0:10     ` Antti Kantee
  2014-05-28 13:45       ` Thomas Monjalon
  2 siblings, 1 reply; 160+ messages in thread
From: Antti Kantee @ 2014-05-23  0:10 UTC (permalink / raw)
  To: dev

On 22/05/14 13:13, Thomas Monjalon wrote:
> 2014-05-19 16:51, Anatoly Burakov:
>> Note that since igb_uio no longer has a PCI ID list, it can now be
>> bound to any device, not just those explicitly supported by DPDK. In
>> other words, it now behaves similar to PCI stub, VFIO and other generic
>> PCI drivers.
>
> I wonder if we could replace igb_uio by uio_pci_generic?

I've been running plenty of the NetBSD kernel PCI drivers in Linux 
userspace on top of uio_pci_generic, including NICs supported by DPDK. 
The only real annoyance is that mainline uio_pci_generic doesn't support 
MSI.  A pseudo-annoyance is that uio_pci_generic turns interrupts off 
from the PCI config space each time after you read an interrupt, so they 
have to be reenabled after each one (and NetBSD kernel drivers tend to 
like using interrupts for everything).
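The re-enable step mentioned above follows the pattern in the kernel's UIO documentation: block on read() from /dev/uioX until an interrupt arrives, then clear the Interrupt Disable bit through the device's sysfs `config` file. That bit is bit 10 of the Command register, i.e. 0x04 in the byte at config offset 5. Below is a sketch of just the clearing step. The function and macro names are hypothetical.

```c
#define _DEFAULT_SOURCE /* for pread()/pwrite() and fileno() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* High byte of the Command register; its bit 2 is Command bit 10,
 * Interrupt Disable. */
#define CMD_HIGH_BYTE_OFFSET     5
#define CMD_HIGH_INTX_DISABLE    0x04

/*
 * Hypothetical helper: clear Interrupt Disable so the next INTx can
 * fire. config_fd is an open, writable descriptor for
 * /sys/bus/pci/devices/<addr>/config.
 */
static int
uio_reenable_intx(int config_fd)
{
	uint8_t command_high;

	if (pread(config_fd, &command_high, 1, CMD_HIGH_BYTE_OFFSET) != 1)
		return -1;
	command_high &= (uint8_t)~CMD_HIGH_INTX_DISABLE;
	if (pwrite(config_fd, &command_high, 1, CMD_HIGH_BYTE_OFFSET) != 1)
		return -1;
	return 0;
}
```

In a driver loop this would be called after each successful 4-byte read() of the interrupt count from the /dev/uioX descriptor.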

The annoyance of vfio is iommus.  Yes, I want to make the tradeoff of 
possibly scribbling memory vs. not being able to do anything on the 
wrong system.

I'd like to see a generic Linux kernel PCI driver blob without 
annoyances, though not yet annoyed enough to do anything myself ;)

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-22 23:11     ` Stephen Hemminger
@ 2014-05-23  7:48       ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-23  7:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2014-05-23 08:11, Stephen Hemminger:
> On Thu, 22 May 2014 15:13:49 +0200
> 
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 2014-05-19 16:51, Anatoly Burakov:
> > > Note that since igb_uio no longer has a PCI ID list, it can now be
> > > bound to any device, not just those explicitly supported by DPDK. In
> > > other words, it now behaves similar to PCI stub, VFIO and other generic
> > > PCI drivers.
> > 
> > I wonder if we could replace igb_uio by uio_pci_generic?
> 
> Not as is. I am starting a new driver for the upstream kernel, based on
> pci_generic plus igb_uio. After discussion with Greg KH, doing a new
> driver seems like the best idea.
> 
> PCI generic driver as is does not do interrupts, and does not
> claim PCI resources from kernel.

OK, thanks. Don't hesitate to keep us informed about this work.
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-22 12:54         ` Burakov, Anatoly
@ 2014-05-27 14:29           ` Burakov, Anatoly
  2014-05-27 14:38             ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-27 14:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> > I don't know if there is something defined in a Linux header which could help
> > to check if VFIO is supported. But in general, it's better to check for a macro
> > belonging to the feature instead of checking kernel version.
> 
> Not sure VFIO defines any macros anywhere, but I'll look into it, thanks.
> 

There probably isn't. At least I don't see anything in commit log. So unless there's a way to get config options from kernel at compile-time in userspace code, I don't think we can do anything else than kernel version check.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-27 14:29           ` Burakov, Anatoly
@ 2014-05-27 14:38             ` Thomas Monjalon
  2014-05-27 14:40               ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-27 14:38 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-27 14:29, Burakov, Anatoly:
> > > I don't know if there is something defined in a Linux header which could
> > > help to check if VFIO is supported. But in general, it's better to check
> > > for a macro belonging to the feature instead of checking kernel version.
> > 
> > Not sure VFIO defines any macros anywhere, but I'll look into it, thanks.
> 
> There probably isn't. At least I don't see anything in commit log. So unless
> there's a way to get config options from kernel at compile-time in
> userspace code, I don't think we can do anything else than kernel version
> check.

Isn't it sufficient to check VFIO_API_VERSION?

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-27 14:38             ` Thomas Monjalon
@ 2014-05-27 14:40               ` Burakov, Anatoly
  2014-05-27 14:46                 ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-27 14:40 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> 2014-05-27 14:29, Burakov, Anatoly:
> > > > I don't know if there is something defined in a Linux header which
> > > > could help to check if VFIO is supported. But in general, it's
> > > > better to check for a macro belonging to the feature instead of
> > > > checking kernel version.
> > >
> > > Not sure VFIO defines any macros anywhere, but I'll look into it, thanks.
> >
> > There probably isn't. At least I don't see anything in commit log. So
> > unless there's a way to get config options from kernel at compile-time
> > in userspace code, I don't think we can do anything else than kernel
> > version check.
> 
> Isn't it sufficient to check VFIO_API_VERSION?

I believe you have to #include <linux/vfio.h> before you can get that macro. The header file isn't present on kernels earlier than 3.6.
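A sketch combining both ideas: the kernel version check is used only to guard the #include, and the feature macro is tested once the header is in. VFIO_PRESENT is the macro name used in the quoted eal_pci.c diff; vfio_compiled_in() is illustrative. LINUX_VERSION_CODE and KERNEL_VERSION() come from <linux/version.h>, and VFIO_API_VERSION from <linux/vfio.h>.

```c
#include <assert.h>
#include <linux/version.h>

/* <linux/vfio.h> only exists in kernel headers from 3.6 on, so the
 * version check just guards the #include; after that, test the VFIO
 * macro itself rather than the version number. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
#include <linux/vfio.h>
#ifdef VFIO_API_VERSION
#define VFIO_PRESENT
#endif
#endif

/* Reports whether VFIO support was compiled in. */
static int
vfio_compiled_in(void)
{
#ifdef VFIO_PRESENT
	return 1;
#else
	return 0;
#endif
}
```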

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header
  2014-05-27 14:40               ` Burakov, Anatoly
@ 2014-05-27 14:46                 ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-27 14:46 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-27 14:40, Burakov, Anatoly:
> > 2014-05-27 14:29, Burakov, Anatoly:
> > > > > I don't know if there is something defined in a Linux header which
> > > > > could help to check if VFIO is supported. But in general, it's
> > > > > better to check for a macro belonging to the feature instead of
> > > > > checking kernel version.
> > 
> > > > Not sure VFIO defines any macros anywhere, but I'll look into it,
> > > > thanks.
> > > 
> > > There probably isn't. At least I don't see anything in commit log. So
> > > unless there's a way to get config options from kernel at compile-time
> > > in userspace code, I don't think we can do anything else than kernel
> > > version check.
> > 
> > Isn't it sufficient to check VFIO_API_VERSION?
> 
> I believe you have to #include <linux/vfio.h> before you can get that macro.
> The header file isn't present on kernels earlier than 3.6.

Oh yes, you're right :)

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-22 11:53   ` Thomas Monjalon
  2014-05-22 12:06     ` Burakov, Anatoly
@ 2014-05-27 16:21     ` Burakov, Anatoly
  2014-05-27 16:36       ` Thomas Monjalon
  1 sibling, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-27 16:21 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> You are defining some variables in a .h file. I think it is a problem.

I have managed to move everything to .c files, except for "struct mapped_pci_res_list *pci_res_list;" - which I need in both uio and vfio .c files. I don't think I'll be able to move it out of the eal_pci_init header file. Should declaring it as extern be fine as a compromise?

Best regards,
Anatoly Burakov
DPDK SW Engineer

* Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
  2014-05-27 16:21     ` Burakov, Anatoly
@ 2014-05-27 16:36       ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-27 16:36 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-27 16:21, Burakov, Anatoly:

> > You are defining some variables in a .h file. I think it is a problem.
> 
> I have managed to move everything to .c files, except for "struct
> mapped_pci_res_list *pci_res_list;" - which I need in both uio and vfio .c
> files. I don't think I'll be able to move it out of the eal_pci_init header
> file. Should declaring it as extern be fine as a compromise?

I think it's acceptable.
Like this one:
	extern struct pci_device_list pci_device_list;
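The extern pattern agreed on here can be sketched in a single C file. A plain definition in a header gives every including .c file its own definition. Depending on the compiler's -fcommon/-fno-common setting (GCC defaults to -fno-common since version 10), that either links by accident or fails with duplicate symbols. A static definition would instead silently give eal_pci.c and the UIO/VFIO files separate copies. The struct payload and count_mapped_resources() below are hypothetical and exist only to make the sketch runnable.

```c
#include <stddef.h>
#include <sys/queue.h>

/* ---- would live in the shared header (eal_pci_init.h) ---- */
struct mapped_pci_resource {
	TAILQ_ENTRY(mapped_pci_resource) next;
	int id;			/* hypothetical payload for this sketch */
};
TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);

/* Declaration only: no storage is allocated, and every .c file that
 * includes the header refers to the same object. */
extern struct mapped_pci_res_list *pci_res_list;

/* ---- would live in exactly one .c file (e.g. eal_pci.c) ---- */
struct mapped_pci_res_list *pci_res_list = NULL;

/* ---- usable from any other .c file (e.g. eal_pci_uio.c) ---- */
static size_t count_mapped_resources(void)
{
	struct mapped_pci_resource *res;
	size_t n = 0;

	if (pci_res_list == NULL)
		return 0;
	TAILQ_FOREACH(res, pci_res_list, next)
		n++;
	return n;
}
```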

-- 
Thomas

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-22 12:34   ` Thomas Monjalon
@ 2014-05-28 10:35     ` Burakov, Anatoly
  2014-05-28 11:24       ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-05-28 10:35 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> > +			}
> >  			else if (!strcmp(lgopts[option_index].name,
> OPT_CREATE_UIO_DEV))
> 
> another code style issue reported by checkpatch.pl ;)
> 
> But it should be fixed by removing this code as Stephen suggests.

I'm not sure this code should be removed. igb_uio allows picking the
interrupt mode, so why not VFIO? I've modified my code to try all interrupt
modes if nothing was explicitly specified, but why should that preclude the
user from selecting a specific interrupt type if he so desires?

As for the style error - the whole chunk of code uses the same style there, so either we fix all of that (in a separate patch?), or leave it as it is.
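The behaviour described above can be sketched as follows: honour an explicit choice, otherwise try each interrupt mode in turn. The enum, the callback type, and the MSI-X, MSI, INTx probing order are assumptions made for illustration. They are not code from the series.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; the series' own enum may differ. */
enum intr_mode {
	INTR_MODE_NONE = 0,	/* nothing requested on the command line */
	INTR_MODE_LEGACY,	/* INTx */
	INTR_MODE_MSI,
	INTR_MODE_MSIX,
};

/* Callback that tries to configure one mode; returns 0 on success. */
typedef int (*intr_setup_fn)(enum intr_mode mode);

/* Honour an explicit request; otherwise try modes from most to least
 * capable and return the first that works (INTR_MODE_NONE if none do). */
static enum intr_mode
select_intr_mode(enum intr_mode requested, intr_setup_fn try_setup)
{
	static const enum intr_mode fallback[] = {
		INTR_MODE_MSIX, INTR_MODE_MSI, INTR_MODE_LEGACY,
	};
	size_t i;

	if (requested != INTR_MODE_NONE)
		return try_setup(requested) == 0 ? requested : INTR_MODE_NONE;

	for (i = 0; i < sizeof(fallback) / sizeof(fallback[0]); i++)
		if (try_setup(fallback[i]) == 0)
			return fallback[i];
	return INTR_MODE_NONE;
}

/* Demo stub standing in for a device without MSI-X support. */
static int setup_without_msix(enum intr_mode mode)
{
	return mode == INTR_MODE_MSIX ? -1 : 0;
}
```

If an explicitly requested mode fails, the sketch reports a failure instead of silently downgrading. This keeps the command-line option meaningful.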

Best regards,
Anatoly Burakov
DPDK SW Engineer

* Re: [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line
  2014-05-28 10:35     ` Burakov, Anatoly
@ 2014-05-28 11:24       ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-28 11:24 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-05-28 10:35, Burakov, Anatoly:
> Hi Thomas,
> 
> > > +			}
> > > 
> > >  			else if (!strcmp(lgopts[option_index].name,
> > 
> > OPT_CREATE_UIO_DEV))
> > 
> > another code style issue reported by checkpatch.pl ;)
> > 
> > But it should be fixed by removing this code as Stephen suggests.
> 
> I'm not sure this code should be removed. Igb_uio allows to pick interrupt
> mode, so why not VFIO? I've modified my code to try all interrupt modes if
> nothing was explicitly specified, but why should that preclude the user
> from selecting a specific interrupt type if he so desires?
> 
> As for the style error - the whole chunk of code uses the same style there,
> so either we fix all of that (in a separate patch?), or leave it as it is.

OK to leave it as is.

But please, let's try to keep a clean code style.
Separate patches cleaning up existing code style issues would be welcome.

Thanks
-- 
Thomas

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-23  0:10     ` Antti Kantee
@ 2014-05-28 13:45       ` Thomas Monjalon
  2014-05-28 14:50         ` Antti Kantee
  2014-05-28 16:24         ` Stephen Hemminger
  0 siblings, 2 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-05-28 13:45 UTC (permalink / raw)
  To: dev

2014-05-23 00:10, Antti Kantee:
> On 22/05/14 13:13, Thomas Monjalon wrote:
> > 2014-05-19 16:51, Anatoly Burakov:
> >> Note that since igb_uio no longer has a PCI ID list, it can now be
> >> bound to any device, not just those explicitly supported by DPDK. In
> >> other words, it now behaves similar to PCI stub, VFIO and other generic
> >> PCI drivers.
> > 
> > I wonder if we could replace igb_uio by uio_pci_generic?
> 
> I've been running plenty of the NetBSD kernel PCI drivers in Linux
> userspace on top of uio_pci_generic, including NICs supported by DPDK.
> The only real annoyance is that mainline uio_pci_generic doesn't support
> MSI.  A pseudo-annoyance is that uio_pci_generic turns interrupts off
> from the PCI config space each time after you read an interrupt, so they
> have to be reenabled after each one (and NetBSD kernel drivers tend to
> like using interrupts for everything).
> 
> The annoyance of vfio is iommus.  Yes, I want to make the tradeoff of
> possibly scribbling memory vs. not being able to do anything on the
> wrong system.
> 
> I'd like to see a generic Linux kernel PCI driver blob without
> annoyances, though not yet annoyed enough to do anything myself ;)

So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
If someone wants to work on it, it's possible to stage uio_pci_generic in 
dpdk.org in order to make it ready for kernel.org.
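The re-enable step Antti mentions is the usual uio_pci_generic pattern. When an interrupt fires, the kernel sets the Interrupt Disable bit in the PCI Command register. Userspace must clear it again through the device's sysfs config file (/sys/bus/pci/devices/<addr>/config) before it blocks on /dev/uioX. The sketch below assumes both files are already open. The function and macro names are hypothetical.

```c
#include <stdint.h>
#include <unistd.h>

/* The PCI Command register sits at config offset 0x04. Its Interrupt
 * Disable bit is bit 10, i.e. bit 2 (0x04) of the byte at offset 0x05. */
#define CMD_HI_OFFSET		5
#define CMD_HI_INTX_DISABLE	0x04

/* Return the Command register high byte with Interrupt Disable cleared. */
static uint8_t intx_unmask(uint8_t cmd_hi)
{
	return (uint8_t)(cmd_hi & ~CMD_HI_INTX_DISABLE);
}

/* Re-enable INTx through the sysfs config file (cfg_fd), then block on
 * /dev/uioX (uio_fd) until the next interrupt. Returns the running event
 * count that UIO reports as a 4-byte value, or -1 on error. */
static int64_t uio_rearm_and_wait(int uio_fd, int cfg_fd)
{
	uint8_t cmd_hi;
	uint32_t count;

	if (pread(cfg_fd, &cmd_hi, 1, CMD_HI_OFFSET) != 1)
		return -1;
	cmd_hi = intx_unmask(cmd_hi);
	if (pwrite(cfg_fd, &cmd_hi, 1, CMD_HI_OFFSET) != 1)
		return -1;
	if (read(uio_fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return count;
}
```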

-- 
Thomas

* [dpdk-dev] [PATCH v3 00/20] Add VFIO support to DPDK
  2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 00/16] " Anatoly Burakov
@ 2014-05-28 14:37   ` Anatoly Burakov
  2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
                       ` (20 more replies)
  0 siblings, 21 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:37 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel 3.6+ driver allowing secure DMA from userspace
by using the IOMMU, instead of working directly with physical
memory as igb_uio does.

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional
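The IOMMU-based approach described in this cover letter goes through the standard VFIO container/group/device sequence from the kernel's VFIO documentation. The sketch below assumes the device is already bound to vfio-pci and that its IOMMU group number is known (it can be read from the /sys/bus/pci/devices/<addr>/iommu_group link). The helper name and signature are hypothetical and are not taken from eal_pci_vfio.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Open the VFIO device fd for pci_addr ("0000:01:00.0" style) in IOMMU
 * group group_no. On success the container and group fds are returned
 * through the out-parameters and must stay open while the device fd is in
 * use. On failure everything is closed and -1 is returned. */
static int vfio_open_device(int group_no, const char *pci_addr,
			    int *container_fd, int *group_fd)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	char path[64];
	int container, group, device;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return -1;
	/* the container must speak the expected API and support type 1 IOMMU */
	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
		goto close_container;

	snprintf(path, sizeof(path), "/dev/vfio/%d", group_no);
	group = open(path, O_RDWR);
	if (group < 0)
		goto close_container;

	/* a group is viable only if all its devices use VFIO-compatible drivers */
	if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto close_group;

	/* attach the group to the container, then select the IOMMU model */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto close_group;

	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
	if (device < 0)
		goto close_group;

	*container_fd = container;
	*group_fd = group;
	return device;

close_group:
	close(group);
close_container:
	close(container);
	return -1;
}
```

From there, a caller would register DMA memory on the container with VFIO_IOMMU_MAP_DMA. It would map BARs by querying VFIO_DEVICE_GET_REGION_INFO on the device fd and calling mmap() at the reported offset.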

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL    
    command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  36 +
 app/test/test_pci.c                                |   4 +-
 config/defconfig_i686-default-linuxapp-gcc         |   2 +
 config/defconfig_i686-default-linuxapp-icc         |   2 +
 config/defconfig_x86_64-default-linuxapp-gcc       |   2 +
 config/defconfig_x86_64-default-linuxapp-icc       |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |   2 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  16 +-
 lib/librte_eal/common/include/rte_pci.h            |   5 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  44 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  35 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 285 +++++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 473 ++-----------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 781 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        | 157 +++--
 tools/setup.sh                                     | 172 ++++-
 32 files changed, 2550 insertions(+), 587 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (83%)

-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
@ 2014-05-28 14:37     ` Anatoly Burakov
  2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
                       ` (19 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:37 UTC (permalink / raw)
  To: dev

Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch, which will need to map BARs too but doesn't use
the path in mapped_pci_resource. Also, rename the structs to be more generic.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 ++++++++++++++++------------------
 1 file changed, 58 insertions(+), 67 deletions(-)
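With open() moved out, the helper only maps an fd it is handed, and the caller can close that fd immediately afterwards. A MAP_SHARED mapping stays valid after its descriptor is closed, and this is what lets pci_uio_map_secondary() close fd after each map. The standalone sketch below has the same shape. Its names are illustrative, and it logs with fprintf instead of RTE_LOG. It also shows the UIO convention behind `offset = j * pagesz`: map N of /dev/uioX lives at mmap offset N times the page size. Unlike the patch's version, which appears to return NULL without unmapping, it unmaps a mapping that landed at the wrong address.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map size bytes of fd at offset; the caller owns (and may close) fd. */
static void *map_resource(void *requested_addr, int fd, off_t offset,
			  size_t size)
{
	void *addr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, offset);

	if (addr == MAP_FAILED) {
		fprintf(stderr, "cannot mmap(%d, %p, 0x%zx, 0x%llx): %s\n",
			fd, requested_addr, size,
			(unsigned long long)offset, strerror(errno));
		return NULL;
	}
	/* secondary processes need the exact address; drop a stray mapping */
	if (requested_addr != NULL && addr != requested_addr) {
		munmap(addr, size);
		return NULL;
	}
	return addr;
}

/* UIO exposes map N of /dev/uioX at mmap offset N * page size. */
static off_t uio_map_offset(int map_index)
{
	return (off_t)map_index * sysconf(_SC_PAGESIZE);
}
```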

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <ctype.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <string.h>
-#include <stdarg.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
-#include <stdarg.h>
-#include <errno.h>
 #include <dirent.h>
-#include <limits.h>
-#include <sys/queue.h>
 #include <sys/mman.h>
-#include <sys/ioctl.h>
 
-#include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
-#include <rte_common.h>
-#include <rte_launch.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_tailq.h>
-#include <rte_eal.h>
 #include <rte_eal_memconfig.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
 #include <rte_malloc.h>
-#include <rte_string_fns.h>
-#include <rte_debug.h>
 #include <rte_devargs.h>
 
 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct uio_map {
+struct pci_map {
 	void *addr;
 	uint64_t offset;
 	uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-	TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
 
 	struct rte_pci_addr pci_addr;
 	char path[PATH_MAX];
-	size_t nb_maps;
-	struct uio_map maps[PCI_MAX_RESOURCE];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
 };
 
-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;
 
-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:
 
 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-		 size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-	int fd;
 	void *mapaddr;
 
-	/*
-	 * open devname, to mmap it
-	 */
-	fd = open(devname, O_RDWR);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		goto fail;
-	}
-
 	/* Map the PCI memory resource of device */
 	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
-	close(fd);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
-		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-			" %s (%p)\n", __func__, devname, fd, requested_addr,
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+			__func__, fd, requested_addr,
 			(unsigned long)size, (unsigned long)offset,
 			strerror(errno), mapaddr);
 		goto fail;
@@ -186,10 +148,10 @@ fail:
 }
 
 #define OFF_MAX              ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-	size_t i;
+	int i;
 	char dirname[PATH_MAX];
 	char filename[PATH_MAX];
 	uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-        size_t i;
-        struct uio_resource *uio_res;
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
 
-	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
 
 		/* skip this element if it doesn't match our PCI address */
 		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
 			continue;
 
 		for (i = 0; i != uio_res->nb_maps; i++) {
-			if (pci_map_resource(uio_res->maps[i].addr,
-					     uio_res->path,
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
 					     (off_t)uio_res->maps[i].offset,
 					     (size_t)uio_res->maps[i].size)
 			    != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL,
 					"Cannot mmap device resource\n");
+				close(fd);
 				return (-1);
 			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
 		}
 		return (0);
 	}
@@ -276,7 +250,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 	return -1;
 }
 
-static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
 {
 	FILE *f;
 	char filename[PATH_MAX];
@@ -323,7 +298,8 @@ static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
  * sysfs. On error, return a negative value. In this case dstbuf is
  * invalid.
  */
-static int pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 			   unsigned int buflen)
 {
 	struct rte_pci_addr *loc = &dev->addr;
@@ -405,10 +381,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	uint64_t phaddr;
 	uint64_t offset;
 	uint64_t pagesz;
-	ssize_t nb_maps;
+	int nb_maps;
 	struct rte_pci_addr *loc = &dev->addr;
-	struct uio_resource *uio_res;
-	struct uio_map *maps;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -460,6 +436,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	maps = uio_res->maps;
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
 
 		/* skip empty BAR */
 		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
@@ -473,14 +450,27 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		/* if matching map is found, then use it */
 		if (j != nb_maps) {
 			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(devname, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					devname, strerror(errno));
+				return -1;
+			}
+
 			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, devname,
+			    (mapaddr = pci_map_resource(NULL, fd,
 							(off_t)offset,
 							(size_t)maps[j].size)
 			    ) == NULL) {
 				rte_free(uio_res);
+				close(fd);
 				return (-1);
 			}
+			close(fd);
 
 			maps[j].addr = mapaddr;
 			maps[j].offset = offset;
@@ -488,7 +478,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		}
 	}
 
-	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
 
 	return (0);
 }
@@ -866,7 +856,8 @@ rte_eal_pci_init(void)
 {
 	TAILQ_INIT(&pci_driver_list);
 	TAILQ_INIT(&pci_device_list);
-	uio_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, uio_res_list);
+	pci_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI,
+			mapped_pci_res_list);
 
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 02/20] pci: move uio mapping code to a separate file
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
  2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
@ 2014-05-28 14:37     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
                       ` (18 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:37 UTC (permalink / raw)
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 403 +--------------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 ++++
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
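Two small parsing steps in the code being moved can be sketched as standalone, testable helpers. Depending on the kernel version, a UIO device's sysfs entry is named either uioN or uio:uioN, and its dev file holds "major:minor". This is an illustrative sketch with hypothetical names. It is stricter than the original because it rejects trailing characters. Separately, the moved pci_mknod_uio_dev() reads "%d:%d" into unsigned variables and tests f == NULL after mknod(), where ret < 0 appears to be intended.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Return N for a sysfs entry named "uioN" or "uio:uioN", else -1. */
static int parse_uio_num(const char *name)
{
	static const char *const prefixes[] = { "uio:uio", "uio" };
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t len = strlen(prefixes[i]);
		unsigned long n;
		char *end;

		if (strncmp(name, prefixes[i], len) != 0)
			continue;
		/* require a digit: strtoul() would also skip spaces and signs */
		if (name[len] < '0' || name[len] > '9')
			continue;
		errno = 0;
		n = strtoul(name + len, &end, 10);
		if (errno == 0 && *end == '\0' && n <= INT_MAX)
			return (int)n;
	}
	return -1;
}

/* Parse the "major:minor" text of a sysfs "dev" file into a dev_t. */
static int parse_major_minor(const char *text, dev_t *out)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*out = makedev(maj, min);
	return 0;
}
```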

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b00e3ec..527fa2a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */
 
 #include <string.h>
-#include <sys/stat.h>
-#include <fcntl.h>
 #include <dirent.h>
 #include <sys/mman.h>
 
@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"
 
 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct pci_map {
-	void *addr;
-	uint64_t offset;
-	uint64_t size;
-	uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-	TAILQ_ENTRY(mapped_pci_resource) next;
-
-	struct rte_pci_addr pci_addr;
-	char path[PATH_MAX];
-	int nb_maps;
-	struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;
 
 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }
 
 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;
 
@@ -147,342 +123,6 @@ fail:
 	return NULL;
 }
 
-#define OFF_MAX              ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-	int i;
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	uint64_t offset, size;
-
-	for (i = 0; i != nb_maps; i++) {
- 
-		/* check if map directory exists */
-		rte_snprintf(dirname, sizeof(dirname), 
-			"%s/maps/map%u", devname, i);
- 
-		if (access(dirname, F_OK) != 0)
-			break;
- 
-		/* get mapping offset */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/offset", dirname);
-		if (pci_parse_sysfs_value(filename, &offset) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse offset of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping size */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/size", dirname);
-		if (pci_parse_sysfs_value(filename, &size) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse size of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping physical address */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/addr", dirname);
-		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse addr of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-			RTE_LOG(ERR, EAL,
-				"%s(): offset/size exceed system max value\n",
-				__func__); 
-			return (-1);
-		}
-
-		maps[i].offset = offset;
-		maps[i].size = size;
-        }
-	return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-	int fd, i;
-	struct mapped_pci_resource *uio_res;
-
-	TAILQ_FOREACH(uio_res, pci_res_list, next) {
-
-		/* skip this element if it doesn't match our PCI address */
-		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
-			continue;
-
-		for (i = 0; i != uio_res->nb_maps; i++) {
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(uio_res->path, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					uio_res->path, strerror(errno));
-				return -1;
-			}
-
-			if (pci_map_resource(uio_res->maps[i].addr, fd,
-					     (off_t)uio_res->maps[i].offset,
-					     (size_t)uio_res->maps[i].size)
-			    != uio_res->maps[i].addr) {
-				RTE_LOG(ERR, EAL,
-					"Cannot mmap device resource\n");
-				close(fd);
-				return (-1);
-			}
-			/* fd is not needed in slave process, close it */
-			close(fd);
-		}
-		return (0);
-	}
-
-	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
-}
-
-static int
-pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
-{
-	FILE *f;
-	char filename[PATH_MAX];
-	int ret;
-	unsigned major, minor;
-	dev_t dev;
-
-	/* get the name of the sysfs file that contains the major and minor
-	 * of the uio device and read its content */
-	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
-			__func__);
-		return -1;
-	}
-
-	ret = fscanf(f, "%d:%d", &major, &minor);
-	if (ret != 2) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
-			__func__);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-
-	/* create the char device "mknod /dev/uioX c major minor" */
-	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
-	dev = makedev(major, minor);
-	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
-			__func__, strerror(errno));
-		return -1;
-	}
-
-	return ret;
-}
-
-/*
- * Return the uioX char device used for a pci device. On success, return
- * the UIO number and fill dstbuf string with the path of the device in
- * sysfs. On error, return a negative value. In this case dstbuf is
- * invalid.
- */
-static int
-pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
-			   unsigned int buflen)
-{
-	struct rte_pci_addr *loc = &dev->addr;
-	unsigned int uio_num;
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-
-	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
-
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1; 
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL)
-		return -1;
-
-	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
-		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
-
-	return uio_num;
-}
-
-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-	int i, j;
-	char dirname[PATH_MAX];
-	char devname[PATH_MAX]; /* contains the /dev/uioX */
-	void *mapaddr;
-	int uio_num;
-	uint64_t phaddr;
-	uint64_t offset;
-	uint64_t pagesz;
-	int nb_maps;
-	struct rte_pci_addr *loc = &dev->addr;
-	struct mapped_pci_resource *uio_res;
-	struct pci_map *maps;
-
-	dev->intr_handle.fd = -1;
-	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-	/* secondary processes - use already recorded details */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
-
-	/* find uio resource */
-	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
-	if (uio_num < 0) {
-		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
-				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
-	}
-	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
-
-	/* save fd if in primary process */
-	dev->intr_handle.fd = open(devname, O_RDWR);
-	if (dev->intr_handle.fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		return -1;
-	}
-	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-
-	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
-		RTE_LOG(ERR, EAL,
-			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
-	}
-
-	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
-
-	/* collect info about device mappings */
-	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-				       RTE_DIM(uio_res->maps));
-	if (nb_maps < 0) {
-		rte_free(uio_res);
-		return (nb_maps);
-	}
-
-	uio_res->nb_maps = nb_maps;
-
-	/* Map all BARs */
-	pagesz = sysconf(_SC_PAGESIZE);
-
-	maps = uio_res->maps;
-	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
-		int fd;
-
-		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
-			continue;
-
-		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
-				dev->mem_resource[i].len != maps[j].size);
-				j++)
-			;
-
-		/* if matching map is found, then use it */
-		if (j != nb_maps) {
-			offset = j * pagesz;
-
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(devname, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					devname, strerror(errno));
-				return -1;
-			}
-
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, fd,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
-				rte_free(uio_res);
-				close(fd);
-				return (-1);
-			}
-			close(fd);
-
-			maps[j].addr = mapaddr;
-			maps[j].offset = offset;
-			dev->mem_resource[i].addr = mapaddr;
-		}
-	}
-
-	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
-
-	return (0);
-}
-
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x00000200
 
@@ -546,41 +186,6 @@ error:
 	return -1;
 }
 
-/* 
- * parse a sysfs file containing one integer value 
- * different to the eal version, as it needs to work with 64-bit values
- */ 
-static int 
-pci_parse_sysfs_value(const char *filename, uint64_t *val) 
-{
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
- 
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
- 
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
-}
-
 /* Compare two PCI device addresses. */
 static int
 pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
new file mode 100644
index 0000000..61f09cc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -0,0 +1,403 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <sys/stat.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+#include "rte_pci_dev_ids.h"
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
+	int i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		rte_snprintf(dirname, sizeof(dirname), "%s/maps/map%u", devname, i);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		rte_snprintf(filename, sizeof(filename), "%s/offset", dirname);
+		if (pci_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse offset of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping size */
+		rte_snprintf(filename, sizeof(filename), "%s/size", dirname);
+		if (pci_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse size of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping physical address */
+		rte_snprintf(filename, sizeof(filename), "%s/addr", dirname);
+		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse addr of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+					"%s(): offset/size exceed system max value\n", __func__);
+			return (-1);
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+	}
+
+	return (i);
+}
+
+static int
+pci_uio_map_secondary(struct rte_pci_device *dev) {
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
+
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+		/* skip this element if it doesn't match our PCI address */
+		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+			continue;
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL,
+						"Cannot open %s: %s\n", uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
+					(off_t) uio_res->maps[i].offset,
+					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
+				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
+				close(fd);
+				return (-1);
+			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
+		}
+		return (0);
+	}
+
+	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
+	return -1;
+}
+
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num) {
+	FILE *f;
+	char filename[PATH_MAX];
+	int ret;
+	unsigned major, minor;
+	dev_t dev;
+
+	/* get the name of the sysfs file that contains the major and minor
+	 * of the uio device and read its content */
+	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs to get major:minor\n", __func__);
+		return -1;
+	}
+
+	ret = fscanf(f, "%u:%u", &major, &minor);
+	if (ret != 2) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs to get major:minor\n", __func__);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* create the char device "mknod /dev/uioX c major minor" */
+	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
+	dev = makedev(major, minor);
+	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL,
+				"%s(): mknod() failed %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Return the uioX char device used for a pci device. On success, return
+ * the UIO number and fill dstbuf string with the path of the device in
+ * sysfs. On error, return a negative value. In this case dstbuf is
+ * invalid.
+ */
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+		unsigned int buflen) {
+	struct rte_pci_addr *loc = &dev->addr;
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+
+	rte_snprintf(dirname, sizeof(dirname),
+			SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio", loc->domain, loc->bus,
+			loc->devid, loc->function);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		rte_snprintf(dirname, sizeof(dirname),
+				SYSFS_PCI_DEVICES "/" PCI_PRI_FMT, loc->domain, loc->bus,
+				loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		errno = 0; /* first try uio%d */
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		errno = 0; /* then try uio:uio%d */
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL)
+		return -1;
+
+	/* create uio device if we've been asked to */
+	if (internal_config.create_uio_dev
+			&& pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
+
+	return uio_num;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+int
+pci_uio_map_resource(struct rte_pci_device *dev) {
+	int i, j;
+	char dirname[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	int uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	int nb_maps;
+	struct rte_pci_addr *loc = &dev->addr;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return (pci_uio_map_secondary(dev));
+
+	/* find uio resource */
+	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
+	if (uio_num < 0) {
+		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
+		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
+		return -1;
+	}
+	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+
+	/* save fd if in primary process */
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", devname, strerror(errno));
+		return -1;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
+	/* allocate the mapping details for secondary processes*/
+	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
+		return (-1);
+	}
+
+	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+
+	/* collect info about device mappings */
+	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+			RTE_DIM(uio_res->maps));
+	if (nb_maps < 0) {
+		rte_free(uio_res);
+		return (nb_maps);
+	}
+
+	uio_res->nb_maps = nb_maps;
+
+	/* Map all BARs */
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
+
+		/* skip empty BAR */
+		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+			continue;
+
+		for (j = 0;
+				j != nb_maps
+						&& (phaddr != maps[j].phaddr
+								|| dev->mem_resource[i].len != maps[j].size);
+				j++)
+			;
+
+		/* if matching map is found, then use it */
+		if (j != nb_maps) {
+			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				rte_free(uio_res);
+				return -1;
+			}
+
+			if (maps[j].addr != NULL
+					|| (mapaddr = pci_map_resource(NULL, fd,
+							(off_t) offset, (size_t) maps[j].size)) == NULL) {
+				rte_free(uio_res);
+				close(fd);
+				return (-1);
+			}
+			close(fd);
+
+			maps[j].addr = mapaddr;
+			maps[j].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
+
+	return (0);
+}
+
+/*
+ * parse a sysfs file containing one integer value
+ * different to the eal version, as it needs to work with 64-bit values
+ */
+static int
+pci_parse_sysfs_value(const char *filename, uint64_t *val) {
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs value %s\n", __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot read sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
new file mode 100644
index 0000000..1292eda
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_PCI_INIT_H_
+#define EAL_PCI_INIT_H_
+
+struct pci_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
+
+	struct rte_pci_addr pci_addr;
+	char path[PATH_MAX];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+extern struct mapped_pci_res_list *pci_res_list;
+
+void * pci_map_resource(void * requested_addr, int fd, off_t offset,
+		size_t size);
+
+/* map IGB_UIO resource prototype */
+int pci_uio_map_resource(struct rte_pci_device *dev);
+
+#endif /* EAL_PCI_INIT_H_ */
-- 
1.8.1.4

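The new eal_pci_uio.c above reads several small sysfs values: single integers (through pci_parse_sysfs_value()), the uioN or uio:uioN directory name, and the major:minor pair in the "dev" file. It then maps each BAR by finding the UIO map with the same physical address and size and passing an mmap() offset of map_index * page size, which is how UIO selects a map. The following is a minimal, self-contained C sketch of that parsing and matching logic, working on in-memory strings instead of files. The names parse_sysfs_u64, parse_uio_name, parse_major_minor, find_uio_map, uio_map_offset and struct demo_map are hypothetical and not part of DPDK. The sketch is also slightly stricter than the patch: it rejects a value with no digits (a lone "\n") and a uio name with trailing characters.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one integer from a sysfs-style buffer such as "0x1000\n".
 * Like pci_parse_sysfs_value(), require a trailing newline. */
static int parse_sysfs_u64(const char *buf, uint64_t *val)
{
	char *end = NULL;

	errno = 0;
	*val = strtoull(buf, &end, 0); /* base 0 accepts hex and decimal */
	if (buf[0] == '\0' || end == buf || *end != '\n' || errno != 0)
		return -1;
	return 0;
}

/* Extract N from a sysfs entry named "uioN" or "uio:uioN". */
static int parse_uio_name(const char *name, unsigned *num)
{
	static const char *prefixes[] = { "uio:uio", "uio" };
	size_t i;

	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
		size_t len = strlen(prefixes[i]);
		const char *digits = name + len;
		char *end;
		unsigned long n;

		if (strncmp(name, prefixes[i], len) != 0)
			continue;
		/* strtoul() would accept spaces and signs, so check first */
		if (*digits < '0' || *digits > '9')
			continue;
		errno = 0;
		n = strtoul(digits, &end, 10);
		if (errno == 0 && *end == '\0') {
			*num = (unsigned)n;
			return 0;
		}
	}
	return -1;
}

/* Parse the content of a sysfs "dev" file, e.g. "250:0\n". */
static int parse_major_minor(const char *buf, unsigned *major, unsigned *minor)
{
	return sscanf(buf, "%u:%u", major, minor) == 2 ? 0 : -1;
}

/* Hypothetical cut-down version of struct pci_map from eal_pci_init.h. */
struct demo_map {
	uint64_t phaddr;
	uint64_t size;
};

/* Return the index of the map whose address and size match a BAR, or -1. */
static int find_uio_map(const struct demo_map *maps, int nb_maps,
		uint64_t phaddr, uint64_t len)
{
	int j;

	for (j = 0; j < nb_maps; j++)
		if (maps[j].phaddr == phaddr && maps[j].size == len)
			return j;
	return -1;
}

/* UIO selects memory map N through an mmap() offset of N * page size. */
static uint64_t uio_map_offset(int map_index, uint64_t pagesz)
{
	return (uint64_t)map_index * pagesz;
}
```

For example, parse_uio_name("uio:uio12", &n) stores 12, and a BAR that matches map 1 would be mapped at offset uio_map_offset(1, 4096), i.e. 4096.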

* [dpdk-dev] [PATCH v3 03/20] pci: fixing errors in a previous commit found by checkpatch
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
  2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
  2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
                       ` (17 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 61f09cc..ae4e716 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -69,7 +69,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &offset) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse offset of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping size */
@@ -77,7 +77,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &size) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse size of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping physical address */
@@ -85,20 +85,20 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse addr of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
 			RTE_LOG(ERR, EAL,
 					"%s(): offset/size exceed system max value\n", __func__);
-			return (-1);
+			return -1;
 		}
 
 		maps[i].offset = offset;
 		maps[i].size = size;
 	}
 
-	return (i);
+	return i;
 }
 
 static int
@@ -128,12 +128,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
 				close(fd);
-				return (-1);
+				return -1;
 			}
 			/* fd is not needed in slave process, close it */
 			close(fd);
 		}
-		return (0);
+		return 0;
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -277,7 +277,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 
 	/* secondary processes - use already recorded details */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
+		return pci_uio_map_secondary(dev);
 
 	/* find uio resource */
 	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -299,7 +299,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	/* allocate the mapping details for secondary processes*/
 	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
 		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
+		return -1;
 	}
 
 	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -310,7 +310,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 			RTE_DIM(uio_res->maps));
 	if (nb_maps < 0) {
 		rte_free(uio_res);
-		return (nb_maps);
+		return nb_maps;
 	}
 
 	uio_res->nb_maps = nb_maps;
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 04/20] pci: distinguish between legitimate failures and non-fatal errors
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (2 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
                       ` (16 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Currently, EAL does not distinguish between actual failures and expected
initialization errors. For example, a driver may fail to initialize a
device simply because it was never supposed to handle it, such as when
the device is not managed by that driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected ones: the probe functions now return a negative
value on a genuine failure and 1 when a device is skipped, and only a
negative value makes rte_eal_pci_probe() exit.
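The new return convention can be sketched in plain C: a negative value is a fatal error that stops probing, 0 means a driver took the device, and a positive value means the device was skipped. The function names and the int device IDs below are hypothetical stand-ins for pci_probe_all_drivers(), rte_eal_pci_probe() and struct rte_pci_device; this is an illustration of the convention, not DPDK code.

```c
/*
 * Tri-state probe convention:
 *   < 0  genuine initialization failure (fatal)
 *   == 0 a driver claimed and initialized the device
 *   > 0  no driver handles this device (skipped, not an error)
 * All names here are hypothetical.
 */

/* One driver's probe attempt for a device identified by a plain int. */
typedef int (*probe_fn)(int dev_id);

/* Example drivers: one never matches, one claims device 0, one fails on device 2. */
static int drv_skip(int dev_id) { (void)dev_id; return 1; }
static int drv_ok(int dev_id) { return dev_id == 0 ? 0 : 1; }
static int drv_fail(int dev_id) { return dev_id == 2 ? -1 : 1; }

/* Mirrors pci_probe_all_drivers(): try each driver until one claims the device. */
static int probe_all_drivers(const probe_fn *drivers, int nb_drivers, int dev_id)
{
	int i;

	for (i = 0; i < nb_drivers; i++) {
		int rc = drivers[i](dev_id);

		if (rc < 0)
			return -1;	/* real failure: propagate it */
		if (rc > 0)
			continue;	/* this driver does not handle the device */
		return 0;		/* device claimed */
	}
	return 1;			/* nobody claimed it: skip quietly */
}

/* Mirrors rte_eal_pci_probe(): only a negative result aborts. */
static int probe_bus(const probe_fn *drivers, int nb_drivers, int nb_devs)
{
	int dev;

	for (dev = 0; dev < nb_devs; dev++)
		if (probe_all_drivers(drivers, nb_drivers, dev) < 0)
			return -1;	/* the real EAL calls rte_exit() here */
	return 0;
}
```

Under the old convention, a device that no driver claimed and a driver whose mapping failed both surfaced as -1; separating the two lets the caller abort only on the latter.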

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_pci.c    | 16 +++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci.c     |  7 ++++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 
 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		rc = rte_eal_pci_probe_one_driver(dr, dev);
 		if (rc < 0)
 			/* negative value is an error */
-			break;
+			return -1;
 		if (rc > 0)
 			/* positive value means driver not found */
 			continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 				;
 		return 0;
 	}
-	return -1;
+	return 1;
 }
 
 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
 		probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			pci_probe_all_drivers(dev);
+			ret = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
-			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-			pci_probe_all_drivers(dev) < 0)
+			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+			ret = pci_probe_all_drivers(dev);
+		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 628813b..0b779ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -431,13 +432,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		if (dev->devargs != NULL &&
 			dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
 			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-			return 0;
+			return 1;
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			if ((ret = pci_uio_map_resource(dev)) != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ae4e716..426769b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
+	return 1;
 }
 
 static int
@@ -284,7 +284,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	if (uio_num < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (3 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
                       ` (15 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Rename RTE_PCI_DRV_NEED_IGB_UIO to the more generic RTE_PCI_DRV_NEED_MAPPING,
since device BARs may now be mapped by either igb_uio or VFIO.
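As an illustration of how the renamed flag is consumed, the sketch below tests drv_flags the same way rte_eal_pci_probe_one_driver() does. The flag values are taken from rte_pci.h in this patch; struct demo_pci_driver and needs_bar_mapping() are hypothetical simplifications of struct rte_pci_driver and the probe code.

```c
#include <stdint.h>

/* Flag values as defined in rte_pci.h by this patch. */
#define RTE_PCI_DRV_NEED_MAPPING 0x0001
#define RTE_PCI_DRV_MULTIPLE     0x0002

/* Hypothetical cut-down driver descriptor; the real struct has more fields. */
struct demo_pci_driver {
	const char *name;
	uint32_t drv_flags;
};

/* Nonzero if EAL should map the device BARs for this driver,
 * regardless of which kernel module ends up providing the mapping. */
static int needs_bar_mapping(const struct demo_pci_driver *dr)
{
	return (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}
```

Because a PMD only declares that its BARs must be mapped, the EAL can pick the mapping mechanism without each PMD naming a specific kernel module.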

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_pci.c                     | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c     | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c        | 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     | 4 ++--
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
 			  struct rte_pci_device *dev);
 
 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */
 
@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
 	.name = "test_driver",
 	.devinit = my_driver_init,
 	.id_table = my_driver_id,
-	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };
 
 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 0;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if (pci_uio_map_resource(dev) < 0)
 				return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
 	uint32_t drv_flags;                     /**< Flags contolling handling of device. */
 };
 
-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 1;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if ((ret = pci_uio_map_resource(dev)) != 0)
 				return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 755e474..f3575d5 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -279,7 +279,7 @@ static struct eth_driver rte_em_pmd = {
 	{
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_em_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index c7b3926..b49db52 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -600,7 +600,7 @@ static struct eth_driver rte_igb_pmd = {
 	{
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igb_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
@@ -613,7 +613,7 @@ static struct eth_driver rte_igbvf_pmd = {
 	{
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igbvf_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index e78c208..5354a3f 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -931,7 +931,7 @@ static struct eth_driver rte_ixgbe_pmd = {
 	{
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbe_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
@@ -944,7 +944,7 @@ static struct eth_driver rte_ixgbevf_pmd = {
 	{
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbevf_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 8259cfe..a08c2bf 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -267,7 +267,7 @@ static struct eth_driver rte_vmxnet3_pmd = {
 	{
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_vmxnet3_dev_init,
 	.dev_private_size = sizeof(struct vmxnet3_adapter),
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 06/20] igb_uio: make igb_uio compilation optional
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (4 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
                       ` (14 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Currently, igb_uio is always compiled. Some Linux distributions may not
want to ship igb_uio with DPDK, so igb_uio compilation should be
optional. This patch adds a CONFIG_RTE_EAL_IGB_UIO option, enabled by
default in the linuxapp configurations.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/defconfig_i686-default-linuxapp-gcc   | 1 +
 config/defconfig_i686-default-linuxapp-icc   | 1 +
 config/defconfig_x86_64-default-linuxapp-gcc | 1 +
 config/defconfig_x86_64-default-linuxapp-icc | 1 +
 lib/librte_eal/linuxapp/Makefile             | 2 ++
 5 files changed, 6 insertions(+)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index 14bd3d1..ea90f12 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ec3386e..ecfbf28 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index f11ffbf..fc69b80 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4eaca4c..4ab45b3 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 07/20] igb_uio: Moved interrupt type out of igb_uio
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (5 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
                       ` (13 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Move the interrupt type enum out of igb_uio and give it a more generic
name. The somewhat unusual header naming and split are mostly intended
to make the upcoming virtio patches easier to port to the dpdk.org tree.
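
With the enum and its name strings in shared headers, the string-to-mode
mapping that igbuio_config_intr_mode() does by hand with strcmp() could
be table-driven. The sketch below copies the enum and name macros from
this patch so it compiles on its own; intr_mode_names,
intr_mode_from_name() and intr_mode_to_name() are hypothetical helpers,
not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from rte_pci_dev_feature_defs.h in this patch. */
enum rte_intr_mode {
	RTE_INTR_MODE_NONE = 0,
	RTE_INTR_MODE_LEGACY,
	RTE_INTR_MODE_MSI,
	RTE_INTR_MODE_MSIX,
	RTE_INTR_MODE_MAX
};

/* Copied from rte_pci_dev_features.h in this patch. */
#define RTE_INTR_MODE_NONE_NAME "none"
#define RTE_INTR_MODE_LEGACY_NAME "legacy"
#define RTE_INTR_MODE_MSI_NAME "msi"
#define RTE_INTR_MODE_MSIX_NAME "msix"

/* Hypothetical lookup table: entry i is the name of mode i. */
static const char *const intr_mode_names[RTE_INTR_MODE_MAX] = {
	[RTE_INTR_MODE_NONE] = RTE_INTR_MODE_NONE_NAME,
	[RTE_INTR_MODE_LEGACY] = RTE_INTR_MODE_LEGACY_NAME,
	[RTE_INTR_MODE_MSI] = RTE_INTR_MODE_MSI_NAME,
	[RTE_INTR_MODE_MSIX] = RTE_INTR_MODE_MSIX_NAME,
};

/* Hypothetical: map a module-parameter string to a mode.
 * Returns RTE_INTR_MODE_MAX if the string is not recognised. */
static enum rte_intr_mode
intr_mode_from_name(const char *name)
{
	int i;

	if (name == NULL)
		return RTE_INTR_MODE_MAX;
	for (i = 0; i < RTE_INTR_MODE_MAX; i++) {
		/* every mode below MAX must have a name in the table */
		assert(intr_mode_names[i] != NULL);
		if (strcmp(name, intr_mode_names[i]) == 0)
			return (enum rte_intr_mode)i;
	}
	return RTE_INTR_MODE_MAX;
}

/* Hypothetical inverse: name of a mode, or NULL if out of range. */
static const char *
intr_mode_to_name(enum rte_intr_mode mode)
{
	if ((unsigned int)mode >= RTE_INTR_MODE_MAX)
		return NULL;
	return intr_mode_names[mode];
}
```

Using RTE_INTR_MODE_MAX as the "invalid" sentinel avoids a separate
error code while keeping RTE_INTR_MODE_NONE a valid, selectable mode.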

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/Makefile                     |  1 +
 lib/librte_eal/common/include/rte_pci.h            |  1 +
 .../common/include/rte_pci_dev_feature_defs.h      | 46 +++++++++++++++++++++
 .../common/include/rte_pci_dev_features.h          | 44 ++++++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          | 48 +++++++++-------------
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 2f99bf4..7daf38c 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_vdev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+
 #include <rte_interrupts.h>
 
 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 0000000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+	RTE_INTR_MODE_NONE = 0,
+	RTE_INTR_MODE_LEGACY,
+	RTE_INTR_MODE_MSI,
+	RTE_INTR_MODE_MSIX,
+	RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 0000000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_FEATURES_H
+#define _RTE_PCI_DEV_FEATURES_H
+
+#include <rte_pci_dev_feature_defs.h>
+
+#define RTE_INTR_MODE_NONE_NAME "none"
+#define RTE_INTR_MODE_LEGACY_NAME "legacy"
+#define RTE_INTR_MODE_MSI_NAME "msi"
+#define RTE_INTR_MODE_MSIX_NAME "msix"
+
+#endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 09c40bf..7d5e6b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -33,6 +33,7 @@
 #ifdef CONFIG_XEN_DOM0 
 #include <xen/xen.h>
 #endif
+#include <rte_pci_dev_features.h>
 
 /**
  * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
@@ -49,14 +50,6 @@
 
 #define IGBUIO_NUM_MSI_VECTORS 1
 
-/* interrupt mode */
-enum igbuio_intr_mode {
-	IGBUIO_LEGACY_INTR_MODE = 0,
-	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
-};
-
 /**
  * A structure describing the private information for a uio device.
  */
@@ -64,13 +57,13 @@ struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
-	enum igbuio_intr_mode mode;
+	enum rte_intr_mode mode;
 	struct msix_entry \
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode = NULL;
-static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
 /* PCI device id table */
 static struct pci_device_id igbuio_pci_ids[] = {
@@ -222,14 +215,13 @@ igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
 {
 	struct pci_dev *pdev = udev->pdev;
 
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_MSIX) {
 		struct msi_desc *desc;
 
 		list_for_each_entry(desc, &pdev->msi_list, list) {
 			igbuio_msix_mask_irq(desc, state);
 		}
-	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	} else if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		uint32_t status;
 		uint16_t old, new;
 
@@ -301,7 +293,7 @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
 		goto spin_unlock;
 
 	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
 		status = cmd_status_dword >> 16;
 		/* interrupt is not ours, goes to out */
@@ -520,18 +512,18 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
+	udev->mode = RTE_INTR_MODE_LEGACY;
 	spin_lock_init(&udev->lock);
 
 	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
+	if (igbuio_intr_mode_preferred == RTE_INTR_MODE_MSIX) {
 		int vector;
 
 		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
 			udev->msix_entries[vector].entry = vector;
 
 		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
-			udev->mode = IGBUIO_MSIX_INTR_MODE;
+			udev->mode = RTE_INTR_MODE_MSIX;
 		}
 		else {
 			pci_disable_msix(udev->pdev);
@@ -539,13 +531,13 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	}
 	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
+	case RTE_INTR_MODE_MSIX:
 		udev->info.irq_flags = 0;
 		udev->info.irq = udev->msix_entries[0].vector;
 		break;
-	case IGBUIO_MSI_INTR_MODE:
+	case RTE_INTR_MODE_MSI:
 		break;
-	case IGBUIO_LEGACY_INTR_MODE:
+	case RTE_INTR_MODE_LEGACY:
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
 		break;
@@ -570,7 +562,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 fail_release_iomem:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
+	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
@@ -595,7 +587,7 @@ igbuio_pci_remove(struct pci_dev *dev)
 	uio_unregister_device(info);
 	igbuio_pci_release_iomem(info);
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
-					IGBUIO_MSIX_INTR_MODE)
+			RTE_INTR_MODE_MSIX)
 		pci_disable_msix(dev);
 	pci_release_regions(dev);
 	pci_disable_device(dev);
@@ -611,11 +603,11 @@ igbuio_config_intr_mode(char *intr_str)
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+	if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 		printk(KERN_INFO "Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
+	} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
 		printk(KERN_INFO "Use legacy interrupt\n");
 	} else {
 		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
@@ -656,8 +648,8 @@ module_exit(igbuio_pci_exit_module);
 module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
-"    msix       Use MSIX interrupt\n"
-"    legacy     Use Legacy interrupt\n"
+"    " RTE_INTR_MODE_MSIX_NAME "       Use MSIX interrupt\n"
+"    " RTE_INTR_MODE_LEGACY_NAME "     Use Legacy interrupt\n"
 "\n");
 
 MODULE_DESCRIPTION("UIO driver for Intel IGB PCI cards");
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 08/20] vfio: add support for VFIO in Linuxapp targets
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (6 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 09/20] vfio: add VFIO header Anatoly Burakov
                       ` (12 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add a CONFIG_RTE_EAL_VFIO compilation option, enabled by default, to all
Linux app configs.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/defconfig_i686-default-linuxapp-gcc   | 1 +
 config/defconfig_i686-default-linuxapp-icc   | 1 +
 config/defconfig_x86_64-default-linuxapp-gcc | 1 +
 config/defconfig_x86_64-default-linuxapp-icc | 1 +
 4 files changed, 4 insertions(+)

diff --git a/config/defconfig_i686-default-linuxapp-gcc b/config/defconfig_i686-default-linuxapp-gcc
index ea90f12..5410f57 100644
--- a/config/defconfig_i686-default-linuxapp-gcc
+++ b/config/defconfig_i686-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_i686-default-linuxapp-icc b/config/defconfig_i686-default-linuxapp-icc
index ecfbf28..1c0000c 100644
--- a/config/defconfig_i686-default-linuxapp-icc
+++ b/config/defconfig_i686-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-gcc b/config/defconfig_x86_64-default-linuxapp-gcc
index fc69b80..5c682a5 100644
--- a/config/defconfig_x86_64-default-linuxapp-gcc
+++ b/config/defconfig_x86_64-default-linuxapp-gcc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/config/defconfig_x86_64-default-linuxapp-icc b/config/defconfig_x86_64-default-linuxapp-icc
index 4ab45b3..b9bb7f6 100644
--- a/config/defconfig_x86_64-default-linuxapp-icc
+++ b/config/defconfig_x86_64-default-linuxapp-icc
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 09/20] vfio: add VFIO header
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (7 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
                       ` (11 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is enabled by default), the
header also checks the kernel version: VFIO_PRESENT is defined only when
VFIO is enabled in the config and the kernel is 3.6 or newer. Code
should use this macro to determine whether VFIO support is being
compiled in.
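
The check in eal_vfio.h happens entirely in the preprocessor. The sketch
below restates it as a runtime function so the version arithmetic is
visible, and shows how consuming code is expected to guard VFIO-only
paths with VFIO_PRESENT. SKETCH_KERNEL_VERSION mirrors the bit layout of
the kernel's KERNEL_VERSION() macro; it and both helpers are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same packing as KERNEL_VERSION() in <linux/version.h>:
 * major in bits 16 and up, minor in bits 8-15, patch in bits 0-7. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Minimum version that the eal_vfio.h check requires for VFIO. */
#define SKETCH_VFIO_MIN_VERSION SKETCH_KERNEL_VERSION(3, 6, 0)

/* Hypothetical runtime equivalent of the compile-time test:
 * true if the given kernel version is new enough for VFIO. */
static bool
kernel_version_has_vfio(unsigned int major, unsigned int minor,
			unsigned int patch)
{
	/* the packed format only has 8 bits for minor and patch */
	assert(minor < 256 && patch < 256);
	return SKETCH_KERNEL_VERSION(major, minor, patch) >=
	       (unsigned int)SKETCH_VFIO_MIN_VERSION;
}

/* Hypothetical consumer: VFIO-only code sits behind VFIO_PRESENT. */
static const char *
vfio_build_status(void)
{
#ifdef VFIO_PRESENT
	return "VFIO support compiled in";
#else
	return "VFIO support not compiled in";
#endif
}
```

LINUX_VERSION_CODE describes the kernel headers used for the build, not
the running kernel, so a binary built with VFIO_PRESENT may still run on
a host where VFIO is unavailable; runtime code still has to handle that.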

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 0000000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 10/20] interrupts: Add support for VFIO interrupts
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (8 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 09/20] vfio: add VFIO header Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
                       ` (10 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add code to the EAL interrupt handling to support VFIO interrupts of all
types (legacy INTx, MSI and MSI-X).
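
The six vfio_enable_*/vfio_disable_* helpers added below all fill the
same struct vfio_irq_set and differ only in the IRQ index, the action,
the count and whether an eventfd is attached. A minimal sketch of
factoring that into one builder follows. It uses a local stand-in for
struct vfio_irq_set and copies the flag and index values from
<linux/vfio.h> so it builds without kernel headers; make_irq_set() is
hypothetical, and it allocates with calloc() whereas the patch uses a
fixed stack buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct vfio_irq_set, same field order and sizes. */
struct irq_set_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t data[];
};

/* Flag and index values as defined in <linux/vfio.h>. */
#define SK_IRQ_SET_DATA_NONE      (1u << 0)
#define SK_IRQ_SET_DATA_EVENTFD   (1u << 2)
#define SK_IRQ_SET_ACTION_MASK    (1u << 3)
#define SK_IRQ_SET_ACTION_UNMASK  (1u << 4)
#define SK_IRQ_SET_ACTION_TRIGGER (1u << 5)
enum { SK_PCI_INTX_IRQ_INDEX, SK_PCI_MSI_IRQ_INDEX, SK_PCI_MSIX_IRQ_INDEX };

/* Hypothetical: build one VFIO_DEVICE_SET_IRQS request. With
 * eventfd >= 0 the fd is the payload (DATA_EVENTFD); otherwise the
 * request has no payload (DATA_NONE). Caller frees the result. */
static struct irq_set_sketch *
make_irq_set(uint32_t index, uint32_t action, uint32_t count, int eventfd)
{
	size_t len = sizeof(struct irq_set_sketch) +
		     (eventfd >= 0 ? sizeof(int) : 0);
	struct irq_set_sketch *irq_set;

	/* a single-eventfd payload only makes sense with count == 1 */
	assert(eventfd < 0 || count == 1);
	irq_set = calloc(1, len);
	if (irq_set == NULL)
		return NULL;
	irq_set->argsz = (uint32_t)len;
	irq_set->index = index;
	irq_set->start = 0;
	irq_set->count = count;
	if (eventfd >= 0) {
		irq_set->flags = SK_IRQ_SET_DATA_EVENTFD | action;
		/* memcpy avoids assuming data[] is int-aligned */
		memcpy(irq_set->data, &eventfd, sizeof(eventfd));
	} else {
		irq_set->flags = SK_IRQ_SET_DATA_NONE | action;
	}
	return irq_set;
}
```

Under these assumptions, enabling INTx is a TRIGGER request carrying the
eventfd followed by an UNMASK request without data; disabling is a MASK
request followed by a TRIGGER request with count 0, which is how the
VFIO API tears down all triggers for an index. Each request would be
passed to ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, req) and then freed.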

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 285 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 284 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..c430710 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include <stdlib.h>
 #include <pthread.h>
 #include <sys/queue.h>
-#include <malloc.h>
 #include <stdarg.h>
 #include <unistd.h>
 #include <string.h>
@@ -44,6 +43,7 @@
 #include <inttypes.h>
 #include <sys/epoll.h>
 #include <sys/signalfd.h>
+#include <sys/ioctl.h>
 
 #include <rte_common.h>
 #include <rte_interrupts.h>
@@ -66,6 +66,7 @@
 #include <rte_spinlock.h>
 
 #include "eal_private.h"
+#include "eal_vfio.h"
 
 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
 
@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
 	int uio_intr_count;              /* for uio device */
+	/* for vfio device; defined even without VFIO_PRESENT because the
+	 * unconditional VFIO cases in eal_intr_process_interrupts() use it */
+	uint64_t vfio_intr_count;
 	uint64_t timerfd_num;            /* for timerfd */
 	char charbuf[16];                /* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;
 
+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	/* enable INTx */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* unmask INTx after enabling */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	/* mask interrupts before disabling */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* disable INTx*/
+	memset(irq_set, 0, len);
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI interrupts */
+static int
+vfio_disable_msi(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msix(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI-X interrupts */
+static int
+vfio_disable_msix(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI-X interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+#endif
+
 int
 rte_intr_callback_register(struct rte_intr_handle *intr_handle,
 			rte_intr_callback_fn cb, void *cb_arg)
@@ -276,6 +518,20 @@ rte_intr_enable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_enable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_enable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_enable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -300,7 +556,7 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	case RTE_INTR_HANDLE_UIO:
 		if (write(intr_handle->fd, &value, sizeof(value)) < 0){
 			RTE_LOG(ERR, EAL,
-				"Error enabling interrupts for fd %d\n",
+				"Error disabling interrupts for fd %d\n",
 							intr_handle->fd);
 			return -1;
 		}
@@ -308,6 +564,20 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_disable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_disable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_disable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -357,10 +627,15 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		/* set the length to be read dor different handle type */
 		switch (src->intr_handle.type) {
 		case RTE_INTR_HANDLE_UIO:
-			bytes_read = 4;
+			bytes_read = sizeof(buf.uio_intr_count);
 			break;
 		case RTE_INTR_HANDLE_ALARM:
-			bytes_read = sizeof(uint64_t);
+			bytes_read = sizeof(buf.timerfd_num);
+			break;
+		case RTE_INTR_HANDLE_VFIO_MSIX:
+		case RTE_INTR_HANDLE_VFIO_MSI:
+		case RTE_INTR_HANDLE_VFIO_LEGACY:
+			bytes_read = sizeof(buf.vfio_intr_count);
 			break;
 		default:
 			bytes_read = 1;
@@ -397,7 +672,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				active_cb.cb_fn(&src->intr_handle,
 					active_cb.cb_arg);
 
-				/*get the lcok back. */
+				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
 		}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6733948..e00a343 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -41,12 +41,16 @@
 enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_UNKNOWN = 0,
 	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
+	RTE_INTR_HANDLE_VFIO_LEGACY,  /**< vfio device handle (legacy) */
+	RTE_INTR_HANDLE_VFIO_MSI,     /**< vfio device handle (MSI) */
+	RTE_INTR_HANDLE_VFIO_MSIX,    /**< vfio device handle (MSIX) */
 	RTE_INTR_HANDLE_ALARM,    /**< alarm handle */
 	RTE_INTR_HANDLE_MAX
 };
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	int vfio_dev_fd;                 /**< VFIO device file descriptor */
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 };
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (9 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 12/20] vfio: create mapping code for VFIO Anatoly Burakov
                       ` (9 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 527fa2a..76d445f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4


* [dpdk-dev] [PATCH v3 12/20] vfio: create mapping code for VFIO
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (10 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 13/20] vfio: add multiprocess support Anatoly Burakov
                       ` (8 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add code to support VFIO mapping (primary processes only). Most of the
work is done via ioctl() calls on either /dev/vfio/vfio (the container)
or /dev/vfio/$GROUP_NR (the IOMMU group).
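The group number used in /dev/vfio/$GROUP_NR comes from the last path
component of the device's sysfs iommu_group symlink. A minimal sketch
of that parsing, assuming the symlink target has already been read
(for example with readlink()); parse_iommu_group() and
vfio_group_path() are hypothetical names, not functions from the
patch:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the target of the sysfs
 * /sys/bus/pci/devices/<addr>/iommu_group symlink (for example
 * "../../../../kernel/iommu_groups/26"), return the group number,
 * or -1 if the last path component is not a valid decimal number.
 */
static int parse_iommu_group(const char *link_target)
{
	const char *last = strrchr(link_target, '/');
	char *end;
	long val;

	/* the group number is always the last path component */
	last = (last != NULL) ? last + 1 : link_target;
	errno = 0;
	val = strtol(last, &end, 10);
	/* reject empty components, trailing garbage and out-of-range values */
	if (end == last || *end != '\0' || errno != 0 || val < 0 || val > INT_MAX)
		return -1;
	return (int)val;
}

/* Build the group character device path, e.g. "/dev/vfio/26". */
static int vfio_group_path(char *buf, size_t len, int group_no)
{
	int n = snprintf(buf, len, "/dev/vfio/%d", group_no);

	/* fail on encoding errors or truncation */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

This sketch treats "nothing parsed" (end == last) as an error, which
is a slightly stricter check than the one in pci_vfio_get_group_no().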

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to a container
6. sets up DMA mappings (only done once, mapping whole DPDK hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (the MSI-X BAR cannot be mmap'ed, so it is skipped)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
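Step 8 requires knowing which BAR holds the MSI-X table before mapping
anything. A minimal sketch of the capability-list walk, assuming a
256-byte snapshot of config space in memory instead of pread64() calls
on the VFIO config region; find_msix_bar() and the CFG_/CAP_ macro
names are hypothetical. Since the capability ID and the "next" pointer
are adjacent bytes, one read per capability is enough:

```c
#include <assert.h>
#include <stdint.h>

#define CFG_CAP_PTR    0x34  /* same offset as PCI_CAPABILITY_LIST */
#define CAP_ID_MSIX    0x11  /* same value as PCI_CAP_ID_MSIX */
#define MSIX_BIR_MASK  0x7   /* BIR: low 3 bits of the Table Offset/BIR dword */

/*
 * Walk the capability list of a config space snapshot and return the
 * BAR index holding the MSI-X table, or -1 if there is no MSI-X
 * capability or the list is malformed.
 */
static int find_msix_bar(const uint8_t cfg[256])
{
	unsigned int off = cfg[CFG_CAP_PTR] & 0xFC; /* pointers are dword aligned */
	int hops = 0;

	/* bound the walk so a looping list cannot hang us */
	while (off != 0 && hops++ < 48) {
		if (off + 8 > 256)          /* capability would run past config space */
			return -1;
		if (cfg[off] == CAP_ID_MSIX) /* byte 0: capability ID */
			return cfg[off + 4] & MSIX_BIR_MASK; /* low byte of dword at +4 */
		off = cfg[off + 1] & 0xFC;   /* byte 1: next capability pointer */
	}
	return -1;
}
```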

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   2 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 706 +++++++++++++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 76d445f..cb87f8a 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index de182e1..18a3e04 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	/* if set to NONE, interrupt mode is determined automatically */
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <linux/pci_regs.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
+
+/* per-process VFIO config */
+static struct vfio_config vfio_cfg;
+
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int *msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset) {
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+	int i, ret, intr_idx;
+
+	/* default to invalid index */
+	intr_idx = VFIO_PCI_NUM_IRQS;
+
+	/* get interrupt type from internal config (MSI-X by default, can be
+	 * overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_MSI:
+		intr_idx = VFIO_PCI_MSI_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		break;
+	/* don't do anything if we want to automatically determine interrupt type */
+	case RTE_INTR_MODE_NONE:
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "  unknown default interrupt type!\n");
+		return -1;
+	}
+
+	/* start from MSI-X interrupt type */
+	for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE &&
+				i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* if this vector cannot be used with eventfd, fail if we explicitly
+		 * specified interrupt type, otherwise continue */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE) {
+				RTE_LOG(ERR, EAL, "  interrupt vector does not support eventfd!\n");
+				return -1;
+			} else
+				continue;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		switch (i) {
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSIX;
+			break;
+		case VFIO_PCI_MSI_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSI;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSI;
+			break;
+		case VFIO_PCI_INTX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_LEGACY;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_LEGACY;
+			break;
+		default:
+			RTE_LOG(ERR, EAL, "  unknown interrupt type!\n");
+			return -1;
+		}
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				 VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char *pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char *tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			 SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename));
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if ((end != group_tok && *end != '\0') || errno != 0) {
+		RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status = {
+			.argsz = sizeof(group_status)
+	};
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0)
+		return -1;
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	} else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container (needs to be done only once, only when
+	 * at least one group is assigned to a container and only in primary process)
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof(*vfio_res), 0))
+				== NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store VFIO mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of registers (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void *bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmapable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* if secondary process, do not set up interrupts */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd) != 0) {
+			RTE_LOG(ERR, EAL, "  %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, "  %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return 0;
+}
+
+int
+pci_vfio_enable(void)
+{
+	/* initialize group list */
+	int i;
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		vfio_cfg.vfio_groups[i].fd = -1;
+		vfio_cfg.vfio_groups[i].group_no = -1;
+	}
+	vfio_cfg.vfio_container_fd = -1;
+
+	/* check if we have VFIO driver enabled */
+	if (access(VFIO_DIR, F_OK) == 0)
+		vfio_cfg.vfio_enabled = 1;
+	else
+		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
+
+	return 0;
+}
+
+int
+pci_vfio_is_enabled(void)
+{
+	return vfio_cfg.vfio_enabled;
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index 92e3065..5468b0a 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include <rte_eal.h>
+#include <rte_pci_dev_feature_defs.h>
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6 +77,8 @@ struct internal_config {
 	volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
 	uintptr_t base_virtaddr;          /**< base address to try and reserve memory from */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
+	/** default interrupt mode for VFIO */
+	volatile enum rte_intr_mode vfio_intr_mode;
 	const char *hugefile_prefix;      /**< the base filename of hugetlbfs files */
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 1292eda..23fb3c3 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -34,6 +34,8 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include "eal_vfio.h"
+
 struct pci_map {
 	void *addr;
 	uint64_t offset;
@@ -63,4 +65,33 @@ void * pci_map_resource(void * requested_addr, int fd, off_t offset,
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
 
+#ifdef VFIO_PRESENT
+
+#define VFIO_MAX_GROUPS 64
+
+int pci_vfio_enable(void);
+int pci_vfio_is_enabled(void);
+
+/* map VFIO resource prototype */
+int pci_vfio_map_resource(struct rte_pci_device *dev);
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+	int group_no;
+	int fd;
+};
+
+struct vfio_config {
+	int vfio_enabled;
+	int vfio_container_fd;
+	int vfio_container_has_dma;
+	int vfio_group_idx;
+	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
+#endif
+
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
index 354e9ca..03e693e 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -42,6 +42,12 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
 #include <linux/vfio.h>
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
+#define RTE_PCI_MSIX_TABLE_BIR 0x7
+#else
+#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR
+#endif
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
1.8.1.4


* [dpdk-dev] [PATCH v3 13/20] vfio: add multiprocess support.
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (11 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 12/20] vfio: create mapping code for VFIO Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 14/20] pci: enable VFIO device binding Anatoly Burakov
                       ` (7 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Since VFIO cannot be used to map the same device twice, secondary
processes receive the container and group fds from the primary process
by communicating over a local socket. Only group and container fds need
to be sent, as device fds can be obtained via ioctl() calls on the
group fd.
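On Linux, passing an open descriptor to another process over a local
(AF_UNIX) socket is done with SCM_RIGHTS ancillary data. A minimal
sketch of that mechanism; send_fd() and recv_fd() are hypothetical
names, and this is not the vfio_mp_sync_*() code from the patch. The
received descriptor refers to the same open file, which is what lets a
secondary process share the primary's container:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 0; /* at least one byte of regular payload is required */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* attach the descriptor as SCM_RIGHTS ancillary data */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	/* validate that exactly one descriptor arrived */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```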

For multiprocess support, the VFIO code distinguishes between groups
that exist but are unused (i.e. groups that aren't bound to the VFIO
driver) and groups that don't exist at all, so that it can tell whether
a secondary process is requesting a valid group or something that
doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests: a request for a group fd and a request for the container fd.
Possible replies are: SOCKET_OK (an OK signal), SOCKET_ERR (error
signal) and SOCKET_NO_FD (a signal that indicates that the requested
VFIO group is valid, but no fd is present for that group - indicating
that the respective group is simply not bound to VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, the client then also sends the group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
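The primary's side of step 2 can be sketched as a pure decision
function over the group table, assuming the convention from the
mapping patch that a stored fd of 0 marks a group that exists but is
not bound to VFIO. The numeric reply values, struct group_entry and
group_reply() are hypothetical stand-ins for the real definitions in
eal_pci_init.h:

```c
#include <assert.h>

/* Hypothetical reply codes; the real values are defined by the patch. */
enum { SOCKET_OK = 0, SOCKET_NO_FD = 1, SOCKET_ERR = 0xFF };

/* Mirrors the group_no/fd pair kept in struct vfio_group. */
struct group_entry {
	int group_no;
	int fd;
};

/*
 * Decide how the primary answers SOCKET_REQ_GROUP for group_no.
 * On SOCKET_OK, *fd_out holds the descriptor to send next.
 */
static int group_reply(const struct group_entry *tbl, int n,
		       int group_no, int *fd_out)
{
	int i;

	*fd_out = -1;
	for (i = 0; i < n; i++) {
		if (tbl[i].group_no != group_no)
			continue;
		if (tbl[i].fd == 0)   /* 2b: group exists but is not bound */
			return SOCKET_NO_FD;
		if (tbl[i].fd < 0)    /* slot was cleared after a failure */
			return SOCKET_ERR;
		*fd_out = tbl[i].fd;  /* 2c: OK, then the fd follows */
		return SOCKET_OK;
	}
	return SOCKET_ERR;            /* 2a: unknown group */
}
```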

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  79 ++++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cb87f8a..572d173 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }
 
 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
 	int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
 		}
 
 		return vfio_container_fd;
+	} else {
+		/*
+		 * if we're in a secondary process, request container fd from the
+		 * primary process via our socket
+		 */
+		int socket_fd;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
 	}
 
 	return -1;
 }
 
 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
 	int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
 		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
 		return vfio_group_fd;
 	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_mp_sync_receive_request(socket_fd);
+		switch (ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_mp_sync_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, "  cannot get group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
 	return -1;
 }
 
@@ -602,6 +663,20 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		/* get number of registers (up to BAR5) */
 		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
 				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	} else {
+		/* if we're in a secondary process, just find our tailq entry */
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, "  %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
 	}
 
 	/* map BARs */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
new file mode 100644
index 0000000..26dbaa5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
@@ -0,0 +1,395 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe and should not be used. */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include <sys/un.h>
+/* make sure we redefine __USE_MISC only if it was previously defined */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd, chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy((chdr).__cmsg_data, &(fd), sizeof(fd));\
+		} while (0)
+#define CMSGHDR_TO_FD(chdr, fd) \
+			memcpy(&(fd), (chdr).__cmsg_data, sizeof(fd))
+
+static pthread_t socket_thread;
+static int mp_socket_fd;
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char *buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_mp_sync_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_mp_sync_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_mp_sync_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_mp_sync_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_mp_sync_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr *) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+static __attribute__((noreturn)) void *
+pci_vfio_mp_sync_thread(void __rte_unused * arg)
+{
+	int ret, fd, vfio_group_no;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(mp_socket_fd, (struct sockaddr *) &addr,
+				&sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_mp_sync_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			fd = pci_vfio_get_container_fd();
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			else
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_mp_sync_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+
+			fd = pci_vfio_get_group_fd(vfio_group_no);
+
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (fd == 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_mp_sync_send_request(conn_sock, SOCKET_OK);
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			}
+			break;
+		default:
+			vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+static int
+vfio_mp_sync_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr *) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	/* save the socket in local configuration */
+	mp_socket_fd = socket_fd;
+
+	return 0;
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_mp_sync_setup(void)
+{
+	int ret;
+
+	if (vfio_mp_sync_socket_setup() < 0) {
+		RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
+		return -1;
+	}
+
+	ret = pthread_create(&socket_thread, NULL,
+			pci_vfio_mp_sync_thread, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to create thread for communication with "
+				"secondary processes!\n");
+		close(mp_socket_fd);
+		return -1;
+	}
+	return 0;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 23fb3c3..45846cc 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -71,9 +71,28 @@ int pci_uio_map_resource(struct rte_pci_device *dev);
 
 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
+int pci_vfio_mp_sync_setup(void);
 
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
+int pci_vfio_get_group_fd(int iommu_group_no);
+int pci_vfio_get_container_fd(void);
+
+/*
+ * Function prototypes for VFIO multiprocess sync functions
+ */
+int vfio_mp_sync_send_request(int socket, int req);
+int vfio_mp_sync_receive_request(int socket);
+int vfio_mp_sync_send_fd(int socket, int fd);
+int vfio_mp_sync_receive_fd(int socket);
+int vfio_mp_sync_connect_to_primary(void);
+
+/* socket comm protocol definitions */
+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
 
 /*
  * we don't need to store device fd's anywhere since they can be obtained from
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v3 14/20] pci: enable VFIO device binding
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (12 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 13/20] vfio: add multiprocess support Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
                       ` (6 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped, fall back
to igb_uio.
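The fallback in pci_map_device() can be read as a general "try mappers in order" rule. The sketch below is a hypothetical generalization, not the patch's code; it assumes the return convention visible in pci_map_device(): 0 means the device was mapped, a negative value is a hard error that stops probing, and a positive value means the mechanism does not handle this device, so the next one is tried. The stub mappers exist only for illustration.

```c
#include <stddef.h>

/* 0 = mapped, <0 = fatal error, >0 = device not handled by this mechanism */
typedef int (*map_fn)(void *dev);

/* Try each mapper in order; stop at the first success or hard error. */
static int map_with_fallback(void *dev, const map_fn *mappers, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = mappers[i](dev);
		if (ret <= 0)
			return ret; /* mapped (0) or fatal error (<0) */
	}
	return 1; /* no mechanism handled the device */
}

/* Hypothetical stubs standing in for the VFIO and igb_uio mappers. */
static int vfio_not_bound(void *dev) { (void)dev; return 1; }
static int vfio_fails(void *dev) { (void)dev; return -1; }
static int uio_ok(void *dev) { (void)dev; return 0; }
```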

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
 	return -1;
 }
 
+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+	int ret, mapped = 0;
+
+	/* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+	if (pci_vfio_is_enabled()) {
+		if ((ret = pci_vfio_map_resource(dev)) == 0)
+			mapped = 1;
+		else if (ret < 0)
+			return ret;
+	}
+#endif
+	/* map resources for devices that use igb_uio */
+	if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+		return ret;
+
+	return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+	int ret;
 	struct rte_pci_id *id_table;
-	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-			/* map resources for devices that use igb_uio */
-			if ((ret = pci_uio_map_resource(dev)) != 0)
+			if ((ret = pci_map_device(dev)) != 0)
 				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
 		RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
 		return -1;
 	}
+#ifdef VFIO_PRESENT
+	pci_vfio_enable();
+
+	if (pci_vfio_is_enabled()) {
+
+		/* if we are primary process, create a thread to communicate with
+		 * secondary processes. the thread will use a socket to wait for
+		 * requests from secondary process to send open file descriptors,
+		 * because VFIO does not allow multiple open descriptors on a group or
+		 * VFIO container.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY &&
+				pci_vfio_mp_sync_setup() < 0)
+			return -1;
+	}
+#endif
 	return 0;
 }
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (13 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 14/20] pci: enable VFIO device binding Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
                       ` (5 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Unlike igb_uio, the VFIO interrupt type is not set by kernel module
parameters but is set up via ioctl() calls at runtime. This warrants
a new EAL command-line parameter. It will have no effect if VFIO is
not compiled in, but will set the VFIO interrupt type to either "legacy",
"msi" or "msix" if VFIO support is compiled in. Note that VFIO
initialization will fail if the selected interrupt type is not supported
by the system.

If the interrupt type parameter wasn't specified, VFIO will try all
interrupt types (starting with MSI-X).
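A sketch of the selection rule described above. The local enum stands in for the patch's enum rte_intr_mode, and the "supported" bitmask is a hypothetical way of describing what the device offers. The commit message only says the search starts with MSI-X; the order MSI-X, then MSI, then legacy INTx used here is an assumption.

```c
/* Local stand-in for enum rte_intr_mode; INTR_NONE means "not specified / failed". */
enum intr_mode { INTR_NONE = 0, INTR_LEGACY, INTR_MSI, INTR_MSIX };

/*
 * supported: bit (1u << mode) is set for each mode the device supports.
 * An explicitly requested mode must be supported, otherwise setup fails.
 * With no request, modes are tried in the (assumed) order below.
 */
static enum intr_mode pick_intr_mode(enum intr_mode requested, unsigned supported)
{
	static const enum intr_mode order[] = { INTR_MSIX, INTR_MSI, INTR_LEGACY };
	unsigned i;

	if (requested != INTR_NONE)
		return (supported & (1u << requested)) ? requested : INTR_NONE;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (supported & (1u << order[i]))
			return order[i];
	return INTR_NONE;
}
```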

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 18a3e04..e87a2e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0    "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR    "vfio-intr"
 
 #define RTE_EAL_BLACKLIST_SIZE	0x100
 
@@ -361,6 +362,7 @@ eal_usage(const char *prgname)
 	       "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
 	    		   "native RDTSC\n"
 	       "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+	       "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO (legacy|msi|msix)\n"
 	       "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +581,28 @@ eal_parse_base_virtaddr(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_vfio_intr(const char *mode)
+{
+	unsigned i;
+	static struct {
+		const char *name;
+		enum rte_intr_mode value;
+	} map[] = {
+		{ "legacy", RTE_INTR_MODE_LEGACY },
+		{ "msi", RTE_INTR_MODE_MSI },
+		{ "msix", RTE_INTR_MODE_MSIX },
+	};
+
+	for (i = 0; i < RTE_DIM(map); i++) {
+		if (!strcmp(mode, map[i].name)) {
+			internal_config.vfio_intr_mode = map[i].value;
+			return 0;
+		}
+	}
+	return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +657,7 @@ eal_parse_args(int argc, char **argv)
 		{OPT_PCI_BLACKLIST, 1, 0, 0},
 		{OPT_VDEV, 1, 0, 0},
 		{OPT_SYSLOG, 1, NULL, 0},
+		{OPT_VFIO_INTR, 1, NULL, 0},
 		{OPT_BASE_VIRTADDR, 1, 0, 0},
 		{OPT_XEN_DOM0, 0, 0, 0},
 		{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +854,14 @@ eal_parse_args(int argc, char **argv)
 					return -1;
 				}
 			}
+			else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+				if (eal_parse_vfio_intr(optarg) < 0) {
+					RTE_LOG(ERR, EAL, "invalid parameters for --"
+							OPT_VFIO_INTR "\n");
+					eal_usage(prgname);
+					return -1;
+				}
+			}
 			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
 				internal_config.create_uio_dev = 1;
 			}
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 16/20] eal: make --no-huge use mmap instead of malloc
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (14 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
                       ` (4 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc would be fine, but we want to guarantee that the
memory is page-aligned, so we use mmap to be safe.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5a10a80..3fc0d28 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)
 
 	/* hugetlbfs can be disabled */
 	if (internal_config.no_hugetlbfs) {
-		addr = malloc(internal_config.memory);
+		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (addr == MAP_FAILED) {
+			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+					strerror(errno));
+			return -1;
+		}
 		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 17/20] test app: adding unit tests for VFIO EAL command-line parameter
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (15 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
                       ` (3 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add unit tests for the VFIO interrupt type command-line parameter. We
don't know whether VFIO is compiled in (the eal_vfio.h header is internal
to the Linuxapp EAL), so we check this flag regardless.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_eal_flags.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
 	const char *argv11[] = {prgname, "--file-prefix=virtaddr",
 			"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};
 
+	/* try running with --vfio-intr INTx flag */
+	const char *argv12[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+	/* try running with --vfio-intr MSI flag */
+	const char *argv13[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+	/* try running with --vfio-intr MSI-X flag */
+	const char *argv14[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+	/* try running with --vfio-intr invalid flag */
+	const char *argv15[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=invalid"};
+
 
 	if (launch_proc(argv0) == 0) {
 		printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
 		printf("Error - process did not run ok with --base-virtaddr parameter\n");
 		return -1;
 	}
+	if (launch_proc(argv12) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr INTx parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv13) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv14) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI-X parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv15) == 0) {
+		printf("Error - process ran ok with "
+				"--vfio-intr invalid parameter\n");
+		return -1;
+	}
 	return 0;
 }
 #endif
-- 
1.8.1.4

* [dpdk-dev] [PATCH v3 18/20] igb_uio: Removed PCI ID table from igb_uio
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (16 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
                       ` (2 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Remove the PCI ID list to make igb_uio more similar to generic drivers
like vfio-pci or uio_pci_generic. This makes it easier for the binding
script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to pci-stub, vfio-pci and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user now has to first
write its vendor and device ID to the "new_id" file inside the igb_uio
driver directory, and only then write its PCI address to "bind". This
is reflected in the changes to the PCI binding script as well.
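A sketch of the two sysfs writes described above. The binding script itself is Python, and these C helpers are hypothetical; they only build the control-file path and the "vendor device" hex string that new_id accepts (e.g. "8086 10fb"). Actually writing to sysfs requires root, so the usage is described rather than executed.

```c
#include <stdio.h>
#include <string.h>

/* Path of a PCI driver's control file, e.g. /sys/bus/pci/drivers/igb_uio/new_id. */
static int driver_file_path(char *buf, size_t len, const char *driver, const char *file)
{
	int n = snprintf(buf, len, "/sys/bus/pci/drivers/%s/%s", driver, file);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* new_id expects the vendor and device IDs as hex numbers, e.g. "8086 10fb". */
static int new_id_value(char *buf, size_t len, unsigned vendor, unsigned device)
{
	int n = snprintf(buf, len, "%04x %04x", vendor, device);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value to a file; an error reported at close time also counts as failure. */
static int write_value(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int err;

	if (f == NULL)
		return -1;
	err = fputs(value, f) < 0;
	if (fclose(f) != 0)
		err = 1;
	return err ? -1 : 0;
}
```

Usage, in order: write_value() the new_id_value() string to the driver's "new_id" file, then write the device's PCI address (such as "0000:01:00.0") to the driver's "bind" file.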

There is odd sysfs behaviour when a new device ID is added to new_id:
a subsequent write to "bind" results in an IOError when the file is
closed. This error is harmless, but it still triggers the exception,
so to work around it, we check whether the device was actually bound
to the driver before raising an error.
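One way to perform that check, sketched in C: a bound PCI device's sysfs directory (e.g. /sys/bus/pci/devices/0000:01:00.0) contains a "driver" symlink pointing at the driver's directory, so comparing the link target's last path component with the driver name answers "is it bound to igb_uio?". Both helpers are hypothetical, and the fixed 4096-byte buffers are an assumption about path length.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* 1 if the symlink at link_path points at a path ending in `driver`, else 0. */
static int driver_link_matches(const char *link_path, const char *driver)
{
	char target[4096]; /* assumed large enough for a sysfs link target */
	const char *base;
	ssize_t n = readlink(link_path, target, sizeof(target) - 1);

	if (n < 0)
		return 0; /* no link: device is not bound to any driver */
	target[n] = '\0'; /* readlink() does not NUL-terminate */
	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	return strcmp(base, driver) == 0;
}

/* dev_dir is a device directory such as /sys/bus/pci/devices/0000:01:00.0 */
static int device_bound_to(const char *dev_dir, const char *driver)
{
	char link_path[4096];
	int n = snprintf(link_path, sizeof(link_path), "%s/driver", dev_dir);

	if (n < 0 || (size_t)n >= sizeof(link_path))
		return 0;
	return driver_link_matches(link_path, driver);
}
```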

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-----
 tools/igb_uio_bind.py                     | 118 +++++++++++++++---------------
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include <rte_pci_dev_ids.h>
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)
 
 static struct pci_driver igbuio_pci_driver = {
 	.name = "igb_uio",
-	.id_table = igbuio_pci_ids,
+	.id_table = NULL,
 	.probe = igbuio_pci_probe,
 	.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []
 
 def usage():
     '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
                 return path
 
 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that igb_uio is loaded'''
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
     if not found:
         print "Error - module %s not loaded" %mod
         sys.exit(1)
-    
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
-        sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False
 
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
 
+def get_pci_device_details(dev_id):
+    '''This function gets additional details for a PCI device'''
+    device = {}
+
+    extra_info = check_output(["lspci", "-vmmks", dev_id]).splitlines()
+
+    # parse lspci details
+    for line in extra_info:
+        if len(line) == 0:
+            continue
+        name, value = line.split("\t", 1)
+        name = name.strip(":") + "_str"
+        device[name] = value
+    # check for a unix interface name
+    sys_path = "/sys/bus/pci/devices/%s/net/" % dev_id
+    if exists(sys_path):
+        device["Interface"] = ",".join(os.listdir(sys_path))
+    else:
+        device["Interface"] = ""
+    # check if a port is used for ssh connection
+    device["Ssh_if"] = False
+    device["Active"] = ""
+
+    return device
+
 def get_nic_details():
     '''This function populates the "devices" dictionary. The keys used are
     the pci addresses (domain:bus:slot.func). The values are themselves
@@ -237,23 +228,10 @@ def get_nic_details():
 
     # based on the basic info, get extended text details            
     for d in devices.keys():
-        extra_info = check_output(["lspci", "-vmmks", d]).splitlines()
-        # parse lspci details
-        for line in extra_info:
-            if len(line) == 0:
-                continue
-            name, value = line.split("\t", 1)
-            name = name.strip(":") + "_str"
-            devices[d][name] = value
-        # check for a unix interface name
-        sys_path = "/sys/bus/pci/devices/%s/net/" % d
-        if exists(sys_path):
-            devices[d]["Interface"] = ",".join(os.listdir(sys_path))
-        else:
-            devices[d]["Interface"] = ""
-        # check if a port is used for ssh connection
-        devices[d]["Ssh_if"] = False
-        devices[d]["Active"] = ""
+        # get additional info and add it to existing data
+        devices[d] = dict(devices[d].items() +
+                          get_pci_device_details(d).items())
+
         for _if in ssh_if: 
             if _if in devices[d]["Interface"].split(","):
                 devices[d]["Ssh_if"] = True
@@ -261,14 +239,12 @@ def get_nic_details():
                 break;
 
         # add igb_uio to list of supporting modules if needed
-        if is_supported_device(d):
-            if "Module_str" in devices[d]:
-                if "igb_uio" not in devices[d]["Module_str"]:
-                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
-            else:
-                devices[d]["Module_str"] = "igb_uio"
-        if "Module_str" not in devices[d]:
-            devices[d]["Module_str"] = "<none>"
+        if "Module_str" in devices[d]:
+            if "igb_uio" not in devices[d]["Module_str"]:
+                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+        else:
+            devices[d]["Module_str"] = "igb_uio"
+
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
             modules = devices[d]["Module_str"].split(",")
@@ -343,6 +319,22 @@ def bind_one(dev_id, driver, force):
             unbind_one(dev_id, force)
             dev["Driver_str"] = "" # clear driver string
 
+    # if we are binding to one of DPDK drivers, add PCI id's to that driver
+    if driver == "igb_uio":
+        filename = "/sys/bus/pci/drivers/%s/new_id" % driver
+        try:
+            f = open(filename, "w")
+        except:
+            print "Error: bind failed for %s - Cannot open %s" % (dev_id, filename)
+            return
+        try:
+            f.write("%04x %04x" % (dev["Vendor"], dev["Device"]))
+            f.close()
+        except:
+            print "Error: bind failed for %s - Cannot write new PCI ID to " \
+                "driver %s" % (dev_id, driver)
+            return
+
     # do the bind by writing to /sys
     filename = "/sys/bus/pci/drivers/%s/bind" % driver
     try:
@@ -356,6 +348,12 @@ def bind_one(dev_id, driver, force):
         f.write(dev_id)
         f.close()
     except:
+        # for some reason, closing dev_id after adding a new PCI ID to new_id
+        # results in IOError. however, if the device was successfully bound,
+        # we don't care for any errors and can safely ignore IOError
+        tmp = get_pci_device_details(dev_id)
+        if "Driver_str" in tmp and tmp["Driver_str"] == driver:
+            return
         print "Error: bind failed for %s - Cannot bind to driver %s" % (dev_id, driver)
         if saved_driver is not None: # restore any previous driver
             bind_one(dev_id, saved_driver, force)
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread
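With the ID table gone, igb_uio only claims a device after the binding script writes the device's vendor/device pair to the driver's new_id file. That is what bind_one() does above before it writes the PCI address to bind. The following is a minimal Python 3 sketch of that sequence, not the script's own code. The sysfs_root parameter and both helper names are hypothetical additions, so the logic can be exercised against a scratch directory. Success is judged by the device's driver symlink rather than by the bind write, because (as the comment in bind_one() observes) that write can report an error even when the device did end up bound. A likely cause is that writing new_id already lets the driver claim the unbound device.

```python
import os


def current_driver(dev_id, sysfs_root="/sys"):
    """Return the name of the driver bound to dev_id, or None if unbound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", dev_id, "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at .../bus/pci/drivers/<name>
    return os.path.basename(os.readlink(link))


def bind_via_new_id(dev_id, vendor, device, driver, sysfs_root="/sys"):
    """Teach an ID-less PCI driver about vendor:device, then bind dev_id.

    Returns True if dev_id ends up bound to driver.
    """
    drv_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    # Same format bind_one() writes: two 4-digit hex IDs, space-separated.
    with open(os.path.join(drv_dir, "new_id"), "w") as f:
        f.write("%04x %04x" % (vendor, device))
    try:
        with open(os.path.join(drv_dir, "bind"), "w") as f:
            f.write(dev_id)
    except OSError:
        # The new_id write may already have let the driver claim the
        # device, so the explicit bind can be rejected. Only the end
        # state matters, so fall through and check it.
        pass
    return current_driver(dev_id, sysfs_root) == driver
```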

* [dpdk-dev] [PATCH v3 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (17 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Rename the igb_uio_bind script to dpdk_nic_bind so that it has a generic
name, since it now supports two drivers (igb_uio and vfio-pci).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 ++++++++++++++++++++---------
 tools/setup.sh                              | 16 +++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]
 
 def usage():
     '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):
 
 def check_modules():
     '''Checks that igb_uio is loaded'''
+    global dpdk_drivers
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]
     
     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)
 
+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
     the pci addresses (domain:bus:slot.func). The values are themselves
     dictionaries - one for each NIC.'''
     global devices
+    global dpdk_drivers
     
     # clear any old data
     devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():
 
         # add igb_uio to list of supporting modules if needed
         if "Module_str" in devices[d]:
-            if "igb_uio" not in devices[d]["Module_str"]:
-                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
         else:
-            devices[d]["Module_str"] = "igb_uio"
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)
 
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
             dev["Driver_str"] = "" # clear driver string
 
     # if we are binding to one of DPDK drivers, add PCI id's to that driver
-    if driver == "igb_uio":
+    if driver in dpdk_drivers:
         filename = "/sys/bus/pci/drivers/%s/new_id" % driver
         try:
             f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])
 
     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..e0671b8 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -324,13 +324,13 @@ grep_meminfo()
 }
 
 #
-# Calls igb_uio_bind.py --status to show the NIC and what they
+# Calls dpdk_nic_bind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -338,16 +338,16 @@ show_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with igb_uio
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
 bind_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/igb_uio_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -355,18 +355,18 @@ bind_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with kernel drivers again
+# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/igb_uio_bind.py --status
+	${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/igb_uio_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread
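check_modules() above decides which DPDK drivers are usable by scanning /proc/modules. The driver registers on the PCI bus as vfio-pci, while the loaded module is listed as vfio_pci, hence the underscore/dash special case. Below is a sketch of the same check, assuming the standard /proc/modules layout with the module name in the first field. It normalises both spellings and compares whole names instead of prefixes, so a hypothetical module called igb_uio_extra would not count as igb_uio. This is an alternative to the patch's startswith() matching, not what the patch does, and loaded_dpdk_drivers() is a hypothetical name.

```python
def loaded_dpdk_drivers(proc_modules_text, drivers=("igb_uio", "vfio-pci")):
    """Return the subset of drivers whose module appears in /proc/modules.

    Module names in /proc/modules use underscores (vfio_pci) while the
    PCI driver name may use dashes (vfio-pci), so both are normalised to
    underscores before comparing whole names.
    """
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0].replace("-", "_"))
    # Preserve the caller's driver order and spelling in the result.
    return [d for d in drivers if d.replace("-", "_") in loaded]
```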

* [dpdk-dev] [PATCH v3 20/20] setup script: adding support for VFIO to setup.sh
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (18 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
@ 2014-05-28 14:38     ` Anatoly Burakov
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-05-28 14:38 UTC (permalink / raw)
  To: dev

Add support for loading/unloading the VFIO modules, binding/unbinding
devices to/from VFIO, and setting up the correct userspace permissions.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/setup.sh | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }
 
 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+	echo "Unloading any existing VFIO module"
+	/sbin/lsmod | grep -s vfio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod vfio-pci
+		sudo /sbin/rmmod vfio_iommu_type1
+		sudo /sbin/rmmod vfio
+	fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+	remove_vfio_module
+
+	VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+	echo "Loading VFIO module"
+	/sbin/lsmod | grep -s vfio_pci > /dev/null
+	if [ $? -ne 0 ] ; then
+		if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+			sudo /sbin/modprobe vfio-pci
+		fi
+	fi
+
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# check if /dev/vfio/vfio exists - that way we
+	# know we either loaded the module, or it was
+	# compiled into the kernel
+	if [ ! -e /dev/vfio/vfio ] ; then
+		echo "## ERROR: VFIO not found!"
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }
 
 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# make sure regular user can access everything inside /dev/vfio
+	echo "chmod /dev/vfio/*"
+	sudo /usr/bin/chmod 0666 /dev/vfio/*
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# since permissions are only to be set when running as
+	# regular user, we only check ulimit here
+	#
+	# warn if regular user is only allowed
+	# to memlock <64M of memory
+	MEMLOCK_AMNT=`ulimit -l`
+
+	if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+		MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+		echo ""
+		echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+		echo ""
+		echo "This is the maximum amount of memory you will be"
+		echo "able to use with DPDK and VFIO if run as current user."
+		echo -n "To change this, please adjust limits.conf memlock "
+		echo "limit for current user."
+
+		if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+			echo ""
+			echo "## WARNING: memlock limit is less than 64MB"
+			echo -n "## DPDK with VFIO may not be able to initialize "
+			echo "if run as current user."
+		fi
+	fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+	if /sbin/lsmod  | grep -q vfio_pci ; then
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to VFIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+	else
+		echo "# Please load the 'vfio-pci' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
 		${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
 
-	TEXT[2]="Insert KNI module"
-	FUNC[2]="load_kni_module"
+	TEXT[2]="Insert VFIO module"
+	FUNC[2]="load_vfio_module"
+
+	TEXT[3]="Insert KNI module"
+	FUNC[3]="load_kni_module"
 
-	TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-	FUNC[3]="set_non_numa_pages"
+	TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+	FUNC[4]="set_non_numa_pages"
 
-	TEXT[4]="Setup hugepage mappings for NUMA systems"
-	FUNC[4]="set_numa_pages"
+	TEXT[5]="Setup hugepage mappings for NUMA systems"
+	FUNC[5]="set_numa_pages"
 
-	TEXT[5]="Display current Ethernet device settings"
-	FUNC[5]="show_nics"
+	TEXT[6]="Display current Ethernet device settings"
+	FUNC[6]="show_nics"
 
-	TEXT[6]="Bind Ethernet device to IGB UIO module"
-	FUNC[6]="bind_nics"
+	TEXT[7]="Bind Ethernet device to IGB UIO module"
+	FUNC[7]="bind_nics_to_igb_uio"
+
+	TEXT[8]="Bind Ethernet device to VFIO module"
+	FUNC[8]="bind_nics_to_vfio"
+
+	TEXT[9]="Setup VFIO permissions"
+	FUNC[9]="set_vfio_permissions"
 }
 
 #
@@ -455,11 +578,14 @@ step5_func()
 	TEXT[3]="Remove IGB UIO module"
 	FUNC[3]="remove_igb_uio_module"
 
-	TEXT[4]="Remove KNI module"
-	FUNC[4]="remove_kni_module"
+	TEXT[4]="Remove VFIO module"
+	FUNC[4]="remove_vfio_module"
+
+	TEXT[5]="Remove KNI module"
+	FUNC[5]="remove_kni_module"
 
-	TEXT[5]="Remove hugepage mappings"
-	FUNC[5]="clear_huge_pages"
+	TEXT[6]="Remove hugepage mappings"
+	FUNC[6]="clear_huge_pages"
 }
 
 STEPS[1]="step1_func"
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread
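set_vfio_permissions() warns when `ulimit -l` (reported in KiB) is below 65536, i.e. 64 MB, since memory that VFIO pins for DMA counts against the user's memlock limit. Below is a Python sketch of the same check, assuming the standard resource module on Linux. getrlimit() reports bytes, so the value is converted before the comparison. memlock_report() and check_current_user() are hypothetical helper names.

```python
import resource

# Threshold used by set_vfio_permissions(): 65536 KiB (64 MB).
MIN_MEMLOCK_KB = 65536


def memlock_report(limit_bytes):
    """Return (limit_mb, warn) for a RLIMIT_MEMLOCK soft limit in bytes.

    limit_mb is None (and warn False) when the limit is unlimited.
    """
    if limit_bytes == resource.RLIM_INFINITY:
        return None, False
    limit_kb = limit_bytes // 1024
    return limit_kb // 1024, limit_kb < MIN_MEMLOCK_KB


def check_current_user():
    # `ulimit -l` in the shell script shows the soft limit, so use it here too.
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    limit_mb, warn = memlock_report(soft)
    if limit_mb is None:
        return
    print("Current user memlock limit: %d MB" % limit_mb)
    if warn:
        print("## WARNING: memlock limit is less than 64MB")
```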

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-28 13:45       ` Thomas Monjalon
@ 2014-05-28 14:50         ` Antti Kantee
  2014-05-28 16:24         ` Stephen Hemminger
  1 sibling, 0 replies; 160+ messages in thread
From: Antti Kantee @ 2014-05-28 14:50 UTC (permalink / raw)
  To: Thomas Monjalon, dev

On 28/05/14 13:45, Thomas Monjalon wrote:
> So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
> If someone wants to work on it, it's possible to stage uio_pci_generic in
> dpdk.org in order to make it ready for kernel.org.

Back when researching MSI + uio_pci_generic, I found this:
http://www.gossamer-threads.com/lists/linux/kernel/1738200

I'm not sure I completely follow the logic of the argument there, but 
it seems the (maintainer's?) mind is quite made up that uio_pci_generic 
will never support MSI.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio
  2014-05-28 13:45       ` Thomas Monjalon
  2014-05-28 14:50         ` Antti Kantee
@ 2014-05-28 16:24         ` Stephen Hemminger
  1 sibling, 0 replies; 160+ messages in thread
From: Stephen Hemminger @ 2014-05-28 16:24 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, 28 May 2014 15:45:02 +0200
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2014-05-23 00:10, Antti Kantee:
> > On 22/05/14 13:13, Thomas Monjalon wrote:
> > > 2014-05-19 16:51, Anatoly Burakov:
> > >> Note that since igb_uio no longer has a PCI ID list, it can now be
> > >> bound to any device, not just those explicitly supported by DPDK. In
> > >> other words, it now behaves similar to PCI stub, VFIO and other generic
> > >> PCI drivers.
> > > 
> > > I wonder if we could replace igb_uio by uio_pci_generic?
> > 
> > I've been running plenty of the NetBSD kernel PCI drivers in Linux
> > userspace on top of uio_pci_generic, including NICs supported by DPDK.
> > The only real annoyance is that mainline uio_pci_generic doesn't support
> > MSI.  A pseudo-annoyance is that uio_pci_generic turns interrupts off
> > from the PCI config space each time after you read an interrupt, so they
> > have to be reenabled after each one (and NetBSD kernel drivers tend to
> > like using interrupts for everything).
> > 
> > The annoyance of vfio is iommus.  Yes, I want to make the tradeoff of
> > possibly scribbling memory vs. not being able to do anything on the
> > wrong system.
> > 
> > I'd like to see a generic Linux kernel PCI driver blob without
> > annoyances, though not yet annoyed enough to do anything myself ;)
> 
> So maybe it's possible to improve uio_pci_generic in order to replace igb_uio.
> If someone wants to work on it, it's possible to stage uio_pci_generic in 
> dpdk.org in order to make it ready for kernel.org.
> 

I am doing a new version of uio_pci for the upstream kernel and will submit
it when ready. It will target 3.10 or later kernels; I will not bother
backporting past that.

^ permalink raw reply	[flat|nested] 160+ messages in thread
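Antti's remark about uio_pci_generic turning interrupts off refers to the INTx Disable bit (bit 10 of the 16-bit Command register at config-space offset 0x04). The driver sets it when an interrupt fires, and user space has to clear it again before the next interrupt can be delivered. Below is a minimal sketch of that re-enable step, assuming the device's config space is reachable through a file such as the sysfs config node. The helper names are hypothetical, and the read-modify-write is not protected against concurrent config-space writers.

```python
import os
import struct

PCI_COMMAND = 0x04                # offset of the 16-bit Command register
PCI_COMMAND_INTX_DISABLE = 0x400  # bit 10: INTx Disable


def intx_enabled_command(cmd):
    """Return the Command register value with INTx delivery re-enabled."""
    return cmd & ~PCI_COMMAND_INTX_DISABLE & 0xFFFF


def reenable_intx(config_path):
    """Clear INTx Disable in the config space file at config_path."""
    fd = os.open(config_path, os.O_RDWR)
    try:
        # Config space is little-endian.
        (cmd,) = struct.unpack("<H", os.pread(fd, 2, PCI_COMMAND))
        new = intx_enabled_command(cmd)
        if new != cmd:
            os.pwrite(fd, struct.pack("<H", new), PCI_COMMAND)
    finally:
        os.close(fd)
```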

* [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK
  2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
                       ` (19 preceding siblings ...)
  2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
@ 2014-06-03 10:17     ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
                         ` (20 more replies)
  20 siblings, 21 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a kernel driver (Linux 3.6+) that allows secure DMA from
userspace by means of the IOMMU, instead of working directly with
physical memory as igb_uio does.
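VFIO hands out access per IOMMU group rather than per device. Each PCI device behind an IOMMU has an iommu_group symlink in sysfs, and the group is opened through /dev/vfio/<group number>. Below is a small Python sketch of that lookup. The sysfs_root and dev_root parameters and the helper name are hypothetical, so the sketch can run against a scratch tree. It only locates the group node and does not open it or issue any VFIO ioctls.

```python
import os


def vfio_group_dev(pci_addr, sysfs_root="/sys", dev_root="/dev"):
    """Return the /dev/vfio/<group> path for a PCI device, or None.

    The device's iommu_group entry is a symlink whose last component is
    the IOMMU group number. A device without one is not behind an IOMMU,
    so (in the normal IOMMU-backed mode) it cannot be used through VFIO.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr,
                        "iommu_group")
    if not os.path.islink(link):
        return None
    group = os.path.basename(os.readlink(link))
    return os.path.join(dev_root, "vfio", group)
```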

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL    
    command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  36 +
 app/test/test_pci.c                                |   4 +-
 config/common_linuxapp                             |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |   2 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  16 +-
 lib/librte_eal/common/include/rte_pci.h            |   5 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  44 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 285 +++++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 473 ++-----------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 781 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        | 157 +++--
 tools/setup.sh                                     | 172 ++++-
 29 files changed, 2545 insertions(+), 587 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (83%)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
                         ` (19 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch: VFIO also needs to map BARs, but it does not use
the path stored in mapped_pci_resource. Also rename the structs to be
more generic.
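The point of the split is that pci_map_resource() now receives an already-open descriptor. The caller decides where it comes from (a /dev/uioX path for igb_uio, presumably a VFIO device descriptor later) and when to close it. Below is a Python analogue of that contract, not the EAL code. map_resource() is a hypothetical name, and Python's mmap requires the offset to be a multiple of mmap.ALLOCATIONGRANULARITY, which the sketch checks explicitly. As in pci_uio_map_secondary(), the descriptor can be closed right after mapping and the mapping stays valid.

```python
import mmap


def map_resource(fd, offset, size):
    """Map size bytes of an already-open file at offset, shared, read/write.

    The caller owns fd (opens and closes it). Python's mmap keeps its own
    duplicate of the descriptor, so fd may be closed once this returns.
    """
    if offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be a multiple of "
                         "mmap.ALLOCATIONGRANULARITY")
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)
```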

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 ++++++++++++++++------------------
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <ctype.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <string.h>
-#include <stdarg.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
-#include <stdarg.h>
-#include <errno.h>
 #include <dirent.h>
-#include <limits.h>
-#include <sys/queue.h>
 #include <sys/mman.h>
-#include <sys/ioctl.h>
 
-#include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
-#include <rte_common.h>
-#include <rte_launch.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_tailq.h>
-#include <rte_eal.h>
 #include <rte_eal_memconfig.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
 #include <rte_malloc.h>
-#include <rte_string_fns.h>
-#include <rte_debug.h>
 #include <rte_devargs.h>
 
 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct uio_map {
+struct pci_map {
 	void *addr;
 	uint64_t offset;
 	uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-	TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
 
 	struct rte_pci_addr pci_addr;
 	char path[PATH_MAX];
-	size_t nb_maps;
-	struct uio_map maps[PCI_MAX_RESOURCE];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
 };
 
-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;
 
-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:
 
 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-		 size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-	int fd;
 	void *mapaddr;
 
-	/*
-	 * open devname, to mmap it
-	 */
-	fd = open(devname, O_RDWR);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		goto fail;
-	}
-
 	/* Map the PCI memory resource of device */
 	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
-	close(fd);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
-		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-			" %s (%p)\n", __func__, devname, fd, requested_addr,
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+			__func__, fd, requested_addr,
 			(unsigned long)size, (unsigned long)offset,
 			strerror(errno), mapaddr);
 		goto fail;
@@ -186,10 +148,10 @@ fail:
 }
 
 #define OFF_MAX              ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-	size_t i;
+	int i;
 	char dirname[PATH_MAX];
 	char filename[PATH_MAX];
 	uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-        size_t i;
-        struct uio_resource *uio_res;
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
 
-	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
 
 		/* skip this element if it doesn't match our PCI address */
 		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
 			continue;
 
 		for (i = 0; i != uio_res->nb_maps; i++) {
-			if (pci_map_resource(uio_res->maps[i].addr,
-					     uio_res->path,
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
 					     (off_t)uio_res->maps[i].offset,
 					     (size_t)uio_res->maps[i].size)
 			    != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL,
 					"Cannot mmap device resource\n");
+				close(fd);
 				return (-1);
 			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
 		}
 		return (0);
 	}
@@ -276,7 +250,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 	return -1;
 }
 
-static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
 {
 	FILE *f;
 	char filename[PATH_MAX];
@@ -323,7 +298,8 @@ static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
  * sysfs. On error, return a negative value. In this case dstbuf is
  * invalid.
  */
-static int pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 			   unsigned int buflen)
 {
 	struct rte_pci_addr *loc = &dev->addr;
@@ -405,10 +381,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	uint64_t phaddr;
 	uint64_t offset;
 	uint64_t pagesz;
-	ssize_t nb_maps;
+	int nb_maps;
 	struct rte_pci_addr *loc = &dev->addr;
-	struct uio_resource *uio_res;
-	struct uio_map *maps;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -460,6 +436,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	maps = uio_res->maps;
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
 
 		/* skip empty BAR */
 		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
@@ -473,14 +450,27 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		/* if matching map is found, then use it */
 		if (j != nb_maps) {
 			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(devname, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					devname, strerror(errno));
+				return -1;
+			}
+
 			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, devname,
+			    (mapaddr = pci_map_resource(NULL, fd,
 							(off_t)offset,
 							(size_t)maps[j].size)
 			    ) == NULL) {
 				rte_free(uio_res);
+				close(fd);
 				return (-1);
 			}
+			close(fd);
 
 			maps[j].addr = mapaddr;
 			maps[j].offset = offset;
@@ -488,7 +478,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		}
 	}
 
-	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
 
 	return (0);
 }
@@ -866,7 +856,8 @@ rte_eal_pci_init(void)
 {
 	TAILQ_INIT(&pci_driver_list);
 	TAILQ_INIT(&pci_device_list);
-	uio_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, uio_res_list);
+	pci_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI,
+			mapped_pci_res_list);
 
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
-- 
1.8.1.4

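Because open() has moved out of pci_map_resource(), each caller now opens the device node, maps the region and closes the descriptor straight away. This is safe because an mmap() mapping holds its own reference to the underlying file, so the mapping survives close(). The sketch below shows the pattern on an ordinary temporary file instead of /dev/uioX. map_from_fd() and open_map_close() are hypothetical names, not EAL functions. Unlike the patched pci_map_resource(), this version also unmaps a mapping that the kernel placed at an address other than the one requested.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes at `offset` of an already-open descriptor. The caller
 * keeps ownership of `fd`, as with the new pci_map_resource() contract. */
static void *
map_from_fd(void *requested_addr, int fd, off_t offset, size_t size)
{
	void *addr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, offset);

	if (addr == MAP_FAILED)
		return NULL;
	/* The address is only a hint: undo a mapping placed elsewhere. */
	if (requested_addr != NULL && addr != requested_addr) {
		munmap(addr, size);
		return NULL;
	}
	return addr;
}

/* Open `path`, map its first `size` bytes, then close the descriptor at
 * once. The mapping stays valid after close() because mmap() keeps its
 * own reference to the file. */
static void *
open_map_close(const char *path, size_t size)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	addr = map_from_fd(NULL, fd, 0, size);
	close(fd);
	return addr;
}
```

The EAL code above follows the same idea: pci_uio_map_secondary() and pci_uio_map_resource() close each descriptor as soon as the BAR is mapped.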

* [dpdk-dev] [PATCH v4 02/20] pci: move uio mapping code to a separate file
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
                         ` (18 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 403 +--------------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 ++++
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

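pci_parse_sysfs_value(), which this patch moves from eal_pci.c into eal_pci_uio.c, reads one line from a sysfs attribute such as maps/map0/size. It accepts the value only when strtoull() consumes everything up to the trailing newline. Below is a minimal sketch of the same check applied to an in-memory string, so it can be exercised without a UIO device. parse_sysfs_u64_line() is a hypothetical name. It is slightly stricter than the EAL function, because it also rejects a line with no digits and values for which strtoull() reports ERANGE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one sysfs-style line such as "0x1000\n" into *val.
 * Returns 0 on success, or -1 if the line is empty, has no digits,
 * has extra characters before the newline, lacks the newline, or
 * overflows unsigned long long. */
static int
parse_sysfs_u64_line(const char *buf, uint64_t *val)
{
	char *end = NULL;
	unsigned long long v;

	if (buf[0] == '\0')
		return -1;

	errno = 0;
	v = strtoull(buf, &end, 0); /* base 0 accepts "0x..." and decimal */
	if (errno != 0 || end == buf || *end != '\n')
		return -1;

	*val = (uint64_t)v;
	return 0;
}
```

In the EAL the line comes from fgets() on the attribute file. Only the validation step is shown here.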
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b052820..d958014 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */
 
 #include <string.h>
-#include <sys/stat.h>
-#include <fcntl.h>
 #include <dirent.h>
 #include <sys/mman.h>
 
@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"
 
 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct pci_map {
-	void *addr;
-	uint64_t offset;
-	uint64_t size;
-	uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-	TAILQ_ENTRY(mapped_pci_resource) next;
-
-	struct rte_pci_addr pci_addr;
-	char path[PATH_MAX];
-	int nb_maps;
-	struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;
 
 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }
 
 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;
 
@@ -147,342 +123,6 @@ fail:
 	return NULL;
 }
 
-#define OFF_MAX              ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-	int i;
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	uint64_t offset, size;
-
-	for (i = 0; i != nb_maps; i++) {
- 
-		/* check if map directory exists */
-		rte_snprintf(dirname, sizeof(dirname), 
-			"%s/maps/map%u", devname, i);
- 
-		if (access(dirname, F_OK) != 0)
-			break;
- 
-		/* get mapping offset */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/offset", dirname);
-		if (pci_parse_sysfs_value(filename, &offset) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse offset of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping size */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/size", dirname);
-		if (pci_parse_sysfs_value(filename, &size) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse size of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping physical address */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/addr", dirname);
-		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse addr of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-			RTE_LOG(ERR, EAL,
-				"%s(): offset/size exceed system max value\n",
-				__func__); 
-			return (-1);
-		}
-
-		maps[i].offset = offset;
-		maps[i].size = size;
-        }
-	return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-	int fd, i;
-	struct mapped_pci_resource *uio_res;
-
-	TAILQ_FOREACH(uio_res, pci_res_list, next) {
-
-		/* skip this element if it doesn't match our PCI address */
-		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
-			continue;
-
-		for (i = 0; i != uio_res->nb_maps; i++) {
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(uio_res->path, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					uio_res->path, strerror(errno));
-				return -1;
-			}
-
-			if (pci_map_resource(uio_res->maps[i].addr, fd,
-					     (off_t)uio_res->maps[i].offset,
-					     (size_t)uio_res->maps[i].size)
-			    != uio_res->maps[i].addr) {
-				RTE_LOG(ERR, EAL,
-					"Cannot mmap device resource\n");
-				close(fd);
-				return (-1);
-			}
-			/* fd is not needed in slave process, close it */
-			close(fd);
-		}
-		return (0);
-	}
-
-	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
-}
-
-static int
-pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
-{
-	FILE *f;
-	char filename[PATH_MAX];
-	int ret;
-	unsigned major, minor;
-	dev_t dev;
-
-	/* get the name of the sysfs file that contains the major and minor
-	 * of the uio device and read its content */
-	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
-			__func__);
-		return -1;
-	}
-
-	ret = fscanf(f, "%d:%d", &major, &minor);
-	if (ret != 2) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
-			__func__);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-
-	/* create the char device "mknod /dev/uioX c major minor" */
-	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
-	dev = makedev(major, minor);
-	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
-			__func__, strerror(errno));
-		return -1;
-	}
-
-	return ret;
-}
-
-/*
- * Return the uioX char device used for a pci device. On success, return
- * the UIO number and fill dstbuf string with the path of the device in
- * sysfs. On error, return a negative value. In this case dstbuf is
- * invalid.
- */
-static int
-pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
-			   unsigned int buflen)
-{
-	struct rte_pci_addr *loc = &dev->addr;
-	unsigned int uio_num;
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-
-	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
-
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1; 
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL)
-		return -1;
-
-	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
-		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
-
-	return uio_num;
-}
-
-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-	int i, j;
-	char dirname[PATH_MAX];
-	char devname[PATH_MAX]; /* contains the /dev/uioX */
-	void *mapaddr;
-	int uio_num;
-	uint64_t phaddr;
-	uint64_t offset;
-	uint64_t pagesz;
-	int nb_maps;
-	struct rte_pci_addr *loc = &dev->addr;
-	struct mapped_pci_resource *uio_res;
-	struct pci_map *maps;
-
-	dev->intr_handle.fd = -1;
-	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-	/* secondary processes - use already recorded details */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
-
-	/* find uio resource */
-	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
-	if (uio_num < 0) {
-		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
-				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
-	}
-	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
-
-	/* save fd if in primary process */
-	dev->intr_handle.fd = open(devname, O_RDWR);
-	if (dev->intr_handle.fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		return -1;
-	}
-	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-
-	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
-		RTE_LOG(ERR, EAL,
-			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
-	}
-
-	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
-
-	/* collect info about device mappings */
-	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-				       RTE_DIM(uio_res->maps));
-	if (nb_maps < 0) {
-		rte_free(uio_res);
-		return (nb_maps);
-	}
-
-	uio_res->nb_maps = nb_maps;
-
-	/* Map all BARs */
-	pagesz = sysconf(_SC_PAGESIZE);
-
-	maps = uio_res->maps;
-	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
-		int fd;
-
-		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
-			continue;
-
-		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
-				dev->mem_resource[i].len != maps[j].size);
-				j++)
-			;
-
-		/* if matching map is found, then use it */
-		if (j != nb_maps) {
-			offset = j * pagesz;
-
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(devname, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					devname, strerror(errno));
-				return -1;
-			}
-
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, fd,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
-				rte_free(uio_res);
-				close(fd);
-				return (-1);
-			}
-			close(fd);
-
-			maps[j].addr = mapaddr;
-			maps[j].offset = offset;
-			dev->mem_resource[i].addr = mapaddr;
-		}
-	}
-
-	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
-
-	return (0);
-}
-
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x00000200
 
@@ -546,41 +186,6 @@ error:
 	return -1;
 }
 
-/* 
- * parse a sysfs file containing one integer value 
- * different to the eal version, as it needs to work with 64-bit values
- */ 
-static int 
-pci_parse_sysfs_value(const char *filename, uint64_t *val) 
-{
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
- 
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
- 
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
-}
-
 /* Compare two PCI device addresses. */
 static int
 pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
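pci_mknod_uio_dev(), moved by this patch into eal_pci_uio.c with only formatting changes, reads the "major:minor" pair from the sysfs dev attribute and creates /dev/uioX with mknod(). Two details in it are worth noting. First, it scans into unsigned variables with %d. Second, its post-mknod() check tests `f == NULL`, which cannot be true at that point, so the error branch never logs. The caller still warns, because the function returns mknod()'s -1. The sketch below separates parsing from node creation and checks mknod()'s own return value. parse_dev_numbers() and create_char_node() are hypothetical helpers. Creating a device node normally requires root (CAP_MKNOD).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Parse the "major:minor" text of a sysfs "dev" attribute, e.g. "243:0\n".
 * %u matches the unsigned variables being filled. */
static int
parse_dev_numbers(const char *text, dev_t *dev)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return -1;
	*dev = makedev(maj, min);
	return 0;
}

/* Create a character device node that is readable and writable by its
 * owner. The error check tests mknod()'s own return value. */
static int
create_char_node(const char *path, dev_t dev)
{
	if (mknod(path, S_IFCHR | S_IRUSR | S_IWUSR, dev) < 0) {
		perror("mknod");
		return -1;
	}
	return 0;
}
```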
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
new file mode 100644
index 0000000..61f09cc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -0,0 +1,403 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <sys/stat.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+#include "rte_pci_dev_ids.h"
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
+	int i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		rte_snprintf(dirname, sizeof(dirname), "%s/maps/map%u", devname, i);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		rte_snprintf(filename, sizeof(filename), "%s/offset", dirname);
+		if (pci_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse offset of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping size */
+		rte_snprintf(filename, sizeof(filename), "%s/size", dirname);
+		if (pci_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse size of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping physical address */
+		rte_snprintf(filename, sizeof(filename), "%s/addr", dirname);
+		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse addr of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+					"%s(): offset/size exceed system max value\n", __func__);
+			return (-1);
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+	}
+
+	return (i);
+}
+
+static int
+pci_uio_map_secondary(struct rte_pci_device *dev) {
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
+
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+		/* skip this element if it doesn't match our PCI address */
+		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+			continue;
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL,
+						"Cannot open %s: %s\n", uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
+					(off_t) uio_res->maps[i].offset,
+					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
+				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
+				close(fd);
+				return (-1);
+			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
+		}
+		return (0);
+	}
+
+	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
+	return -1;
+}
+
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num) {
+	FILE *f;
+	char filename[PATH_MAX];
+	int ret;
+	unsigned major, minor;
+	dev_t dev;
+
+	/* get the name of the sysfs file that contains the major and minor
+	 * of the uio device and read its content */
+	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs to get major:minor\n", __func__);
+		return -1;
+	}
+
+	ret = fscanf(f, "%d:%d", &major, &minor);
+	if (ret != 2) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs to get major:minor\n", __func__);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* create the char device "mknod /dev/uioX c major minor" */
+	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
+	dev = makedev(major, minor);
+	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): mknod() failed %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Return the uioX char device used for a pci device. On success, return
+ * the UIO number and fill dstbuf string with the path of the device in
+ * sysfs. On error, return a negative value. In this case dstbuf is
+ * invalid.
+ */
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+		unsigned int buflen) {
+	struct rte_pci_addr *loc = &dev->addr;
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+
+	rte_snprintf(dirname, sizeof(dirname),
+			SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio", loc->domain, loc->bus,
+			loc->devid, loc->function);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		rte_snprintf(dirname, sizeof(dirname),
+				SYSFS_PCI_DEVICES "/" PCI_PRI_FMT, loc->domain, loc->bus,
+				loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		errno = 0; /* first try uio%d */
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		errno = 0; /* then try uio:uio%d */
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL)
+		return -1;
+
+	/* create uio device if we've been asked to */
+	if (internal_config.create_uio_dev
+			&& pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
+
+	return uio_num;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+int
+pci_uio_map_resource(struct rte_pci_device *dev) {
+	int i, j;
+	char dirname[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	int uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	int nb_maps;
+	struct rte_pci_addr *loc = &dev->addr;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return (pci_uio_map_secondary(dev));
+
+	/* find uio resource */
+	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
+	if (uio_num < 0) {
+		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
+		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
+		return -1;
+	}
+	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+
+	/* save fd if in primary process */
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", devname, strerror(errno));
+		return -1;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
+	/* allocate the mapping details for secondary processes*/
+	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
+		return (-1);
+	}
+
+	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+
+	/* collect info about device mappings */
+	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+			RTE_DIM(uio_res->maps));
+	if (nb_maps < 0) {
+		rte_free(uio_res);
+		return (nb_maps);
+	}
+
+	uio_res->nb_maps = nb_maps;
+
+	/* Map all BARs */
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
+
+		/* skip empty BAR */
+		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+			continue;
+
+		for (j = 0;
+				j != nb_maps
+						&& (phaddr != maps[j].phaddr
+								|| dev->mem_resource[i].len != maps[j].size);
+				j++)
+			;
+
+		/* if matching map is found, then use it */
+		if (j != nb_maps) {
+			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				rte_free(uio_res);
+				return -1;
+			}
+
+			if (maps[j].addr != NULL
+					|| (mapaddr = pci_map_resource(NULL, fd,
+							(off_t) offset, (size_t) maps[j].size)) == NULL) {
+				rte_free(uio_res);
+				close(fd);
+				return (-1);
+			}
+			close(fd);
+
+			maps[j].addr = mapaddr;
+			maps[j].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
+
+	return (0);
+}
+
+/*
+ * parse a sysfs file containing one integer value
+ * different to the eal version, as it needs to work with 64-bit values
+ */
+static int
+pci_parse_sysfs_value(const char *filename, uint64_t *val) {
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs value %s\n", __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot read sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
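In pci_uio_map_resource(), each non-empty BAR is matched to the UIO map that has the same physical address and size, and that map is then mmap()ed at offset `j * pagesz`. This follows the Linux UIO convention: mmap() on /dev/uioX selects map N through an offset of N times the page size, not through a byte offset into the BAR. The sketch below isolates that lookup. struct uio_map_info and uio_offset_for_bar() are hypothetical stand-ins for struct pci_map and the inline search loop.

```c
#include <stdint.h>

/* Hypothetical subset of struct pci_map: what sysfs reports per UIO map. */
struct uio_map_info {
	uint64_t phaddr;
	uint64_t size;
};

/* Return the mmap() offset that selects the UIO map describing a BAR
 * (same physical address and length), or -1 if no map matches.
 * UIO selects map N when the offset is N * page size. */
static int64_t
uio_offset_for_bar(const struct uio_map_info *maps, int nb_maps,
		   uint64_t bar_phaddr, uint64_t bar_len, uint64_t pagesz)
{
	int j;

	for (j = 0; j < nb_maps; j++) {
		if (maps[j].phaddr == bar_phaddr && maps[j].size == bar_len)
			return (int64_t)j * (int64_t)pagesz;
	}
	return -1;
}
```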
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
new file mode 100644
index 0000000..1292eda
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_PCI_INIT_H_
+#define EAL_PCI_INIT_H_
+
+struct pci_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
+
+	struct rte_pci_addr pci_addr;
+	char path[PATH_MAX];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+extern struct mapped_pci_res_list *pci_res_list;
+
+void * pci_map_resource(void * requested_addr, int fd, off_t offset,
+		size_t size);
+
+/* map IGB_UIO resource prototype */
+int pci_uio_map_resource(struct rte_pci_device *dev);
+
+#endif /* EAL_PCI_INIT_H_ */
-- 
1.8.1.4

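eal_pci_init.h now exports pci_res_list, the tailq of struct mapped_pci_resource entries. The primary process fills it, and pci_uio_map_secondary() walks it to find the entry whose PCI address matches the device before replaying that entry's mappings. Below is a minimal sketch of the lookup using <sys/queue.h>. The types are hypothetical simplifications of rte_pci_addr and mapped_pci_resource. The sketch compares address fields one by one instead of calling memcmp(), so padding bytes in the address struct cannot affect the result.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical simplification of struct rte_pci_addr. */
struct pci_addr_sketch {
	unsigned short domain;
	unsigned char bus;
	unsigned char devid;
	unsigned char function;
};

/* Hypothetical simplification of struct mapped_pci_resource. */
struct mapped_res {
	TAILQ_ENTRY(mapped_res) next;
	struct pci_addr_sketch addr;
	int nb_maps;
};

TAILQ_HEAD(mapped_res_list, mapped_res);

/* Field-wise comparison: padding bytes are never inspected. */
static int
pci_addr_equal(const struct pci_addr_sketch *a,
	       const struct pci_addr_sketch *b)
{
	return a->domain == b->domain && a->bus == b->bus &&
	       a->devid == b->devid && a->function == b->function;
}

/* Return the entry recorded for `addr`, or NULL if there is none. */
static struct mapped_res *
find_mapped_res(struct mapped_res_list *list,
		const struct pci_addr_sketch *addr)
{
	struct mapped_res *res;

	TAILQ_FOREACH(res, list, next) {
		if (pci_addr_equal(&res->addr, addr))
			return res;
	}
	return NULL;
}
```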

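pci_get_uio_dev() looks for the first directory entry whose name starts with "uio". It accepts either the uio%d form or the uio:uio%d form because, as its comment notes, the sysfs layout depends on kernel version. The sketch below shows one way to decode such a name into the UIO number. parse_uio_entry_name() is hypothetical. Checking the longer prefix first keeps the two forms unambiguous. Unlike the original loop, which does not inspect endptr after the digits, this version also rejects names with trailing characters.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return the UIO number encoded in a sysfs entry name ("uio3" or
 * "uio:uio3"), or -1 if the name matches neither form. */
static int
parse_uio_entry_name(const char *name)
{
	static const char long_prefix[] = "uio:uio";
	static const char short_prefix[] = "uio";
	const char *digits;
	char *end;
	unsigned long num;

	/* Try the longer form first so "uio:uio3" is not read as "uio" + ":uio3". */
	if (strncmp(name, long_prefix, sizeof(long_prefix) - 1) == 0)
		digits = name + sizeof(long_prefix) - 1;
	else if (strncmp(name, short_prefix, sizeof(short_prefix) - 1) == 0)
		digits = name + sizeof(short_prefix) - 1;
	else
		return -1;

	/* Require a digit; strtoul() alone would skip spaces and signs. */
	if (!isdigit((unsigned char)digits[0]))
		return -1;

	errno = 0;
	num = strtoul(digits, &end, 10);
	if (errno != 0 || *end != '\0' || num > INT_MAX)
		return -1;
	return (int)num;
}
```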
* [dpdk-dev] [PATCH v4 03/20] pci: fixing errors in a previous commit found by checkpatch
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
                         ` (17 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 61f09cc..ae4e716 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -69,7 +69,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &offset) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse offset of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping size */
@@ -77,7 +77,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &size) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse size of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping physical address */
@@ -85,20 +85,20 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse addr of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
 			RTE_LOG(ERR, EAL,
 					"%s(): offset/size exceed system max value\n", __func__);
-			return (-1);
+			return -1;
 		}
 
 		maps[i].offset = offset;
 		maps[i].size = size;
 	}
 
-	return (i);
+	return i;
 }
 
 static int
@@ -128,12 +128,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
 				close(fd);
-				return (-1);
+				return -1;
 			}
 			/* fd is not needed in slave process, close it */
 			close(fd);
 		}
-		return (0);
+		return 0;
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -277,7 +277,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 
 	/* secondary processes - use already recorded details */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
+		return pci_uio_map_secondary(dev);
 
 	/* find uio resource */
 	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -299,7 +299,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	/* allocate the mapping details for secondary processes*/
 	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
 		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
+		return -1;
 	}
 
 	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -310,7 +310,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 			RTE_DIM(uio_res->maps));
 	if (nb_maps < 0) {
 		rte_free(uio_res);
-		return (nb_maps);
+		return nb_maps;
 	}
 
 	uio_res->nb_maps = nb_maps;
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 04/20] pci: distinguish between legitimate failures and non-fatal errors
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (2 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
                         ` (16 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Currently, EAL does not distinguish between actual failures and expected
initialization errors. Sometimes a driver fails to initialize simply
because it was never supposed to be initialized in the first place, for
example when the device is not managed by that driver.

This patch makes EAL fail on actual initialization errors while still
skipping over expected ones. Probe and mapping functions now return a
negative value on a real failure, 0 on success, and a positive value
when the device is skipped; EAL only aborts on negative values.
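
The return-value convention can be illustrated with a minimal,
self-contained sketch. The driver callbacks and probe_all_drivers() below
are hypothetical stand-ins written for illustration, not the EAL
functions themselves; only the meaning of the return values (<0 error,
0 success, >0 skipped) is taken from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-driver probe callback. Convention from this patch:
 * <0 = real failure, 0 = device initialized, >0 = not for this driver. */
typedef int (*probe_fn)(int dev_id);

/* Try every driver for one device, in the style of pci_probe_all_drivers(). */
static int probe_all_drivers(probe_fn drivers[], size_t n, int dev_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = drivers[i](dev_id);

		if (rc < 0)
			return -1;	/* genuine error: caller should abort */
		if (rc > 0)
			continue;	/* driver does not handle this device */
		return 0;		/* device initialized */
	}
	return 1;			/* no driver claimed the device: not fatal */
}

/* Hypothetical drivers that always skip, succeed, or fail. */
static int drv_skip(int dev_id) { (void)dev_id; return 1; }
static int drv_ok(int dev_id)   { (void)dev_id; return 0; }
static int drv_fail(int dev_id) { (void)dev_id; return -1; }
```

With this split, the caller only needs a single "ret < 0" check before
calling rte_exit(), and an unclaimed device (ret == 1) is silently
skipped instead of aborting initialization.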

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_pci.c    | 16 +++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci.c     |  7 ++++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 
 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		rc = rte_eal_pci_probe_one_driver(dr, dev);
 		if (rc < 0)
 			/* negative value is an error */
-			break;
+			return -1;
 		if (rc > 0)
 			/* positive value means driver not found */
 			continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 				;
 		return 0;
 	}
-	return -1;
+	return 1;
 }
 
 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
 		probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			pci_probe_all_drivers(dev);
+			ret = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
-			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-			pci_probe_all_drivers(dev) < 0)
+			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+			ret = pci_probe_all_drivers(dev);
+		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 628813b..0b779ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -431,13 +432,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		if (dev->devargs != NULL &&
 			dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
 			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-			return 0;
+			return 1;
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			if ((ret = pci_uio_map_resource(dev)) != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ae4e716..426769b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
+	return 1;
 }
 
 static int
@@ -284,7 +284,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	if (uio_num < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (3 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-04  9:03         ` Burakov, Anatoly
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
                         ` (15 subsequent siblings)
  20 siblings, 1 reply; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING, a more
generic name, since BAR mapping can now be done with either igb_uio or
VFIO.
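
Only the name changes; the bit value stays 0x0001, so drivers keep
testing the same bit. A minimal sketch of such a test, assuming the two
flag values shown in rte_pci.h below (drv_needs_mapping() is a
hypothetical helper, not part of rte_pci.h):

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as defined in rte_pci.h by this patch. */
#define RTE_PCI_DRV_NEED_MAPPING 0x0001
#define RTE_PCI_DRV_MULTIPLE     0x0002

/* Hypothetical helper: does the driver need its device BARs mapped
 * (through igb_uio or VFIO) before devinit() is called? */
static int drv_needs_mapping(uint32_t drv_flags)
{
	return (drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}
```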

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_pci.c                     | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c     | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c        | 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     | 4 ++--
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 8 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
 			  struct rte_pci_device *dev);
 
 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */
 
@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
 	.name = "test_driver",
 	.devinit = my_driver_init,
 	.id_table = my_driver_id,
-	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };
 
 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 0;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if (pci_uio_map_resource(dev) < 0)
 				return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
 	uint32_t drv_flags;                     /**< Flags contolling handling of device. */
 };
 
-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 1;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if ((ret = pci_uio_map_resource(dev)) != 0)
 				return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 493806c..c8355bc 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
 	{
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_em_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 5f93bcf..d60f923 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
 	{
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igb_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
 	{
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igbvf_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b38235c..f8e6039 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -999,7 +999,7 @@ static struct eth_driver rte_ixgbe_pmd = {
 	{
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbe_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
@@ -1012,7 +1012,7 @@ static struct eth_driver rte_ixgbevf_pmd = {
 	{
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbevf_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index c41032f..d42d709 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -268,7 +268,7 @@ static struct eth_driver rte_vmxnet3_pmd = {
 	{
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_vmxnet3_dev_init,
 	.dev_private_size = sizeof(struct vmxnet3_adapter),
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 06/20] igb_uio: make igb_uio compilation optional
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (4 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
                         ` (14 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Currently, igb_uio is always compiled. Some Linux distributions may not
want to ship igb_uio with DPDK, so igb_uio compilation for Linuxapp
targets needs to be optional. This patch adds the CONFIG_RTE_EAL_IGB_UIO
option, enabled by default.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_linuxapp           | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..b17e37e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 07/20] igb_uio: Moved interrupt type out of igb_uio
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (5 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
                         ` (13 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Move the interrupt type enum out of igb_uio and rename it to be more
generic. The somewhat unusual header naming and split is done mostly to
make the upcoming virtio patches easier to port to the dpdk.org tree.
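
Sharing the enum and the name strings lets the kernel module and
userspace code agree on how interrupt modes are spelled. A minimal
sketch of translating a mode string into the shared enum, similar in
spirit to igbuio_config_intr_mode(). The enum and name macros are copied
from the new headers in this patch; intr_mode_from_name() is a
hypothetical helper, and returning RTE_INTR_MODE_MAX for an unknown
string is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copied from rte_pci_dev_feature_defs.h in this patch. */
enum rte_intr_mode {
	RTE_INTR_MODE_NONE = 0,
	RTE_INTR_MODE_LEGACY,
	RTE_INTR_MODE_MSI,
	RTE_INTR_MODE_MSIX,
	RTE_INTR_MODE_MAX
};

/* Copied from rte_pci_dev_features.h in this patch. */
#define RTE_INTR_MODE_NONE_NAME "none"
#define RTE_INTR_MODE_LEGACY_NAME "legacy"
#define RTE_INTR_MODE_MSI_NAME "msi"
#define RTE_INTR_MODE_MSIX_NAME "msix"

/* Hypothetical helper: map a user-supplied string to the shared enum.
 * Returns RTE_INTR_MODE_MAX if the string is NULL or unrecognized. */
static enum rte_intr_mode intr_mode_from_name(const char *name)
{
	static const struct {
		const char *name;
		enum rte_intr_mode mode;
	} table[] = {
		{ RTE_INTR_MODE_NONE_NAME,   RTE_INTR_MODE_NONE },
		{ RTE_INTR_MODE_LEGACY_NAME, RTE_INTR_MODE_LEGACY },
		{ RTE_INTR_MODE_MSI_NAME,    RTE_INTR_MODE_MSI },
		{ RTE_INTR_MODE_MSIX_NAME,   RTE_INTR_MODE_MSIX },
	};
	size_t i;

	if (name == NULL)
		return RTE_INTR_MODE_MAX;
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(name, table[i].name) == 0)
			return table[i].mode;
	return RTE_INTR_MODE_MAX;
}
```

Note that igb_uio itself still only accepts "msix" and "legacy"; the
extra names are there for other consumers of the shared header.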

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/Makefile                     |  1 +
 lib/librte_eal/common/include/rte_pci.h            |  1 +
 .../common/include/rte_pci_dev_feature_defs.h      | 46 +++++++++++++++++++++
 .../common/include/rte_pci_dev_features.h          | 44 ++++++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          | 48 +++++++++-------------
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0016fc5..e2a3f3a 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+
 #include <rte_interrupts.h>
 
 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 0000000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+	RTE_INTR_MODE_NONE = 0,
+	RTE_INTR_MODE_LEGACY,
+	RTE_INTR_MODE_MSI,
+	RTE_INTR_MODE_MSIX,
+	RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 0000000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_FEATURES_H
+#define _RTE_PCI_DEV_FEATURES_H
+
+#include <rte_pci_dev_feature_defs.h>
+
+#define RTE_INTR_MODE_NONE_NAME "none"
+#define RTE_INTR_MODE_LEGACY_NAME "legacy"
+#define RTE_INTR_MODE_MSI_NAME "msi"
+#define RTE_INTR_MODE_MSIX_NAME "msix"
+
+#endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 09c40bf..7d5e6b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -33,6 +33,7 @@
 #ifdef CONFIG_XEN_DOM0 
 #include <xen/xen.h>
 #endif
+#include <rte_pci_dev_features.h>
 
 /**
  * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
@@ -49,14 +50,6 @@
 
 #define IGBUIO_NUM_MSI_VECTORS 1
 
-/* interrupt mode */
-enum igbuio_intr_mode {
-	IGBUIO_LEGACY_INTR_MODE = 0,
-	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
-};
-
 /**
  * A structure describing the private information for a uio device.
  */
@@ -64,13 +57,13 @@ struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
-	enum igbuio_intr_mode mode;
+	enum rte_intr_mode mode;
 	struct msix_entry \
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode = NULL;
-static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
 /* PCI device id table */
 static struct pci_device_id igbuio_pci_ids[] = {
@@ -222,14 +215,13 @@ igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
 {
 	struct pci_dev *pdev = udev->pdev;
 
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_MSIX) {
 		struct msi_desc *desc;
 
 		list_for_each_entry(desc, &pdev->msi_list, list) {
 			igbuio_msix_mask_irq(desc, state);
 		}
-	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	} else if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		uint32_t status;
 		uint16_t old, new;
 
@@ -301,7 +293,7 @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
 		goto spin_unlock;
 
 	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
 		status = cmd_status_dword >> 16;
 		/* interrupt is not ours, goes to out */
@@ -520,18 +512,18 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
+	udev->mode = RTE_INTR_MODE_LEGACY;
 	spin_lock_init(&udev->lock);
 
 	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
+	if (igbuio_intr_mode_preferred == RTE_INTR_MODE_MSIX) {
 		int vector;
 
 		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
 			udev->msix_entries[vector].entry = vector;
 
 		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
-			udev->mode = IGBUIO_MSIX_INTR_MODE;
+			udev->mode = RTE_INTR_MODE_MSIX;
 		}
 		else {
 			pci_disable_msix(udev->pdev);
@@ -539,13 +531,13 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	}
 	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
+	case RTE_INTR_MODE_MSIX:
 		udev->info.irq_flags = 0;
 		udev->info.irq = udev->msix_entries[0].vector;
 		break;
-	case IGBUIO_MSI_INTR_MODE:
+	case RTE_INTR_MODE_MSI:
 		break;
-	case IGBUIO_LEGACY_INTR_MODE:
+	case RTE_INTR_MODE_LEGACY:
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
 		break;
@@ -570,7 +562,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 fail_release_iomem:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
+	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
@@ -595,7 +587,7 @@ igbuio_pci_remove(struct pci_dev *dev)
 	uio_unregister_device(info);
 	igbuio_pci_release_iomem(info);
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
-					IGBUIO_MSIX_INTR_MODE)
+			RTE_INTR_MODE_MSIX)
 		pci_disable_msix(dev);
 	pci_release_regions(dev);
 	pci_disable_device(dev);
@@ -611,11 +603,11 @@ igbuio_config_intr_mode(char *intr_str)
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+	if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 		printk(KERN_INFO "Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
+	} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
 		printk(KERN_INFO "Use legacy interrupt\n");
 	} else {
 		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
@@ -656,8 +648,8 @@ module_exit(igbuio_pci_exit_module);
 module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
-"    msix       Use MSIX interrupt\n"
-"    legacy     Use Legacy interrupt\n"
+"    " RTE_INTR_MODE_MSIX_NAME "       Use MSIX interrupt\n"
+"    " RTE_INTR_MODE_LEGACY_NAME "     Use Legacy interrupt\n"
 "\n");
 
 MODULE_DESCRIPTION("UIO driver for Intel IGB PCI cards");
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 08/20] vfio: add support for VFIO in Linuxapp targets
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (6 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 09/20] vfio: add VFIO header Anatoly Burakov
                         ` (12 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Add VFIO compilation option to common Linuxapp config.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index b17e37e..2ed4b7e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4


* [dpdk-dev] [PATCH v4 09/20] vfio: add VFIO header
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (7 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
                         ` (11 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is enabled by default), the
header also checks the kernel version: VFIO_PRESENT is defined only if
VFIO is enabled in the config and the kernel is 3.6 or newer. Code
should test this macro to determine whether VFIO support is being
compiled in.
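As a minimal sketch (not part of this patch), other EAL code would guard VFIO-only paths with this macro. The detection block mirrors eal_vfio.h; defining RTE_EAL_VFIO by hand stands in for the generated build configuration (an assumption for a standalone build), and `vfio_support_compiled_in()` is a hypothetical helper.

```c
/* Assumption: normally provided by the DPDK-generated build config. */
#ifndef RTE_EAL_VFIO
#define RTE_EAL_VFIO 1
#endif

/* Same detection logic as eal_vfio.h */
#ifdef RTE_EAL_VFIO
#include <linux/version.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
#include <linux/vfio.h>
#define VFIO_PRESENT
#endif /* kernel version */
#endif /* RTE_EAL_VFIO */

/* Hypothetical helper: 1 if VFIO code paths were compiled in, else 0. */
static int
vfio_support_compiled_in(void)
{
#ifdef VFIO_PRESENT
	return 1;
#else
	return 0;
#endif
}
```

Note that LINUX_VERSION_CODE reflects the kernel headers used at build time, not the kernel the binary later runs on, so a runtime check (such as successfully opening /dev/vfio/vfio) is still needed.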

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 0000000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v4 10/20] interrupts: Add support for VFIO interrupts
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (8 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 09/20] vfio: add VFIO header Anatoly Burakov
@ 2014-06-03 10:17       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
                         ` (10 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:17 UTC (permalink / raw)
  To: dev

Add code to handle VFIO interrupts in the EAL interrupt subsystem
(legacy INTx, MSI and MSI-X are all supported).
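For illustration, a minimal sketch of the VFIO_DEVICE_SET_IRQS payload that the enable functions in this patch build: a struct vfio_irq_set header immediately followed by one eventfd, the same sizing idea as IRQ_SET_BUF_LEN. It uses the kernel's <linux/vfio.h> UAPI; `demo_fill_irq_set()` and `DEMO_IRQ_SET_BUF_LEN` are hypothetical names, and the ioctl itself is left to the caller because it needs a real VFIO device fd.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <string.h>

/* Room for the header plus one eventfd. */
#define DEMO_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int))

/*
 * Hypothetical helper: fill buf (at least DEMO_IRQ_SET_BUF_LEN bytes,
 * suitably aligned) with a request that routes vector 0 of the given IRQ
 * index (e.g. VFIO_PCI_MSIX_IRQ_INDEX) to event_fd. The caller would then
 * issue ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, buf).
 */
static struct vfio_irq_set *
demo_fill_irq_set(char *buf, uint32_t index, int event_fd)
{
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *) buf;

	memset(buf, 0, DEMO_IRQ_SET_BUF_LEN);
	irq_set->argsz = DEMO_IRQ_SET_BUF_LEN;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = index;
	irq_set->start = 0;
	irq_set->count = 1;
	/* the eventfd follows the header; memcpy avoids aliasing issues */
	memcpy(irq_set->data, &event_fd, sizeof(event_fd));
	return irq_set;
}
```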

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 285 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 284 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..c430710 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include <stdlib.h>
 #include <pthread.h>
 #include <sys/queue.h>
-#include <malloc.h>
 #include <stdarg.h>
 #include <unistd.h>
 #include <string.h>
@@ -44,6 +43,7 @@
 #include <inttypes.h>
 #include <sys/epoll.h>
 #include <sys/signalfd.h>
+#include <sys/ioctl.h>
 
 #include <rte_common.h>
 #include <rte_interrupts.h>
@@ -66,6 +66,7 @@
 #include <rte_spinlock.h>
 
 #include "eal_private.h"
+#include "eal_vfio.h"
 
 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
 
@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
 	int uio_intr_count;              /* for uio device */
+#ifdef VFIO_PRESENT
+	uint64_t vfio_intr_count;        /* for vfio device */
+#endif
 	uint64_t timerfd_num;            /* for timerfd */
 	char charbuf[16];                /* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;
 
+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	/* enable INTx */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* unmask INTx after enabling */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	/* mask interrupts before disabling */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* disable INTx*/
+	memset(irq_set, 0, len);
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI interrupts */
+static int
+vfio_disable_msi(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msix(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI-X interrupts */
+static int
+vfio_disable_msix(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI-X interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+#endif
+
 int
 rte_intr_callback_register(struct rte_intr_handle *intr_handle,
 			rte_intr_callback_fn cb, void *cb_arg)
@@ -276,6 +518,20 @@ rte_intr_enable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_enable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_enable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_enable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -300,7 +556,7 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	case RTE_INTR_HANDLE_UIO:
 		if (write(intr_handle->fd, &value, sizeof(value)) < 0){
 			RTE_LOG(ERR, EAL,
-				"Error enabling interrupts for fd %d\n",
+				"Error disabling interrupts for fd %d\n",
 							intr_handle->fd);
 			return -1;
 		}
@@ -308,6 +564,20 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_disable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_disable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_disable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -357,10 +627,15 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		/* set the length to be read dor different handle type */
 		switch (src->intr_handle.type) {
 		case RTE_INTR_HANDLE_UIO:
-			bytes_read = 4;
+			bytes_read = sizeof(buf.uio_intr_count);
 			break;
 		case RTE_INTR_HANDLE_ALARM:
-			bytes_read = sizeof(uint64_t);
+			bytes_read = sizeof(buf.timerfd_num);
+			break;
+		case RTE_INTR_HANDLE_VFIO_MSIX:
+		case RTE_INTR_HANDLE_VFIO_MSI:
+		case RTE_INTR_HANDLE_VFIO_LEGACY:
+			bytes_read = sizeof(buf.vfio_intr_count);
 			break;
 		default:
 			bytes_read = 1;
@@ -397,7 +672,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				active_cb.cb_fn(&src->intr_handle,
 					active_cb.cb_arg);
 
-				/*get the lcok back. */
+				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
 		}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6733948..e00a343 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -41,12 +41,16 @@
 enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_UNKNOWN = 0,
 	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
+	RTE_INTR_HANDLE_VFIO_LEGACY,  /**< vfio device handle (legacy) */
+	RTE_INTR_HANDLE_VFIO_MSI,     /**< vfio device handle (MSI) */
+	RTE_INTR_HANDLE_VFIO_MSIX,    /**< vfio device handle (MSIX) */
 	RTE_INTR_HANDLE_ALARM,    /**< alarm handle */
 	RTE_INTR_HANDLE_MAX
 };
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	int vfio_dev_fd;                 /**< VFIO device file descriptor */
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 };
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v4 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (9 preceding siblings ...)
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 12/20] vfio: create mapping code for VFIO Anatoly Burakov
                         ` (9 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index d958014..5f3be5f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v4 12/20] vfio: create mapping code for VFIO
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (10 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 13/20] vfio: add multiprocess support Anatoly Burakov
                         ` (8 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Add code to support VFIO mapping (primary processes only). Most of
the work is done via ioctl() calls on either /dev/vfio/vfio (the
container) or /dev/vfio/$GROUP_NR (an IOMMU group).
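A minimal sketch of the container/group handshake, written from the kernel's VFIO documentation rather than taken from this patch: open the container, verify the API version and Type1 IOMMU support, open the group node, check that it is viable and attach it. `demo_attach_group()` is a hypothetical helper; unlike pci_vfio_get_group_fd() it does not cache group fds, and it only succeeds with sufficient privileges and a device bound to vfio-pci.

```c
#include <fcntl.h>
#include <linux/vfio.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 and fills *container_fd / *group_fd on
 * success; returns -1 (leaving the outputs untouched) on any failure.
 */
static int
demo_attach_group(int group_no, int *container_fd, int *group_fd)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	char path[64];
	int cfd, gfd;

	/* container: same path as VFIO_CONTAINER_PATH */
	cfd = open("/dev/vfio/vfio", O_RDWR);
	if (cfd < 0)
		return -1;
	if (ioctl(cfd, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    ioctl(cfd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0) {
		close(cfd);
		return -1;
	}

	/* group node: same layout as VFIO_GROUP_FMT */
	snprintf(path, sizeof(path), "/dev/vfio/%d", group_no);
	gfd = open(path, O_RDWR);
	if (gfd < 0) {
		close(cfd);
		return -1;
	}

	/* viable: every device in the group is bound to VFIO or unbound */
	if (ioctl(gfd, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE) ||
	    ioctl(gfd, VFIO_GROUP_SET_CONTAINER, &cfd) < 0) {
		close(gfd);
		close(cfd);
		return -1;
	}

	*container_fd = cfd;
	*group_fd = gfd;
	return 0;
}
```

Only after a group is attached can the caller select the IOMMU model with ioctl(container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) and then map DMA, which is why pci_vfio_setup_dma_maps() performs VFIO_SET_IOMMU rather than pci_vfio_get_container_fd().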

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to the container
6. sets up DMA mappings (done only once, mapping all DPDK hugepage
   memory for DMA with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (this can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (the MSI-X BAR cannot be mmap()ed, so it is skipped)
9. sets up interrupt structures (but does not enable interrupts)
10. enables PCI bus mastering
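Step 2 boils down to reading the `iommu_group` symlink of the device in sysfs and taking the last path component of its target. A minimal standalone sketch of that parsing, assuming the caller has already called readlink() and NUL-terminated the result (readlink() does not do so itself); `demo_parse_iommu_group()` is a hypothetical name and uses strrchr()/strtol() instead of rte_strsplit().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the group number from a link target such
 * as "../../../../kernel/iommu_groups/42". Returns the group number, or
 * -1 if the last path component is not a non-negative decimal integer.
 */
static int
demo_parse_iommu_group(const char *link_target)
{
	const char *tok = strrchr(link_target, '/');
	char *end;
	long val;

	tok = tok ? tok + 1 : link_target;	/* last path component */
	if (*tok == '\0')
		return -1;

	errno = 0;
	val = strtol(tok, &end, 10);
	if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX)
		return -1;
	return (int) val;
}
```

This sketch reserves -1 for failure instead of using 0 as "no group" the way pci_vfio_get_group_no() does, since IOMMU group 0 is a valid group number on Linux.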

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   2 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 706 +++++++++++++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5f3be5f..cf9f026 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 9d2675b..aeb5903 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	/* if set to NONE, interrupt mode is determined automatically */
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <linux/pci_regs.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * Its contents are only compiled if VFIO_PRESENT is defined (see eal_vfio.h).
+ */
+
+#ifdef VFIO_PRESENT
+
+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_GET_REGION_ADDR(x) ((uint64_t) (x) << 40ULL)
+
+/* per-process VFIO config */
+static struct vfio_config vfio_cfg;
+
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int *msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset) {
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+	int i, ret, intr_idx;
+
+	/* default to invalid index */
+	intr_idx = VFIO_PCI_NUM_IRQS;
+
+	/* get interrupt type from internal config (auto-detected by default,
+	 * starting from MSI-X; can be overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_MSI:
+		intr_idx = VFIO_PCI_MSI_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		break;
+	/* don't do anything if we want to automatically determine interrupt type */
+	case RTE_INTR_MODE_NONE:
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "  unknown default interrupt type!\n");
+		return -1;
+	}
+
+	/* start from MSI-X interrupt type */
+	for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE &&
+				i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* if this vector cannot be used with eventfd, fail if we explicitly
+		 * specified interrupt type, otherwise continue */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE) {
+				RTE_LOG(ERR, EAL, "  interrupt vector does not support eventfd!\n");
+				return -1;
+			} else
+				continue;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		switch (i) {
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSIX;
+			break;
+		case VFIO_PCI_MSI_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSI;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSI;
+			break;
+		case VFIO_PCI_INTX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_LEGACY;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_LEGACY;
+			break;
+		default:
+			RTE_LOG(ERR, EAL, "  unknown interrupt type!\n");
+			return -1;
+		}
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				 VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char *pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char *tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			 SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename));
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if (end == group_tok || *end != '\0' || errno != 0) {
+		RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status = {
+			.argsz = sizeof(group_status)
+	};
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0)
+		return -1;
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	} else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container (needs to be done only once, only when
+	 * at least one group is assigned to a container and only in primary process)
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof(*vfio_res), 0))
+				== NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store VFIO mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of registers (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void *bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmapable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* if secondary process, do not set up interrupts */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd) != 0) {
+			RTE_LOG(ERR, EAL, "  %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, "  %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return 0;
+}
+
+int
+pci_vfio_enable(void)
+{
+	/* initialize group list */
+	int i;
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		vfio_cfg.vfio_groups[i].fd = -1;
+		vfio_cfg.vfio_groups[i].group_no = -1;
+	}
+	vfio_cfg.vfio_container_fd = -1;
+
+	/* check if we have VFIO driver enabled */
+	if (access(VFIO_DIR, F_OK) == 0)
+		vfio_cfg.vfio_enabled = 1;
+	else
+		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
+
+	return 0;
+}
+
+int
+pci_vfio_is_enabled(void)
+{
+	return vfio_cfg.vfio_enabled;
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index 92e3065..5468b0a 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include <rte_eal.h>
+#include <rte_pci_dev_feature_defs.h>
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6 +77,8 @@ struct internal_config {
 	volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
 	uintptr_t base_virtaddr;          /**< base address to try and reserve memory from */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
+	/** default interrupt mode for VFIO */
+	volatile enum rte_intr_mode vfio_intr_mode;
 	const char *hugefile_prefix;      /**< the base filename of hugetlbfs files */
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 1292eda..23fb3c3 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -34,6 +34,8 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include "eal_vfio.h"
+
 struct pci_map {
 	void *addr;
 	uint64_t offset;
@@ -63,4 +65,33 @@ void * pci_map_resource(void * requested_addr, int fd, off_t offset,
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
 
+#ifdef VFIO_PRESENT
+
+#define VFIO_MAX_GROUPS 64
+
+int pci_vfio_enable(void);
+int pci_vfio_is_enabled(void);
+
+/* map VFIO resource prototype */
+int pci_vfio_map_resource(struct rte_pci_device *dev);
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+	int group_no;
+	int fd;
+};
+
+struct vfio_config {
+	int vfio_enabled;
+	int vfio_container_fd;
+	int vfio_container_has_dma;
+	int vfio_group_idx;
+	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
+#endif
+
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
index 354e9ca..03e693e 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -42,6 +42,12 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
 #include <linux/vfio.h>
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
+#define RTE_PCI_MSIX_TABLE_BIR 0x7
+#else
+#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR
+#endif
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
1.8.1.4

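In pci_vfio_get_group_no() above, the group number is the last component of the iommu_group symlink target. The following is a minimal standalone sketch of that parsing step, not the patch's code. The helper name parse_iommu_group() is hypothetical, and the link target is assumed to look like "../../../../kernel/iommu_groups/42". Unlike the check in the patch, it also rejects an empty or non-numeric last component.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the IOMMU group number from the target of
 * a sysfs "iommu_group" symlink. Returns the group number (>= 0), or -1
 * if the last path component is not a plain decimal integer.
 */
static int
parse_iommu_group(const char *link_target)
{
	const char *last = strrchr(link_target, '/');
	char *end;
	long val;

	/* the group number is the last path component */
	last = (last != NULL) ? last + 1 : link_target;

	errno = 0;
	val = strtol(last, &end, 10);

	/* reject empty strings, trailing garbage, overflow and negatives */
	if (end == last || *end != '\0' || errno != 0 || val < 0 || val > INT_MAX)
		return -1;

	return (int) val;
}
```

readlink() does not NUL-terminate its output, so a caller would typically read at most sizeof(buf) - 1 bytes and terminate the buffer itself before calling such a helper. The patch relies on a memset() of the whole buffer instead, which leaves no terminator if the link target fills the buffer completely.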

* [dpdk-dev] [PATCH v4 13/20] vfio: add multiprocess support.
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (11 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 12/20] vfio: create mapping code for VFIO Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 14/20] pci: enable VFIO device binding Anatoly Burakov
                         ` (7 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC
  To: dev

Since VFIO cannot be used to map the same device twice, secondary
processes receive the container and group fds from the primary process
over a local socket. Device fds are not sent, as they can be obtained
via an ioctl() call on the group fd.
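Passing an open fd between processes relies on SCM_RIGHTS ancillary data on a Unix domain socket. The following is a minimal sketch of that mechanism, not the patch's code. It uses the portable CMSG_FIRSTHDR()/CMSG_DATA() macros and a control buffer sized with CMSG_SPACE(), rather than the glibc-specific __cmsg_data member. The function names send_fd() and recv_fd() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one int payload plus one file descriptor as SCM_RIGHTS data. */
static int
send_fd(int sock, int payload, int fd)
{
	struct msghdr hdr;
	struct iovec iov;
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* guarantees cmsghdr alignment */
	} ctrl;
	struct cmsghdr *chdr;

	memset(&hdr, 0, sizeof(hdr));
	memset(&ctrl, 0, sizeof(ctrl));

	iov.iov_base = &payload;
	iov.iov_len = sizeof(payload);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	chdr = CMSG_FIRSTHDR(&hdr);
	chdr->cmsg_level = SOL_SOCKET;
	chdr->cmsg_type = SCM_RIGHTS;
	chdr->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(chdr), &fd, sizeof(int));

	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* Receive the int payload and the passed descriptor; returns the fd or -1. */
static int
recv_fd(int sock, int *payload)
{
	struct msghdr hdr;
	struct iovec iov;
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct cmsghdr *chdr;
	int fd;

	memset(&hdr, 0, sizeof(hdr));
	iov.iov_base = payload;
	iov.iov_len = sizeof(*payload);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &hdr, 0) <= 0)
		return -1;

	/* make sure the sender really attached exactly one descriptor */
	chdr = CMSG_FIRSTHDR(&hdr);
	if (chdr == NULL || chdr->cmsg_level != SOL_SOCKET ||
	    chdr->cmsg_type != SCM_RIGHTS ||
	    chdr->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(chdr), sizeof(int));
	return fd;
}
```

The received descriptor generally has a different number than the sender's, but it refers to the same open file (here, the same VFIO group or container).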

For multiprocess support, VFIO distinguishes between groups that exist
but are unused (e.g. groups that aren't bound to the VFIO driver) and
groups that don't exist at all. This lets the primary process tell
whether a secondary process is requesting a valid group or one that
doesn't exist.

VFIO multiprocess sync uses a simple protocol with two requests: a
request for a group fd and a request for the container fd. Possible
replies are SOCKET_OK (success), SOCKET_ERR (error) and SOCKET_NO_FD
(the requested VFIO group is valid but has no fd, meaning that the
group is simply not bound to the VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent back where possible and the
socket is closed.
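The group-request steps above can be sketched as a single request/reply exchange over a SOCK_SEQPACKET socket. This is a simplified illustration, not the patch's code. The SOCKET_* values are copied from this patch's eal_pci_init.h. demo_group_lookup() is a hypothetical stand-in for pci_vfio_get_group_fd() that uses the same return convention. The actual fd transfer after SOCKET_OK and the container request are omitted.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* protocol constants as defined by this patch in eal_pci_init.h */
#define SOCKET_REQ_CONTAINER 0x100
#define SOCKET_REQ_GROUP 0x200
#define SOCKET_OK 0x0
#define SOCKET_NO_FD 0x1
#define SOCKET_ERR 0xFF

/* one int per message; SOCK_SEQPACKET preserves message boundaries */
static int
send_int(int sock, int val)
{
	return send(sock, &val, sizeof(val), 0) == (ssize_t) sizeof(val) ? 0 : -1;
}

static int
recv_int(int sock, int *val)
{
	return recv(sock, val, sizeof(*val), 0) == (ssize_t) sizeof(*val) ? 0 : -1;
}

/* Hypothetical lookup: <0 no such group, 0 exists but unbound, >0 group fd */
static int
demo_group_lookup(int group_no)
{
	if (group_no == 5)
		return 42;	/* pretend fd of a group bound to VFIO */
	if (group_no == 7)
		return 0;	/* group exists but is not bound to VFIO */
	return -1;		/* no such group */
}

/* Primary side: handle one group request (steps 2 to 2c, no fd transfer). */
static int
serve_group_request(int sock, int (*lookup)(int))
{
	int req, group_no, fd;

	if (recv_int(sock, &req) < 0)
		return -1;
	if (req != SOCKET_REQ_GROUP)
		return send_int(sock, SOCKET_ERR);
	if (recv_int(sock, &group_no) < 0)
		return -1;	/* the patch simply closes the socket here */

	fd = lookup(group_no);
	if (fd < 0)
		return send_int(sock, SOCKET_ERR);
	if (fd == 0)
		return send_int(sock, SOCKET_NO_FD);
	/* the real code follows SOCKET_OK with the fd as SCM_RIGHTS data */
	return send_int(sock, SOCKET_OK);
}
```

Because SOCK_SEQPACKET keeps each sendmsg() as a separate record, the request code and the group number arrive as two distinct messages. No length prefix or delimiter is needed.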

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  79 ++++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cf9f026..3c05edf 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }
 
 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
 	int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
 		}
 
 		return vfio_container_fd;
+	} else {
+		/*
+		 * if we're in a secondary process, request container fd from the
+		 * primary process via our socket
+		 */
+		int socket_fd;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
 	}
 
 	return -1;
 }
 
 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
 	int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
 		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
 		return vfio_group_fd;
 	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_mp_sync_receive_request(socket_fd);
+		switch (ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_mp_sync_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, "  cannot get group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
 	return -1;
 }
 
@@ -602,6 +663,20 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		/* get number of registers (up to BAR5) */
 		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
 				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	} else {
+		/* if we're in a secondary process, just find our tailq entry */
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, "  %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
 	}
 
 	/* map BARs */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
new file mode 100644
index 0000000..26dbaa5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
@@ -0,0 +1,395 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe and should not be used. */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include <sys/un.h>
+/* make sure we redefine __USE_MISC only if it was previously undefined */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd, chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy((chdr).__cmsg_data, &(fd), sizeof(fd));\
+		} while (0)
+#define CMSGHDR_TO_FD(chdr, fd) \
+			memcpy(&(fd), (chdr).__cmsg_data, sizeof(fd))
+
+static pthread_t socket_thread;
+static int mp_socket_fd;
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char *buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_mp_sync_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_mp_sync_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_mp_sync_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_mp_sync_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_mp_sync_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr *) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+static __attribute__((noreturn)) void *
+pci_vfio_mp_sync_thread(void __rte_unused * arg)
+{
+	int ret, fd, vfio_group_no;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(mp_socket_fd, (struct sockaddr *) &addr,
+				&sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_mp_sync_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			fd = pci_vfio_get_container_fd();
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			else
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_mp_sync_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+
+			fd = pci_vfio_get_group_fd(vfio_group_no);
+
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (fd == 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_mp_sync_send_request(conn_sock, SOCKET_OK);
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			}
+			break;
+		default:
+			vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+static int
+vfio_mp_sync_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr *) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	/* save the socket in local configuration */
+	mp_socket_fd = socket_fd;
+
+	return 0;
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_mp_sync_setup(void)
+{
+	int ret;
+
+	if (vfio_mp_sync_socket_setup() < 0) {
+		RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
+		return -1;
+	}
+
+	ret = pthread_create(&socket_thread, NULL,
+			pci_vfio_mp_sync_thread, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to create thread for communication with "
+				"secondary processes!\n");
+		close(mp_socket_fd);
+		return -1;
+	}
+	return 0;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 23fb3c3..45846cc 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -71,9 +71,28 @@ int pci_uio_map_resource(struct rte_pci_device *dev);
 
 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
+int pci_vfio_mp_sync_setup(void);
 
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
+int pci_vfio_get_group_fd(int iommu_group_no);
+int pci_vfio_get_container_fd(void);
+
+/*
+ * Function prototypes for VFIO multiprocess sync functions
+ */
+int vfio_mp_sync_send_request(int socket, int req);
+int vfio_mp_sync_receive_request(int socket);
+int vfio_mp_sync_send_fd(int socket, int fd);
+int vfio_mp_sync_receive_fd(int socket);
+int vfio_mp_sync_connect_to_primary(void);
+
+/* socket comm protocol definitions */
+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
 
 /*
  * we don't need to store device fd's anywhere since they can be obtained from
-- 
1.8.1.4

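get_socket_path() in the patch builds the socket path as "<dir>/.<prefix>_mp_socket", where dir is /var/run for root and $HOME otherwise. It ignores a possible truncation into sockaddr_un.sun_path, which is only a little over 100 bytes on Linux. The following is a minimal sketch of the same rule that reports truncation instead. The helper build_socket_path() and its explicit uid/home parameters are hypothetical; they are passed in only to keep the sketch testable.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/un.h>

/*
 * Hypothetical variant of get_socket_path(): same directory rule and
 * file name format, but returns -1 if the path does not fit in 'len'.
 */
static int
build_socket_path(char *buf, size_t len, uid_t uid, const char *home,
		const char *prefix)
{
	const char *dir = "/var/run";
	int n;

	/* non-root users fall back to $HOME, if it is set */
	if (uid != 0 && home != NULL)
		dir = home;

	n = snprintf(buf, len, "%s/.%s_mp_socket", dir, prefix);

	/* snprintf() returns the length it wanted to write */
	if (n < 0 || (size_t) n >= len)
		return -1;
	return 0;
}
```

A silently truncated path would make bind() in the primary and connect() in the secondary use the same (wrong) name. The processes would still meet, but possibly collide with another prefix, so failing early seems the safer choice.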

* [dpdk-dev] [PATCH v4 14/20] pci: enable VFIO device binding
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (12 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 13/20] vfio: add multiprocess support Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
                         ` (6 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC
  To: dev

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for the driver. VFIO is tried first; if the device is not mapped through
VFIO, IGB_UIO is tried as a fallback.
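The fallback in pci_map_device() depends on a three-way return convention from pci_vfio_map_resource(): 0 means mapped, a positive value means "not managed by VFIO, skip", and a negative value is a hard error. The following sketch captures that control flow, not the patch's code. The function-pointer type map_fn, the stubs, and map_with_fallback() are hypothetical stand-ins for the real EAL mapping functions.

```c
#include <stddef.h>

/*
 * Assumed return convention, as in pci_vfio_map_resource():
 *    0 -> device mapped
 *   >0 -> device not managed by this driver, try the next one
 *   <0 -> hard error, stop
 */
typedef int (*map_fn)(void *dev);

static int
map_with_fallback(void *dev, map_fn vfio_map, map_fn uio_map, int vfio_enabled)
{
	int ret;

	if (vfio_enabled) {
		ret = vfio_map(dev);
		if (ret == 0)
			return 0;	/* mapped through VFIO */
		if (ret < 0)
			return ret;	/* real failure: do not fall back */
	}
	/* VFIO disabled or device not bound to VFIO: try igb_uio */
	return uio_map(dev);
}

/* hypothetical stubs used to exercise the three outcomes */
static int stub_ok(void *dev)   { (void) dev; return 0; }
static int stub_skip(void *dev) { (void) dev; return 1; }
static int stub_err(void *dev)  { (void) dev; return -1; }
```

Treating a negative VFIO result as fatal, rather than silently trying igb_uio, avoids masking a misconfigured VFIO setup behind a second driver.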

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
 	return -1;
 }
 
+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+	int ret, mapped = 0;
+
+	/* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+	if (pci_vfio_is_enabled()) {
+		if ((ret = pci_vfio_map_resource(dev)) == 0)
+			mapped = 1;
+		else if (ret < 0)
+			return ret;
+	}
+#endif
+	/* map resources for devices that use igb_uio */
+	if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+		return ret;
+
+	return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+	int ret;
 	struct rte_pci_id *id_table;
-	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-			/* map resources for devices that use igb_uio */
-			if ((ret = pci_uio_map_resource(dev)) != 0)
+			if ((ret = pci_map_device(dev)) != 0)
 				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
 		RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
 		return -1;
 	}
+#ifdef VFIO_PRESENT
+	pci_vfio_enable();
+
+	if (pci_vfio_is_enabled()) {
+
+		/* if we are primary process, create a thread to communicate with
+		 * secondary processes. the thread will use a socket to wait for
+		 * requests from secondary process to send open file descriptors,
+		 * because VFIO does not allow multiple open descriptors on a group or
+		 * VFIO container.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY &&
+				pci_vfio_mp_sync_setup() < 0)
+			return -1;
+	}
+#endif
 	return 0;
 }
-- 
1.8.1.4

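The comment in rte_eal_pci_init() above explains why a sync thread is needed: VFIO does not allow multiple open descriptors on a group or container, so secondary processes have to obtain the primary's already-open descriptors. A common way to hand an open descriptor to another process is an SCM_RIGHTS control message over an AF_UNIX socket. The sketch below shows only that mechanism, under the assumption that it is what such a thread would use; send_fd() and recv_fd() are hypothetical helpers, and the request protocol behind pci_vfio_mp_sync_setup() is not part of this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one fd, aligned for struct cmsghdr. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one open descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* at least one byte of ordinary data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new local fd, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
			cmsg->cmsg_type != SCM_RIGHTS ||
			cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor is a new number in the receiving process but refers to the same open file, which is what lets a secondary process share a group or container that it could not open itself.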
^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v4 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (13 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 14/20] pci: enable VFIO device binding Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
                         ` (5 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Unlike with igb_uio, the VFIO interrupt type is not set through kernel
module parameters but is configured at runtime via ioctl() calls. This
warrants a new EAL command-line parameter, --vfio-intr. It has no
effect if VFIO support is not compiled in; otherwise it sets the VFIO
interrupt type to "legacy", "msi" or "msix". Note that VFIO
initialization will fail if the selected interrupt type is not
supported by the system.

If the parameter is not specified, VFIO will try all interrupt types,
starting with MSI-X.
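A standalone sketch of how the --vfio-intr value might be parsed and then used. parse_vfio_intr() mirrors eal_parse_vfio_intr() from this patch, but uses a local stand-in enum instead of enum rte_intr_mode. select_intr_mode(), the intr_probe_fn callback type and probe_from_mask() are hypothetical: the message only says VFIO starts with MSI-X when no type is given, so trying MSI and then legacy after it is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for DPDK's enum rte_intr_mode; NONE means "not specified". */
enum intr_mode {
	INTR_MODE_NONE,
	INTR_MODE_LEGACY,
	INTR_MODE_MSI,
	INTR_MODE_MSIX
};

/* Map a --vfio-intr argument to a mode, like eal_parse_vfio_intr().
 * Returns 0 on success, -1 for an unknown string. */
static int parse_vfio_intr(const char *arg, enum intr_mode *out)
{
	static const struct {
		const char *name;
		enum intr_mode mode;
	} map[] = {
		{ "legacy", INTR_MODE_LEGACY },
		{ "msi", INTR_MODE_MSI },
		{ "msix", INTR_MODE_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*out = map[i].mode;
			return 0;
		}
	}
	return -1;
}

/* Hypothetical probe: returns 0 if the given mode can be enabled. */
typedef int (*intr_probe_fn)(enum intr_mode mode, void *ctx);

/* Example probe for testing: ctx points to a bitmask of supported modes. */
static int probe_from_mask(enum intr_mode mode, void *ctx)
{
	unsigned mask = *(const unsigned *)ctx;

	return (mask & (1u << mode)) ? 0 : -1;
}

/* No explicit mode: try MSI-X, then MSI, then legacy, keep the first that
 * works. Explicit mode: try only that one, so an unsupported choice fails. */
static enum intr_mode select_intr_mode(enum intr_mode requested,
		intr_probe_fn probe, void *ctx)
{
	static const enum intr_mode order[] = {
		INTR_MODE_MSIX, INTR_MODE_MSI, INTR_MODE_LEGACY
	};
	size_t i;

	assert(probe != NULL);
	if (requested != INTR_MODE_NONE)
		return probe(requested, ctx) == 0 ? requested : INTR_MODE_NONE;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (probe(order[i], ctx) == 0)
			return order[i];
	return INTR_MODE_NONE;
}
```

Returning INTR_MODE_NONE for an unsupported explicit choice, rather than falling back, matches the statement above that initialization fails in that case.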

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index aeb5903..10c40fa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0    "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR    "vfio-intr"
 
 #define RTE_EAL_BLACKLIST_SIZE	0x100
 
@@ -361,6 +362,8 @@ eal_usage(const char *prgname)
 	       "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
 	    		   "native RDTSC\n"
 	       "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+	       "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+	       	   	   "(legacy|msi|msix)\n"
 	       "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +582,28 @@ eal_parse_base_virtaddr(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_vfio_intr(const char *mode)
+{
+	unsigned i;
+	static struct {
+		const char *name;
+		enum rte_intr_mode value;
+	} map[] = {
+		{ "legacy", RTE_INTR_MODE_LEGACY },
+		{ "msi", RTE_INTR_MODE_MSI },
+		{ "msix", RTE_INTR_MODE_MSIX },
+	};
+
+	for (i = 0; i < RTE_DIM(map); i++) {
+		if (!strcmp(mode, map[i].name)) {
+			internal_config.vfio_intr_mode = map[i].value;
+			return 0;
+		}
+	}
+	return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +658,7 @@ eal_parse_args(int argc, char **argv)
 		{OPT_PCI_BLACKLIST, 1, 0, 0},
 		{OPT_VDEV, 1, 0, 0},
 		{OPT_SYSLOG, 1, NULL, 0},
+		{OPT_VFIO_INTR, 1, NULL, 0},
 		{OPT_BASE_VIRTADDR, 1, 0, 0},
 		{OPT_XEN_DOM0, 0, 0, 0},
 		{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +855,14 @@ eal_parse_args(int argc, char **argv)
 					return -1;
 				}
 			}
+			else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+				if (eal_parse_vfio_intr(optarg) < 0) {
+					RTE_LOG(ERR, EAL, "invalid parameters for --"
+							OPT_VFIO_INTR "\n");
+					eal_usage(prgname);
+					return -1;
+				}
+			}
 			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
 				internal_config.create_uio_dev = 1;
 			}
-- 
1.8.1.4

* [dpdk-dev] [PATCH v4 16/20] eal: make --no-huge use mmap instead of malloc
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (14 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
                         ` (4 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

This makes it possible to run DPDK without hugepage memory when VFIO
is used, since VFIO sets up DMA mappings using virtual addresses.

Technically, malloc() would work, but it does not guarantee that the
memory is page-aligned, so mmap() is used instead to be safe.
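A minimal sketch of the allocation this patch switches to, assuming Linux and glibc: an anonymous private mmap() of the requested size, checked against sysconf(_SC_PAGESIZE) to show the alignment guarantee that malloc() lacks. The helper name alloc_no_huge() is hypothetical. It passes -1 as the file descriptor, which the mmap(2) man page recommends for portability with MAP_ANONYMOUS; Linux ignores the descriptor in that case, so the 0 used in the patch also works there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve len bytes of anonymous, page-aligned, zero-filled memory,
 * as the --no-huge path does after this patch. Returns NULL on failure. */
static void *alloc_no_huge(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
		return NULL;
	}
	/* mmap() always returns a page-aligned address */
	assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
	return addr;
}
```

Memory obtained this way is released with munmap(), not free().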

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 8d1edd9..315214b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)
 
 	/* hugetlbfs can be disabled */
 	if (internal_config.no_hugetlbfs) {
-		addr = malloc(internal_config.memory);
+		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+		if (addr == MAP_FAILED) {
+			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+					strerror(errno));
+			return -1;
+		}
 		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4

* [dpdk-dev] [PATCH v4 17/20] test app: adding unit tests for VFIO EAL command-line parameter
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (15 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
                         ` (3 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Add unit tests for the VFIO interrupt type command-line parameter. The
test app cannot tell whether VFIO support is compiled in (the
eal_vfio.h header is internal to the Linuxapp EAL), so the flag is
tested regardless.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_eal_flags.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
 	const char *argv11[] = {prgname, "--file-prefix=virtaddr",
 			"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};
 
+	/* try running with --vfio-intr INTx flag */
+	const char *argv12[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+	/* try running with --vfio-intr MSI flag */
+	const char *argv13[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+	/* try running with --vfio-intr MSI-X flag */
+	const char *argv14[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+	/* try running with --vfio-intr invalid flag */
+	const char *argv15[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=invalid"};
+
 
 	if (launch_proc(argv0) == 0) {
 		printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
 		printf("Error - process did not run ok with --base-virtaddr parameter\n");
 		return -1;
 	}
+	if (launch_proc(argv12) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr INTx parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv13) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv14) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI-X parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv15) == 0) {
+		printf("Error - process ran ok with "
+				"--vfio-intr invalid parameter\n");
+		return -1;
+	}
 	return 0;
 }
 #endif
-- 
1.8.1.4

* [dpdk-dev] [PATCH v4 18/20] igb_uio: Removed PCI ID table from igb_uio
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (16 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
                         ` (2 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Remove the PCI ID list so that igb_uio behaves more like a generic
driver such as vfio-pci or uio_pci_generic. This makes it easier for
the binding script to support multiple drivers.

Since igb_uio no longer has a PCI ID list, it can now be bound to any
device, not just those explicitly supported by DPDK. In other words, it
now behaves similarly to pci-stub, vfio-pci and other generic PCI
drivers.

Therefore, to bind a new device to igb_uio, the user now has to first
write the device's vendor and device IDs to the "new_id" file in the
igb_uio driver directory, and only then write its PCI address to
"bind". The PCI binding script is changed accordingly.

sysfs behaves oddly once a new device ID has been added to new_id: a
subsequent write to "bind" results in an IOError when the file is
closed. The error is harmless, but it still triggers the exception, so
to work around it the script checks whether the device was actually
bound to the driver before reporting an error.
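The new_id-then-bind sequence described above, including the "was it bound anyway?" check, can be sketched in C. This is an illustrative sketch, not the Python code in this patch: the helper names write_attr(), device_bound_to() and bind_with_new_id() are hypothetical, and the sysfs root is a parameter (normally "/sys") so the logic can be exercised against a scratch directory. One likely reason for the odd IOError, not confirmed by this patch, is that writing to new_id already makes the kernel probe matching unbound devices, so the device may already be bound by the time "bind" is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define SYSFS_PATH_LEN 512

/* Write one string to a sysfs-style attribute file. sysfs may only report
 * an error when the data is flushed, so fclose() failures count too. */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (f == NULL)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Return 1 if <root>/bus/pci/devices/<addr>/driver is a symlink whose
 * last path component is drv, 0 otherwise. */
static int device_bound_to(const char *root, const char *addr, const char *drv)
{
	char lpath[SYSFS_PATH_LEN], target[SYSFS_PATH_LEN];
	const char *base;
	ssize_t n;

	snprintf(lpath, sizeof(lpath), "%s/bus/pci/devices/%s/driver", root, addr);
	n = readlink(lpath, target, sizeof(target) - 1);
	if (n < 0)
		return 0;
	target[n] = '\0';
	base = strrchr(target, '/');
	base = (base != NULL) ? base + 1 : target;
	return strcmp(base, drv) == 0;
}

/* Add vendor:device to drv's dynamic ID list, then bind addr to drv.
 * A failed write to "bind" is ignored if the device ended up bound anyway. */
static int bind_with_new_id(const char *root, const char *drv,
		unsigned vendor, unsigned device, const char *addr)
{
	char path[SYSFS_PATH_LEN], id[16];
	struct stat st;

	assert(root != NULL && drv != NULL && addr != NULL);

	/* the driver directory only exists if the module is loaded */
	snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s", root, drv);
	if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;

	snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/new_id", root, drv);
	snprintf(id, sizeof(id), "%04x %04x", vendor, device);
	if (write_attr(path, id) != 0)
		return -1;

	snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/bind", root, drv);
	if (write_attr(path, addr) != 0 && !device_bound_to(root, addr, drv))
		return -1;
	return 0;
}
```

In real use the call would look like bind_with_new_id("/sys", "igb_uio", 0x8086, 0x10fb, "0000:01:00.0"), run as root; the IDs and address here are example values only.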

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-----
 tools/igb_uio_bind.py                     | 118 +++++++++++++++---------------
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include <rte_pci_dev_ids.h>
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)
 
 static struct pci_driver igbuio_pci_driver = {
 	.name = "igb_uio",
-	.id_table = igbuio_pci_ids,
+	.id_table = NULL,
 	.probe = igbuio_pci_probe,
 	.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []
 
 def usage():
     '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
                 return path
 
 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that igb_uio is loaded'''
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
     if not found:
         print "Error - module %s not loaded" %mod
         sys.exit(1)
-    
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
-        sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False
 
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
 
+def get_pci_device_details(dev_id):
+    '''This function gets additional details for a PCI device'''
+    device = {}
+
+    extra_info = check_output(["lspci", "-vmmks", dev_id]).splitlines()
+
+    # parse lspci details
+    for line in extra_info:
+        if len(line) == 0:
+            continue
+        name, value = line.split("\t", 1)
+        name = name.strip(":") + "_str"
+        device[name] = value
+    # check for a unix interface name
+    sys_path = "/sys/bus/pci/devices/%s/net/" % dev_id
+    if exists(sys_path):
+        device["Interface"] = ",".join(os.listdir(sys_path))
+    else:
+        device["Interface"] = ""
+    # check if a port is used for ssh connection
+    device["Ssh_if"] = False
+    device["Active"] = ""
+
+    return device
+
 def get_nic_details():
     '''This function populates the "devices" dictionary. The keys used are
     the pci addresses (domain:bus:slot.func). The values are themselves
@@ -237,23 +228,10 @@ def get_nic_details():
 
     # based on the basic info, get extended text details            
     for d in devices.keys():
-        extra_info = check_output(["lspci", "-vmmks", d]).splitlines()
-        # parse lspci details
-        for line in extra_info:
-            if len(line) == 0:
-                continue
-            name, value = line.split("\t", 1)
-            name = name.strip(":") + "_str"
-            devices[d][name] = value
-        # check for a unix interface name
-        sys_path = "/sys/bus/pci/devices/%s/net/" % d
-        if exists(sys_path):
-            devices[d]["Interface"] = ",".join(os.listdir(sys_path))
-        else:
-            devices[d]["Interface"] = ""
-        # check if a port is used for ssh connection
-        devices[d]["Ssh_if"] = False
-        devices[d]["Active"] = ""
+        # get additional info and add it to existing data
+        devices[d] = dict(devices[d].items() +
+                          get_pci_device_details(d).items())
+
         for _if in ssh_if: 
             if _if in devices[d]["Interface"].split(","):
                 devices[d]["Ssh_if"] = True
@@ -261,14 +239,12 @@ def get_nic_details():
                 break;
 
         # add igb_uio to list of supporting modules if needed
-        if is_supported_device(d):
-            if "Module_str" in devices[d]:
-                if "igb_uio" not in devices[d]["Module_str"]:
-                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
-            else:
-                devices[d]["Module_str"] = "igb_uio"
-        if "Module_str" not in devices[d]:
-            devices[d]["Module_str"] = "<none>"
+        if "Module_str" in devices[d]:
+            if "igb_uio" not in devices[d]["Module_str"]:
+                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+        else:
+            devices[d]["Module_str"] = "igb_uio"
+
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
             modules = devices[d]["Module_str"].split(",")
@@ -343,6 +319,22 @@ def bind_one(dev_id, driver, force):
             unbind_one(dev_id, force)
             dev["Driver_str"] = "" # clear driver string
 
+    # if we are binding to one of DPDK drivers, add PCI id's to that driver
+    if driver == "igb_uio":
+        filename = "/sys/bus/pci/drivers/%s/new_id" % driver
+        try:
+            f = open(filename, "w")
+        except:
+            print "Error: bind failed for %s - Cannot open %s" % (dev_id, filename)
+            return
+        try:
+            f.write("%04x %04x" % (dev["Vendor"], dev["Device"]))
+            f.close()
+        except:
+            print "Error: bind failed for %s - Cannot write new PCI ID to " \
+                "driver %s" % (dev_id, driver)
+            return
+
     # do the bind by writing to /sys
     filename = "/sys/bus/pci/drivers/%s/bind" % driver
     try:
@@ -356,6 +348,12 @@ def bind_one(dev_id, driver, force):
         f.write(dev_id)
         f.close()
     except:
+        # for some reason, closing the bind file after adding a new PCI ID
+        # to new_id results in IOError. however, if the device was
+        # successfully bound, we don't care and can safely ignore the IOError
+        tmp = get_pci_device_details(dev_id)
+        if "Driver_str" in tmp and tmp["Driver_str"] == driver:
+            return
         print "Error: bind failed for %s - Cannot bind to driver %s" % (dev_id, driver)
         if saved_driver is not None: # restore any previous driver
             bind_one(dev_id, saved_driver, force)
-- 
1.8.1.4

* [dpdk-dev] [PATCH v4 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (17 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Rename the igb_uio_bind script to dpdk_nic_bind to give it a generic
name, since it now supports two drivers.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 ++++++++++++++++++++---------
 tools/setup.sh                              | 16 +++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]
 
 def usage():
     '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):
 
 def check_modules():
     '''Checks that igb_uio is loaded'''
+    global dpdk_drivers
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]
     
     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)
 
+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
     the pci addresses (domain:bus:slot.func). The values are themselves
     dictionaries - one for each NIC.'''
     global devices
+    global dpdk_drivers
     
     # clear any old data
     devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():
 
         # add igb_uio to list of supporting modules if needed
         if "Module_str" in devices[d]:
-            if "igb_uio" not in devices[d]["Module_str"]:
-                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
         else:
-            devices[d]["Module_str"] = "igb_uio"
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)
 
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
             dev["Driver_str"] = "" # clear driver string
 
     # if we are binding to one of DPDK drivers, add PCI id's to that driver
-    if driver == "igb_uio":
+    if driver in dpdk_drivers:
         filename = "/sys/bus/pci/drivers/%s/new_id" % driver
         try:
             f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])
 
     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..e0671b8 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -324,13 +324,13 @@ grep_meminfo()
 }
 
 #
-# Calls igb_uio_bind.py --status to show the NIC and what they
+# Calls dpdk_nic_bind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -338,16 +338,16 @@ show_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with igb_uio
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
 bind_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/igb_uio_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -355,18 +355,18 @@ bind_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with kernel drivers again
+# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/igb_uio_bind.py --status
+	${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/igb_uio_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
1.8.1.4

* [dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (18 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
@ 2014-06-03 10:18       ` Anatoly Burakov
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-03 10:18 UTC (permalink / raw)
  To: dev

Add support for loading/unloading the VFIO modules, binding/unbinding
devices to/from VFIO, and setting up correct userspace permissions.
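The memlock check that set_vfio_permissions() performs with ulimit -l can be sketched in C with getrlimit(RLIMIT_MEMLOCK). Note that ulimit -l reports KiB while getrlimit() reports bytes, so the script's 65536 threshold corresponds to 64 MB in bytes here. The function names and MEMLOCK_WARN_BYTES are hypothetical; the threshold mirrors setup.sh, and the limit is relevant because VFIO pins the memory it maps for DMA, which the kernel accounts against the user's memlock limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* 64 MB; setup.sh compares `ulimit -l` (in KiB) against 65536 */
#define MEMLOCK_WARN_BYTES ((rlim_t)64 * 1024 * 1024)

/* Return 1 if a soft memlock limit (in bytes) is below threshold. */
static int memlock_too_low(rlim_t soft_limit, rlim_t threshold)
{
	assert(threshold > 0);
	if (soft_limit == RLIM_INFINITY)
		return 0;
	return soft_limit < threshold;
}

/* Query RLIMIT_MEMLOCK for the calling process and warn like setup.sh.
 * Returns 1 if a warning was printed, 0 if the limit looks fine, -1 on error. */
static int check_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
		perror("getrlimit");
		return -1;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		return 0;

	printf("Current user memlock limit: %llu MB\n",
			(unsigned long long)(rl.rlim_cur / (1024 * 1024)));
	if (memlock_too_low(rl.rlim_cur, MEMLOCK_WARN_BYTES)) {
		printf("WARNING: memlock limit is less than 64MB; DPDK with VFIO "
				"may not be able to initialize if run as current user.\n");
		return 1;
	}
	return 0;
}
```

Only the soft limit (rlim_cur) is checked, matching what ulimit -l prints by default.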

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/setup.sh | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }
 
 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+	echo "Unloading any existing VFIO module"
+	/sbin/lsmod | grep -s vfio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod vfio-pci
+		sudo /sbin/rmmod vfio_iommu_type1
+		sudo /sbin/rmmod vfio
+	fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+	remove_vfio_module
+
+	VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+	echo "Loading VFIO module"
+	/sbin/lsmod | grep -s vfio_pci > /dev/null
+	if [ $? -ne 0 ] ; then
+		if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+			sudo /sbin/modprobe vfio-pci
+		fi
+	fi
+
+	# make sure regular users can traverse /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# check if /dev/vfio/vfio exists - that way we
+	# know we either loaded the module, or it was
+	# compiled into the kernel
+	if [ ! -e /dev/vfio/vfio ] ; then
+		echo "## ERROR: VFIO not found!"
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }
 
 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+	# make sure regular users can traverse /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# make sure regular user can access everything inside /dev/vfio
+	echo "chmod /dev/vfio/*"
+	sudo /usr/bin/chmod 0666 /dev/vfio/*
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# since permissions are only to be set when running as
+	# regular user, we only check ulimit here
+	#
+	# warn if regular user is only allowed
+	# to memlock <64M of memory
+	MEMLOCK_AMNT=`ulimit -l`
+
+	if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+		MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+		echo ""
+		echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+		echo ""
+		echo "This is the maximum amount of memory you will be"
+		echo "able to use with DPDK and VFIO if run as current user."
+		echo -n "To change this, please adjust limits.conf memlock "
+		echo "limit for current user."
+
+		if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+			echo ""
+			echo "## WARNING: memlock limit is less than 64MB"
+			echo -n "## DPDK with VFIO may not be able to initialize "
+			echo "if run as current user."
+		fi
+	fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+	if /sbin/lsmod  | grep -q vfio_pci ; then
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to VFIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+	else
+		echo "# Please load the 'vfio-pci' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
 		${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
 
-	TEXT[2]="Insert KNI module"
-	FUNC[2]="load_kni_module"
+	TEXT[2]="Insert VFIO module"
+	FUNC[2]="load_vfio_module"
+
+	TEXT[3]="Insert KNI module"
+	FUNC[3]="load_kni_module"
 
-	TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-	FUNC[3]="set_non_numa_pages"
+	TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+	FUNC[4]="set_non_numa_pages"
 
-	TEXT[4]="Setup hugepage mappings for NUMA systems"
-	FUNC[4]="set_numa_pages"
+	TEXT[5]="Setup hugepage mappings for NUMA systems"
+	FUNC[5]="set_numa_pages"
 
-	TEXT[5]="Display current Ethernet device settings"
-	FUNC[5]="show_nics"
+	TEXT[6]="Display current Ethernet device settings"
+	FUNC[6]="show_nics"
 
-	TEXT[6]="Bind Ethernet device to IGB UIO module"
-	FUNC[6]="bind_nics"
+	TEXT[7]="Bind Ethernet device to IGB UIO module"
+	FUNC[7]="bind_nics_to_igb_uio"
+
+	TEXT[8]="Bind Ethernet device to VFIO module"
+	FUNC[8]="bind_nics_to_vfio"
+
+	TEXT[9]="Setup VFIO permissions"
+	FUNC[9]="set_vfio_permissions"
 }
 
 #
@@ -455,11 +578,14 @@ step5_func()
 	TEXT[3]="Remove IGB UIO module"
 	FUNC[3]="remove_igb_uio_module"
 
-	TEXT[4]="Remove KNI module"
-	FUNC[4]="remove_kni_module"
+	TEXT[4]="Remove VFIO module"
+	FUNC[4]="remove_vfio_module"
+
+	TEXT[5]="Remove KNI module"
+	FUNC[5]="remove_kni_module"
 
-	TEXT[5]="Remove hugepage mappings"
-	FUNC[5]="clear_huge_pages"
+	TEXT[6]="Remove hugepage mappings"
+	FUNC[6]="clear_huge_pages"
 }
 
 STEPS[1]="step1_func"
-- 
1.8.1.4
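
A note on the memlock check in the setup.sh hunk above. It compares MEMLOCK_AMNT against 65536 and warns about "less than 64MB". The value is therefore presumably in KiB, as reported by `ulimit -l`; the line that sets MEMLOCK_AMNT is not part of this excerpt. The same check can be done from C with getrlimit(RLIMIT_MEMLOCK), which reports bytes. This is a minimal sketch, not code from the patch set. memlock_too_low() and check_current_memlock() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* 64 MB threshold in bytes; setup.sh compares KiB against 65536 (same value) */
#define MEMLOCK_WARN_BYTES (64ULL * 1024 * 1024)

/* Hypothetical helper: 1 if a memlock limit (bytes) is below 64 MB, else 0.
 * An unlimited limit (RLIM_INFINITY) is never considered too low. */
static int memlock_too_low(rlim_t limit)
{
	if (limit == RLIM_INFINITY)
		return 0;
	return (uint64_t)limit < MEMLOCK_WARN_BYTES;
}

/* Hypothetical helper: check the soft RLIMIT_MEMLOCK of this process.
 * Returns 1 (too low), 0 (ok) or -1 if getrlimit() fails. */
static int check_current_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	return memlock_too_low(rl.rlim_cur);
}
```

Checking the soft limit (rlim_cur) matches what `ulimit -l` shows by default in bash.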

* Re: [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-06-04  9:03         ` Burakov, Anatoly
  0 siblings, 0 replies; 160+ messages in thread
From: Burakov, Anatoly @ 2014-06-04  9:03 UTC (permalink / raw)
  To: Burakov, Anatoly, dev


> Rename the RTE_PCI_DRV_NEED_IGB_UIO to be more generic.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

NAK: the virtio change got lost in the rebases :-(

Best regards,
Anatoly Burakov
DPDK SW Engineer

* [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK
  2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
                         ` (19 preceding siblings ...)
  2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
@ 2014-06-10 11:11       ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
                           ` (21 more replies)
  20 siblings, 22 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a Linux kernel driver (kernel 3.6+) that allows secure DMA
from userspace by using the IOMMU, instead of working directly with
physical memory as igb_uio does.
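
For readers new to VFIO, the usual userspace sequence (described in the kernel's VFIO documentation) runs as follows:
- open the container /dev/vfio/vfio and check the API version and type1 IOMMU support;
- open the IOMMU group /dev/vfio/<group>, check that the group is viable, attach it to the container and select the IOMMU model;
- finally, ask the group for a device fd by PCI address.

The sketch below shows only that sequence, under stated assumptions. vfio_get_device_fd() is a hypothetical name, and this is not the eal_pci_vfio.c code. The caller is assumed to already know the device's IOMMU group number, for example from the /sys/bus/pci/devices/<addr>/iommu_group link. DMA mapping and BAR mapping are left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Hypothetical sketch: get a VFIO device fd for PCI device `pci_addr`
 * (e.g. "0000:06:00.0") in IOMMU group `group_no`, using the type1 IOMMU.
 * Returns the device fd, or -1 on any failure (opened fds are closed).
 * On success the container and group fds are deliberately left open; a
 * fuller version would return them so the caller can release them later.
 */
static int vfio_get_device_fd(int group_no, const char *pci_addr)
{
	char path[64];
	struct vfio_group_status status = { .argsz = sizeof(status) };
	int container, group, device;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return -1;

	/* container must speak the expected API and support type1 IOMMU */
	if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) <= 0)
		goto close_container;

	snprintf(path, sizeof(path), "/dev/vfio/%d", group_no);
	group = open(path, O_RDWR);
	if (group < 0)
		goto close_container;

	/* viable only if every device in the group is bound to vfio-pci */
	if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
	    !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		goto close_group;

	/* attach the group to the container, then pick the IOMMU model */
	if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0 ||
	    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU) < 0)
		goto close_group;

	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
	if (device < 0)
		goto close_group;
	return device;

close_group:
	close(group);
close_container:
	close(container);
	return -1;
}
```

The group check is what makes the DMA "secure": the IOMMU isolates whole groups, so VFIO refuses to hand out a device while another device in its group is still owned by a host driver.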

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various commit atomicity issues

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

v5 fixes:
* Fixed missing virtio change to RTE_PCI_DRV_NEED_MAPPING
* Fixed compile issue when VFIO was disabled (introduced in v3)
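
The v3 notes mention MSI support and automatic interrupt type selection, and the test report covers legacy, MSI and MSI-X. One plausible way to structure that choice is sketched below. The enum, the capability bits and the fallback order (MSI-X, then MSI, then legacy) are all assumptions, not the EAL's actual names or policy. In real code the capability mask would presumably come from querying the device, for example with the VFIO_DEVICE_GET_IRQ_INFO ioctl, rather than being passed in.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical interrupt modes; not the EAL's actual identifiers */
enum intr_mode {
	INTR_MODE_AUTO,		/* nothing requested: pick the best supported */
	INTR_MODE_LEGACY,
	INTR_MODE_MSI,
	INTR_MODE_MSIX,
	INTR_MODE_INVALID
};

/* Hypothetical bit mask of interrupt types a device supports */
#define INTR_CAP_LEGACY (1u << 0)
#define INTR_CAP_MSI    (1u << 1)
#define INTR_CAP_MSIX   (1u << 2)

/* Parse a user-supplied type string; NULL or "" means auto-detect */
static enum intr_mode parse_intr_mode(const char *arg)
{
	if (arg == NULL || arg[0] == '\0')
		return INTR_MODE_AUTO;
	if (strcmp(arg, "legacy") == 0)
		return INTR_MODE_LEGACY;
	if (strcmp(arg, "msi") == 0)
		return INTR_MODE_MSI;
	if (strcmp(arg, "msix") == 0)
		return INTR_MODE_MSIX;
	return INTR_MODE_INVALID;
}

static unsigned mode_to_cap(enum intr_mode m)
{
	switch (m) {
	case INTR_MODE_LEGACY: return INTR_CAP_LEGACY;
	case INTR_MODE_MSI:    return INTR_CAP_MSI;
	case INTR_MODE_MSIX:   return INTR_CAP_MSIX;
	default:               return 0;
	}
}

/* Honour an explicit request only if supported; in auto mode fall back
 * from MSI-X to MSI to legacy INTx */
static enum intr_mode select_intr_mode(enum intr_mode requested, unsigned caps)
{
	if (requested == INTR_MODE_INVALID)
		return INTR_MODE_INVALID;
	if (requested != INTR_MODE_AUTO)
		return (caps & mode_to_cap(requested)) ? requested : INTR_MODE_INVALID;
	if (caps & INTR_CAP_MSIX)
		return INTR_MODE_MSIX;
	if (caps & INTR_CAP_MSI)
		return INTR_MODE_MSI;
	if (caps & INTR_CAP_LEGACY)
		return INTR_MODE_LEGACY;
	return INTR_MODE_INVALID;
}
```

Keeping parsing and selection separate lets an explicit user choice fail loudly on hardware that lacks it, while auto mode degrades quietly.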

Tested-by: Waterman Cao <waterman.cao@intel.com> 

This patch set has been tested by Intel.
We tested it with the following functions:
* Layer-2 forwarding support
* Sample commands test
* Packet forwarding checking
* Binding and unbinding the VFIO driver
* Compiling the igb_uio driver (Linux kernel < 3.6)
* Interrupt model test under legacy, MSI and MSI-X
All cases passed.

Test environment:
Fedora 20 x86_64, Linux kernel 3.13.6-200,
GCC 4.8.2, Intel Xeon CPU E5-2680 v2 @ 2.80GHz, NIC: Intel Niantic 82599


Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL    
    command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  36 +
 app/test/test_pci.c                                |   4 +-
 config/common_linuxapp                             |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |   2 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  16 +-
 lib/librte_eal/common/include/rte_pci.h            |   5 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  44 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 287 +++++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 473 ++-----------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 781 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c              |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        | 157 +++--
 tools/setup.sh                                     | 172 ++++-
 30 files changed, 2548 insertions(+), 588 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (83%)

-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
                           ` (20 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch: VFIO also needs to map BARs, but it does not use
the path stored in mapped_pci_resource. Also rename the structs to be more
generic.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 ++++++++++++++++------------------
 1 file changed, 58 insertions(+), 67 deletions(-)
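
This patch closes the fd right after mmap(), and that is safe. POSIX specifies that mmap() adds its own reference to the underlying file, and a later close() does not remove it, so the BAR mapping stays valid. The sketch below uses a temporary file in place of /dev/uioX. map_fd_resource() and open_map_close() are hypothetical helpers that mirror the new pci_map_resource(requested_addr, fd, offset, size) shape. The munmap() on an address mismatch is the sketch's own choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of an already-open fd.
 * Returns NULL on failure or if a requested address was not honoured. */
static void *map_fd_resource(void *requested_addr, int fd, off_t offset,
			     size_t size)
{
	void *addr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, offset);

	if (addr == MAP_FAILED)
		return NULL;
	/* secondary processes need exactly the address the primary used */
	if (requested_addr != NULL && addr != requested_addr) {
		munmap(addr, size);
		return NULL;
	}
	return addr;
}

/* Hypothetical helper: open, map, close; the mapping outlives the fd */
static void *open_map_close(const char *path, size_t size)
{
	void *addr;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	addr = map_fd_resource(NULL, fd, 0, size);
	close(fd);	/* the mapping holds its own reference to the file */
	return addr;
}
```

Passing the fd in, rather than a path, lets a VFIO backend reuse the same helper with a device fd that has no path in mapped_pci_resource.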

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index ac2c1fe..fd88bd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <ctype.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <string.h>
-#include <stdarg.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
-#include <stdarg.h>
-#include <errno.h>
 #include <dirent.h>
-#include <limits.h>
-#include <sys/queue.h>
 #include <sys/mman.h>
-#include <sys/ioctl.h>
 
-#include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
-#include <rte_common.h>
-#include <rte_launch.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_tailq.h>
-#include <rte_eal.h>
 #include <rte_eal_memconfig.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
 #include <rte_malloc.h>
-#include <rte_string_fns.h>
-#include <rte_debug.h>
 #include <rte_devargs.h>
 
 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct uio_map {
+struct pci_map {
 	void *addr;
 	uint64_t offset;
 	uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-	TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
 
 	struct rte_pci_addr pci_addr;
 	char path[PATH_MAX];
-	size_t nb_maps;
-	struct uio_map maps[PCI_MAX_RESOURCE];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
 };
 
-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;
 
-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:
 
 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-		 size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-	int fd;
 	void *mapaddr;
 
-	/*
-	 * open devname, to mmap it
-	 */
-	fd = open(devname, O_RDWR);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		goto fail;
-	}
-
 	/* Map the PCI memory resource of device */
 	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
-	close(fd);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
-		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-			" %s (%p)\n", __func__, devname, fd, requested_addr,
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+			__func__, fd, requested_addr,
 			(unsigned long)size, (unsigned long)offset,
 			strerror(errno), mapaddr);
 		goto fail;
@@ -186,10 +148,10 @@ fail:
 }
 
 #define OFF_MAX              ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-	size_t i;
+	int i;
 	char dirname[PATH_MAX];
 	char filename[PATH_MAX];
 	uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-        size_t i;
-        struct uio_resource *uio_res;
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
 
-	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
 
 		/* skip this element if it doesn't match our PCI address */
 		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
 			continue;
 
 		for (i = 0; i != uio_res->nb_maps; i++) {
-			if (pci_map_resource(uio_res->maps[i].addr,
-					     uio_res->path,
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
 					     (off_t)uio_res->maps[i].offset,
 					     (size_t)uio_res->maps[i].size)
 			    != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL,
 					"Cannot mmap device resource\n");
+				close(fd);
 				return (-1);
 			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
 		}
 		return (0);
 	}
@@ -276,7 +250,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 	return -1;
 }
 
-static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
 {
 	FILE *f;
 	char filename[PATH_MAX];
@@ -323,7 +298,8 @@ static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
  * sysfs. On error, return a negative value. In this case dstbuf is
  * invalid.
  */
-static int pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 			   unsigned int buflen)
 {
 	struct rte_pci_addr *loc = &dev->addr;
@@ -405,10 +381,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	uint64_t phaddr;
 	uint64_t offset;
 	uint64_t pagesz;
-	ssize_t nb_maps;
+	int nb_maps;
 	struct rte_pci_addr *loc = &dev->addr;
-	struct uio_resource *uio_res;
-	struct uio_map *maps;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -460,6 +436,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	maps = uio_res->maps;
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
 
 		/* skip empty BAR */
 		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
@@ -473,14 +450,27 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		/* if matching map is found, then use it */
 		if (j != nb_maps) {
 			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(devname, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					devname, strerror(errno));
+				return -1;
+			}
+
 			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, devname,
+			    (mapaddr = pci_map_resource(NULL, fd,
 							(off_t)offset,
 							(size_t)maps[j].size)
 			    ) == NULL) {
 				rte_free(uio_res);
+				close(fd);
 				return (-1);
 			}
+			close(fd);
 
 			maps[j].addr = mapaddr;
 			maps[j].offset = offset;
@@ -488,7 +478,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		}
 	}
 
-	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
 
 	return (0);
 }
@@ -866,7 +856,8 @@ rte_eal_pci_init(void)
 {
 	TAILQ_INIT(&pci_driver_list);
 	TAILQ_INIT(&pci_device_list);
-	uio_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, uio_res_list);
+	pci_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI,
+			mapped_pci_res_list);
 
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 02/20] pci: move uio mapping code to a separate file
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
                           ` (19 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 403 +--------------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 403 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 ++++
 4 files changed, 474 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
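
pci_parse_sysfs_value(), moved into eal_pci_uio.c here, reads the single 64-bit value held in files such as <uio sysfs dir>/maps/mapN/offset, size and addr. It calls strtoull() with base 0, so "0x..." is accepted, and then checks that the number ends at '\n'. The string-level part can be sketched as below. parse_sysfs_line() is hypothetical. Unlike the original, it also rejects an empty number (e.g. a bare "\n") and overflow (errno == ERANGE).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse one sysfs line such as "0x0000000000001000\n"
 * into a 64-bit value. Returns 0 on success, -1 on malformed input. */
static int parse_sysfs_line(const char *buf, uint64_t *val)
{
	char *end = NULL;
	unsigned long long v;

	if (buf[0] == '\0')
		return -1;
	errno = 0;
	v = strtoull(buf, &end, 0);	/* base 0: accepts hex, octal, decimal */
	/* reject overflow, no digits at all, or trailing junk before '\n' */
	if (errno != 0 || end == buf || *end != '\n')
		return -1;
	*val = (uint64_t)v;
	return 0;
}
```

Requiring the trailing newline mirrors the original's check that the whole sysfs line was consumed.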

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b052820..d958014 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index fd88bd0..628813b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */
 
 #include <string.h>
-#include <sys/stat.h>
-#include <fcntl.h>
 #include <dirent.h>
 #include <sys/mman.h>
 
@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"
 
 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct pci_map {
-	void *addr;
-	uint64_t offset;
-	uint64_t size;
-	uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-	TAILQ_ENTRY(mapped_pci_resource) next;
-
-	struct rte_pci_addr pci_addr;
-	char path[PATH_MAX];
-	int nb_maps;
-	struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;
 
 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }
 
 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;
 
@@ -147,342 +123,6 @@ fail:
 	return NULL;
 }
 
-#define OFF_MAX              ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-	int i;
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	uint64_t offset, size;
-
-	for (i = 0; i != nb_maps; i++) {
- 
-		/* check if map directory exists */
-		rte_snprintf(dirname, sizeof(dirname), 
-			"%s/maps/map%u", devname, i);
- 
-		if (access(dirname, F_OK) != 0)
-			break;
- 
-		/* get mapping offset */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/offset", dirname);
-		if (pci_parse_sysfs_value(filename, &offset) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse offset of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping size */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/size", dirname);
-		if (pci_parse_sysfs_value(filename, &size) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse size of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
- 
-		/* get mapping physical address */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/addr", dirname);
-		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse addr of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-			RTE_LOG(ERR, EAL,
-				"%s(): offset/size exceed system max value\n",
-				__func__); 
-			return (-1);
-		}
-
-		maps[i].offset = offset;
-		maps[i].size = size;
-        }
-	return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-	int fd, i;
-	struct mapped_pci_resource *uio_res;
-
-	TAILQ_FOREACH(uio_res, pci_res_list, next) {
-
-		/* skip this element if it doesn't match our PCI address */
-		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
-			continue;
-
-		for (i = 0; i != uio_res->nb_maps; i++) {
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(uio_res->path, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					uio_res->path, strerror(errno));
-				return -1;
-			}
-
-			if (pci_map_resource(uio_res->maps[i].addr, fd,
-					     (off_t)uio_res->maps[i].offset,
-					     (size_t)uio_res->maps[i].size)
-			    != uio_res->maps[i].addr) {
-				RTE_LOG(ERR, EAL,
-					"Cannot mmap device resource\n");
-				close(fd);
-				return (-1);
-			}
-			/* fd is not needed in slave process, close it */
-			close(fd);
-		}
-		return (0);
-	}
-
-	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
-}
-
-static int
-pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
-{
-	FILE *f;
-	char filename[PATH_MAX];
-	int ret;
-	unsigned major, minor;
-	dev_t dev;
-
-	/* get the name of the sysfs file that contains the major and minor
-	 * of the uio device and read its content */
-	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
-			__func__);
-		return -1;
-	}
-
-	ret = fscanf(f, "%d:%d", &major, &minor);
-	if (ret != 2) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
-			__func__);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-
-	/* create the char device "mknod /dev/uioX c major minor" */
-	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
-	dev = makedev(major, minor);
-	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
-			__func__, strerror(errno));
-		return -1;
-	}
-
-	return ret;
-}
-
-/*
- * Return the uioX char device used for a pci device. On success, return
- * the UIO number and fill dstbuf string with the path of the device in
- * sysfs. On error, return a negative value. In this case dstbuf is
- * invalid.
- */
-static int
-pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
-			   unsigned int buflen)
-{
-	struct rte_pci_addr *loc = &dev->addr;
-	unsigned int uio_num;
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-
-	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
-
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1; 
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL)
-		return -1;
-
-	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
-		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
-
-	return uio_num;
-}
-
-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-	int i, j;
-	char dirname[PATH_MAX];
-	char devname[PATH_MAX]; /* contains the /dev/uioX */
-	void *mapaddr;
-	int uio_num;
-	uint64_t phaddr;
-	uint64_t offset;
-	uint64_t pagesz;
-	int nb_maps;
-	struct rte_pci_addr *loc = &dev->addr;
-	struct mapped_pci_resource *uio_res;
-	struct pci_map *maps;
-
-	dev->intr_handle.fd = -1;
-	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-	/* secondary processes - use already recorded details */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
-
-	/* find uio resource */
-	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
-	if (uio_num < 0) {
-		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
-				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
-	}
-	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
-
-	/* save fd if in primary process */
-	dev->intr_handle.fd = open(devname, O_RDWR);
-	if (dev->intr_handle.fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		return -1;
-	}
-	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-
-	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
-		RTE_LOG(ERR, EAL,
-			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
-	}
-
-	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
-
-	/* collect info about device mappings */
-	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-				       RTE_DIM(uio_res->maps));
-	if (nb_maps < 0) {
-		rte_free(uio_res);
-		return (nb_maps);
-	}
-
-	uio_res->nb_maps = nb_maps;
-
-	/* Map all BARs */
-	pagesz = sysconf(_SC_PAGESIZE);
-
-	maps = uio_res->maps;
-	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
-		int fd;
-
-		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
-			continue;
-
-		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
-				dev->mem_resource[i].len != maps[j].size);
-				j++)
-			;
-
-		/* if matching map is found, then use it */
-		if (j != nb_maps) {
-			offset = j * pagesz;
-
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(devname, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					devname, strerror(errno));
-				return -1;
-			}
-
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, fd,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
-				rte_free(uio_res);
-				close(fd);
-				return (-1);
-			}
-			close(fd);
-
-			maps[j].addr = mapaddr;
-			maps[j].offset = offset;
-			dev->mem_resource[i].addr = mapaddr;
-		}
-	}
-
-	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
-
-	return (0);
-}
-
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x00000200
 
@@ -546,41 +186,6 @@ error:
 	return -1;
 }
 
-/* 
- * parse a sysfs file containing one integer value 
- * different to the eal version, as it needs to work with 64-bit values
- */ 
-static int 
-pci_parse_sysfs_value(const char *filename, uint64_t *val) 
-{
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
- 
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
- 
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
-}
-
 /* Compare two PCI device addresses. */
 static int
 pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
new file mode 100644
index 0000000..61f09cc
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -0,0 +1,403 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <sys/stat.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+#include "rte_pci_dev_ids.h"
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
+	int i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		rte_snprintf(dirname, sizeof(dirname), "%s/maps/map%u", devname, i);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		rte_snprintf(filename, sizeof(filename), "%s/offset", dirname);
+		if (pci_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse offset of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping size */
+		rte_snprintf(filename, sizeof(filename), "%s/size", dirname);
+		if (pci_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse size of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping physical address */
+		rte_snprintf(filename, sizeof(filename), "%s/addr", dirname);
+		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+					"%s(): cannot parse addr of %s\n", __func__, dirname);
+			return (-1);
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+					"%s(): offset/size exceed system max value\n", __func__);
+			return (-1);
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+	}
+
+	return (i);
+}
+
+static int
+pci_uio_map_secondary(struct rte_pci_device *dev) {
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
+
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+		/* skip this element if it doesn't match our PCI address */
+		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+			continue;
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL,
+						"Cannot open %s: %s\n", uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
+					(off_t) uio_res->maps[i].offset,
+					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
+				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
+				close(fd);
+				return (-1);
+			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
+		}
+		return (0);
+	}
+
+	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
+	return -1;
+}
+
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num) {
+	FILE *f;
+	char filename[PATH_MAX];
+	int ret;
+	unsigned major, minor;
+	dev_t dev;
+
+	/* get the name of the sysfs file that contains the major and minor
+	 * of the uio device and read its content */
+	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs to get major:minor\n", __func__);
+		return -1;
+	}
+
+	ret = fscanf(f, "%u:%u", &major, &minor);
+	if (ret != 2) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs to get major:minor\n", __func__);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* create the char device "mknod /dev/uioX c major minor" */
+	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
+	dev = makedev(major, minor);
+	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+				"%s(): mknod() failed %s\n", __func__, strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Return the uioX char device used for a pci device. On success, return
+ * the UIO number and fill dstbuf string with the path of the device in
+ * sysfs. On error, return a negative value. In this case dstbuf is
+ * invalid.
+ */
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+		unsigned int buflen) {
+	struct rte_pci_addr *loc = &dev->addr;
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+
+	rte_snprintf(dirname, sizeof(dirname),
+			SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio", loc->domain, loc->bus,
+			loc->devid, loc->function);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		rte_snprintf(dirname, sizeof(dirname),
+				SYSFS_PCI_DEVICES "/" PCI_PRI_FMT, loc->domain, loc->bus,
+				loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		/* first try uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		/* then try uio:uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL)
+		return -1;
+
+	/* create uio device if we've been asked to */
+	if (internal_config.create_uio_dev
+			&& pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
+
+	return uio_num;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+int
+pci_uio_map_resource(struct rte_pci_device *dev) {
+	int i, j;
+	char dirname[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	int uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	int nb_maps;
+	struct rte_pci_addr *loc = &dev->addr;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return (pci_uio_map_secondary(dev));
+
+	/* find uio resource */
+	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
+	if (uio_num < 0) {
+		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
+		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
+		return -1;
+	}
+	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+
+	/* save fd if in primary process */
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", devname, strerror(errno));
+		return -1;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
+	/* allocate the mapping details for secondary processes*/
+	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
+		return (-1);
+	}
+
+	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+
+	/* collect info about device mappings */
+	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+			RTE_DIM(uio_res->maps));
+	if (nb_maps < 0) {
+		rte_free(uio_res);
+		return (nb_maps);
+	}
+
+	uio_res->nb_maps = nb_maps;
+
+	/* Map all BARs */
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
+
+		/* skip empty BAR */
+		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+			continue;
+
+		for (j = 0;
+				j != nb_maps
+						&& (phaddr != maps[j].phaddr
+								|| dev->mem_resource[i].len != maps[j].size);
+				j++)
+			;
+
+		/* if matching map is found, then use it */
+		if (j != nb_maps) {
+			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				rte_free(uio_res);
+				return -1;
+			}
+
+			if (maps[j].addr != NULL
+					|| (mapaddr = pci_map_resource(NULL, fd,
+							(off_t) offset, (size_t) maps[j].size)) == NULL) {
+				rte_free(uio_res);
+				close(fd);
+				return (-1);
+			}
+			close(fd);
+
+			maps[j].addr = mapaddr;
+			maps[j].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
+
+	return (0);
+}
+
+/*
+ * parse a sysfs file containing one integer value
+ * different to the eal version, as it needs to work with 64-bit values
+ */
+static int
+pci_parse_sysfs_value(const char *filename, uint64_t *val) {
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot open sysfs value %s\n", __func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot read sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL,
+				"%s(): cannot parse sysfs value %s\n", __func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
new file mode 100644
index 0000000..1292eda
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_PCI_INIT_H_
+#define EAL_PCI_INIT_H_
+
+struct pci_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
+
+	struct rte_pci_addr pci_addr;
+	char path[PATH_MAX];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+extern struct mapped_pci_res_list *pci_res_list;
+
+void * pci_map_resource(void * requested_addr, int fd, off_t offset,
+		size_t size);
+
+/* map IGB_UIO resource prototype */
+int pci_uio_map_resource(struct rte_pci_device *dev);
+
+#endif /* EAL_PCI_INIT_H_ */
-- 
1.8.1.4
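pci_parse_sysfs_value() reads one integer per sysfs file. It uses base 0 so that the hexadecimal values (typically printed with a "0x" prefix) in the UIO maps/mapN/{addr,size,offset} attributes parse correctly, and it requires the number to be followed directly by a newline. The following is a minimal sketch of the same checks applied to an in-memory buffer instead of a file; parse_sysfs_line() is a hypothetical name, not part of the patch. It is deliberately a little stricter than the patch: it also rejects a buffer that contains only "\n" and values that overflow (errno == ERANGE), both of which the version above would accept.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one sysfs-style line such as "0xfe000000\n".
 * Base 0 accepts decimal, octal and "0x"-prefixed hex. The line must hold
 * at least one digit and end right after the number with '\n'.
 * Returns 0 and stores the value on success, -1 on any parse error.
 */
static int
parse_sysfs_line(const char *buf, uint64_t *val)
{
	char *end = NULL;
	unsigned long long v;

	if (buf[0] == '\0')
		return -1;

	errno = 0;
	v = strtoull(buf, &end, 0);
	/* reject overflow, "no digits at all", and trailing garbage */
	if (errno != 0 || end == buf || *end != '\n')
		return -1;

	*val = (uint64_t)v;
	return 0;
}
```

For example, parse_sysfs_line("0x1000\n", &v) stores 4096, while "0x1000" without the trailing newline is rejected.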

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v5 03/20] pci: fixing errors in a previous commit found by checkpatch
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
                           ` (18 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev


Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 61f09cc..ae4e716 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -69,7 +69,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &offset) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse offset of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping size */
@@ -77,7 +77,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &size) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse size of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping physical address */
@@ -85,20 +85,20 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps) {
 		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
 			RTE_LOG(ERR, EAL,
 					"%s(): cannot parse addr of %s\n", __func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
 			RTE_LOG(ERR, EAL,
 					"%s(): offset/size exceed system max value\n", __func__);
-			return (-1);
+			return -1;
 		}
 
 		maps[i].offset = offset;
 		maps[i].size = size;
 	}
 
-	return (i);
+	return i;
 }
 
 static int
@@ -128,12 +128,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 					(size_t) uio_res->maps[i].size) != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL, "Cannot mmap device resource\n");
 				close(fd);
-				return (-1);
+				return -1;
 			}
 			/* fd is not needed in slave process, close it */
 			close(fd);
 		}
-		return (0);
+		return 0;
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -277,7 +277,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 
 	/* secondary processes - use already recorded details */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
+		return pci_uio_map_secondary(dev);
 
 	/* find uio resource */
 	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -299,7 +299,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	/* allocate the mapping details for secondary processes*/
 	if ((uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0)) == NULL) {
 		RTE_LOG(ERR, EAL, "%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
+		return -1;
 	}
 
 	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -310,7 +310,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 			RTE_DIM(uio_res->maps));
 	if (nb_maps < 0) {
 		rte_free(uio_res);
-		return (nb_maps);
+		return nb_maps;
 	}
 
 	uio_res->nb_maps = nb_maps;
-- 
1.8.1.4
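pci_uio_map_resource() pairs each non-empty BAR with the UIO map that has the same physical address and the same size. It then mmap()s /dev/uioX at offset j * pagesz. That offset is not a position inside the BAR. It is the Linux UIO convention for selecting map number j (described in the kernel's UIO HOWTO). Below is a minimal standalone sketch of the matching step. The structs are simplified, hypothetical stand-ins for struct pci_map and the BAR fields of struct rte_pci_device.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one PCI BAR. */
struct bar_info {
	uint64_t phys_addr;	/* 0 means the BAR is unused */
	uint64_t len;
};

/* Hypothetical, simplified view of one UIO map (maps/mapN in sysfs). */
struct uio_map_info {
	uint64_t phaddr;
	uint64_t size;
};

/*
 * Return the index of the UIO map whose physical address and size both
 * match the BAR, or -1 if the BAR is empty or no map matches.
 */
static int
find_uio_map(const struct bar_info *bar,
		const struct uio_map_info *maps, int nb_maps)
{
	int j;

	if (bar->phys_addr == 0)
		return -1;
	for (j = 0; j < nb_maps; j++)
		if (maps[j].phaddr == bar->phys_addr &&
				maps[j].size == bar->len)
			return j;
	return -1;
}

/* UIO selects map N through an mmap() offset of N * page size. */
static uint64_t
uio_map_offset(int map_index, uint64_t pagesz)
{
	return (uint64_t)map_index * pagesz;
}
```

Matching on both address and size mirrors the loop in the patch. A BAR that matches no map is simply left unmapped there rather than treated as an error.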


* [dpdk-dev] [PATCH v5 04/20] pci: distinguish between legitimate failures and non-fatal errors
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (2 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
                           ` (17 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Currently, EAL does not distinguish between actual failures and expected
initialization errors. For example, a driver may fail to initialize
simply because the device is not managed by the kernel driver it needs
(such as a port that is not bound to igb_uio), so it was never supposed
to be initialized in the first place.

This patch makes EAL fail on actual initialization errors while still
skipping over expected ones: the probe functions now return a negative
value on a real failure and a positive value (1) when the device should
simply be skipped.
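The resulting three-way return convention can be sketched as follows. This is a minimal model that uses hypothetical probe callbacks rather than struct rte_pci_driver. It leaves out the RTE_PCI_DRV_MULTIPLE re-registration loop and the blacklist check that the real pci_probe_all_drivers() path also handles.

```c
#include <stddef.h>

/*
 * Hypothetical probe callback: < 0 on a real failure, > 0 if the driver
 * does not handle the device, 0 if it initialized the device.
 */
typedef int (*probe_fn)(int dev_id);

/*
 * Mirrors the convention of pci_probe_all_drivers() after this patch:
 * -1 on the first real error, 0 once a driver accepts the device,
 * 1 if no driver took it (the caller should just skip the device).
 */
static int
probe_all(const probe_fn *drivers, size_t nb_drivers, int dev_id)
{
	size_t i;

	for (i = 0; i < nb_drivers; i++) {
		int rc = drivers[i](dev_id);

		if (rc < 0)
			return -1;	/* fatal: caller should abort */
		if (rc > 0)
			continue;	/* not ours: try the next driver */
		return 0;		/* device initialized */
	}
	return 1;			/* no driver for this device */
}

/* Example drivers, for illustration only. */
static int drv_not_mine(int dev_id) { (void)dev_id; return 1; }
static int drv_ok(int dev_id) { (void)dev_id; return 0; }
static int drv_broken(int dev_id) { (void)dev_id; return -1; }
```

A caller following the patch's rte_eal_pci_probe() logic would call rte_exit() only when the result is negative and would quietly move on when it is positive.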

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_pci.c    | 16 +++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci.c     |  7 ++++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 7c23e86..1fb8f2c 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 
 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered drivers for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		rc = rte_eal_pci_probe_one_driver(dr, dev);
 		if (rc < 0)
 			/* negative value is an error */
-			break;
+			return -1;
 		if (rc > 0)
 			/* positive value means driver not found */
 			continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 				;
 		return 0;
 	}
-	return -1;
+	return 1;
 }
 
 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
 		probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			pci_probe_all_drivers(dev);
+			ret = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
-			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-			pci_probe_all_drivers(dev) < 0)
+			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+			ret = pci_probe_all_drivers(dev);
+		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 628813b..0b779ec 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -431,13 +432,13 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		if (dev->devargs != NULL &&
 			dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
 			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-			return 0;
+			return 1;
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			if ((ret = pci_uio_map_resource(dev)) != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index ae4e716..426769b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -137,7 +137,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) {
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
+	return 1;
 }
 
 static int
@@ -284,7 +284,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) {
 	if (uio_num < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 		"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-- 
1.8.1.4
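With this patch, a device for which pci_get_uio_dev() finds no uio entry is reported as "not managed by UIO driver" and skipped (return 1) instead of failing. The lookup scans the device's sysfs directory for an entry named either uioX or uio:uioX, depending on the kernel version. Here is a minimal sketch of that name parsing; parse_uio_entry() is a hypothetical helper. Unlike the loop in the patch, it checks the full "uio:uio" prefix with strncmp() before parsing instead of trying both offsets in turn, and it requires the name to end right after the digits.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return X for a sysfs entry named "uioX" or
 * "uio:uioX", or -1 if the name matches neither form.
 */
static int
parse_uio_entry(const char *name)
{
	const char *digits;
	char *end;
	unsigned long num;

	/* check the longer prefix first, since "uio:uio" also starts with "uio" */
	if (strncmp(name, "uio:uio", 7) == 0)
		digits = name + 7;
	else if (strncmp(name, "uio", 3) == 0)
		digits = name + 3;
	else
		return -1;

	/* strtoul() would accept leading spaces or a sign; insist on a digit */
	if (!isdigit((unsigned char)digits[0]))
		return -1;

	errno = 0;
	num = strtoul(digits, &end, 10);
	if (errno != 0 || *end != '\0' || num > INT_MAX)
		return -1;
	return (int)num;
}
```

For example, "uio:uio3" yields 3 and "uio12" yields 12, while a name such as "vfio0" or "uio0x" is rejected.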


* [dpdk-dev] [PATCH v5 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (3 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
                           ` (16 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

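Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING, since PCI BAR
mapping will no longer be done only through igb_uio (it can also be done
through VFIO).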
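The rename keeps the bit value 0x0001, so only source code has to change and already-compiled drivers still set the same bit. Below is a minimal sketch of how EAL-side code tests the flag. The two constants are copied from rte_pci.h as shown in this patch so that the sketch compiles on its own, and driver_needs_mapping() is a hypothetical helper.

```c
#include <stdint.h>

/* Values as defined in rte_pci.h by this patch (copied for a standalone build). */
#define RTE_PCI_DRV_NEED_MAPPING 0x0001
#define RTE_PCI_DRV_MULTIPLE     0x0002

/* Hypothetical helper: must EAL map the device BARs before devinit()? */
static int
driver_needs_mapping(uint32_t drv_flags)
{
	return (drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}
```

Because drv_flags is a bitmask, a driver can combine the flag with others (for example RTE_PCI_DRV_MULTIPLE) and the test still works.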

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_pci.c                     | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c     | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c        | 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 6908d04..fad118e 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
 			  struct rte_pci_device *dev);
 
 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */
 
@@ -91,7 +91,7 @@ struct rte_pci_driver my_driver = {
 	.name = "test_driver",
 	.devinit = my_driver_init,
 	.id_table = my_driver_id,
-	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };
 
 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 94ae461..eddbd2f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -474,7 +474,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 0;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if (pci_uio_map_resource(dev) < 0)
 				return -1;
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index c793773..11b8c13 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
 	uint32_t drv_flags;                     /**< Flags contolling handling of device. */
 };
 
-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 0b779ec..a0abec8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 1;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			if ((ret = pci_uio_map_resource(dev)) != 0)
 				return ret;
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 493806c..c8355bc 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
 	{
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_em_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 5f93bcf..d60f923 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
 	{
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igb_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
 	{
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igbvf_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index b38235c..f8e6039 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -999,7 +999,7 @@ static struct eth_driver rte_ixgbe_pmd = {
 	{
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbe_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
@@ -1012,7 +1012,7 @@ static struct eth_driver rte_ixgbevf_pmd = {
 	{
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbevf_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index c2b4dfb..9ef24b5 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -812,7 +812,7 @@ static struct eth_driver rte_virtio_pmd = {
 	{
 		.name = "rte_virtio_pmd",
 		.id_table = pci_id_virtio_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_virtio_dev_init,
 	.dev_private_size = sizeof(struct virtio_adapter),
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index c41032f..d42d709 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -268,7 +268,7 @@ static struct eth_driver rte_vmxnet3_pmd = {
 	{
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_vmxnet3_dev_init,
 	.dev_private_size = sizeof(struct vmxnet3_adapter),
-- 
1.8.1.4


* [dpdk-dev] [PATCH v5 06/20] igb_uio: make igb_uio compilation optional
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (4 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
                           ` (15 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Currently, igb_uio is always compiled. Some Linux distributions may not
want to ship igb_uio with DPDK, so this patch adds a CONFIG_RTE_EAL_IGB_UIO
option (enabled by default) that makes building igb_uio optional for
Linuxapp targets.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_linuxapp           | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 62619c6..b17e37e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index b00e89f..acbf500 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4


* [dpdk-dev] [PATCH v5 07/20] igb_uio: Moved interrupt type out of igb_uio
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (5 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
                           ` (14 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Move the interrupt type enum out of igb_uio and rename it to be more
generic. The somewhat unusual header naming and split are chosen mostly
to make the upcoming virtio patches easier to port to the dpdk.org tree.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/Makefile                     |  1 +
 lib/librte_eal/common/include/rte_pci.h            |  1 +
 .../common/include/rte_pci_dev_feature_defs.h      | 46 +++++++++++++++++++++
 .../common/include/rte_pci_dev_features.h          | 44 ++++++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          | 48 +++++++++-------------
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 0016fc5..e2a3f3a 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -39,6 +39,7 @@ INC += rte_rwlock.h rte_spinlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 11b8c13..e653027 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+
 #include <rte_interrupts.h>
 
 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 0000000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+	RTE_INTR_MODE_NONE = 0,
+	RTE_INTR_MODE_LEGACY,
+	RTE_INTR_MODE_MSI,
+	RTE_INTR_MODE_MSIX,
+	RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 0000000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_FEATURES_H
+#define _RTE_PCI_DEV_FEATURES_H
+
+#include <rte_pci_dev_feature_defs.h>
+
+#define RTE_INTR_MODE_NONE_NAME "none"
+#define RTE_INTR_MODE_LEGACY_NAME "legacy"
+#define RTE_INTR_MODE_MSI_NAME "msi"
+#define RTE_INTR_MODE_MSIX_NAME "msix"
+
+#endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 09c40bf..7d5e6b4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -33,6 +33,7 @@
 #ifdef CONFIG_XEN_DOM0 
 #include <xen/xen.h>
 #endif
+#include <rte_pci_dev_features.h>
 
 /**
  * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
@@ -49,14 +50,6 @@
 
 #define IGBUIO_NUM_MSI_VECTORS 1
 
-/* interrupt mode */
-enum igbuio_intr_mode {
-	IGBUIO_LEGACY_INTR_MODE = 0,
-	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
-};
-
 /**
  * A structure describing the private information for a uio device.
  */
@@ -64,13 +57,13 @@ struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
-	enum igbuio_intr_mode mode;
+	enum rte_intr_mode mode;
 	struct msix_entry \
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode = NULL;
-static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
 /* PCI device id table */
 static struct pci_device_id igbuio_pci_ids[] = {
@@ -222,14 +215,13 @@ igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
 {
 	struct pci_dev *pdev = udev->pdev;
 
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_MSIX) {
 		struct msi_desc *desc;
 
 		list_for_each_entry(desc, &pdev->msi_list, list) {
 			igbuio_msix_mask_irq(desc, state);
 		}
-	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	} else if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		uint32_t status;
 		uint16_t old, new;
 
@@ -301,7 +293,7 @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
 		goto spin_unlock;
 
 	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
 		status = cmd_status_dword >> 16;
 		/* interrupt is not ours, goes to out */
@@ -520,18 +512,18 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
+	udev->mode = RTE_INTR_MODE_LEGACY;
 	spin_lock_init(&udev->lock);
 
 	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
+	if (igbuio_intr_mode_preferred == RTE_INTR_MODE_MSIX) {
 		int vector;
 
 		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
 			udev->msix_entries[vector].entry = vector;
 
 		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
-			udev->mode = IGBUIO_MSIX_INTR_MODE;
+			udev->mode = RTE_INTR_MODE_MSIX;
 		}
 		else {
 			pci_disable_msix(udev->pdev);
@@ -539,13 +531,13 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	}
 	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
+	case RTE_INTR_MODE_MSIX:
 		udev->info.irq_flags = 0;
 		udev->info.irq = udev->msix_entries[0].vector;
 		break;
-	case IGBUIO_MSI_INTR_MODE:
+	case RTE_INTR_MODE_MSI:
 		break;
-	case IGBUIO_LEGACY_INTR_MODE:
+	case RTE_INTR_MODE_LEGACY:
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
 		break;
@@ -570,7 +562,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 fail_release_iomem:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
+	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
@@ -595,7 +587,7 @@ igbuio_pci_remove(struct pci_dev *dev)
 	uio_unregister_device(info);
 	igbuio_pci_release_iomem(info);
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
-					IGBUIO_MSIX_INTR_MODE)
+			RTE_INTR_MODE_MSIX)
 		pci_disable_msix(dev);
 	pci_release_regions(dev);
 	pci_disable_device(dev);
@@ -611,11 +603,11 @@ igbuio_config_intr_mode(char *intr_str)
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+	if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 		printk(KERN_INFO "Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
+	} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
 		printk(KERN_INFO "Use legacy interrupt\n");
 	} else {
 		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
@@ -656,8 +648,8 @@ module_exit(igbuio_pci_exit_module);
 module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
-"    msix       Use MSIX interrupt\n"
-"    legacy     Use Legacy interrupt\n"
+"    " RTE_INTR_MODE_MSIX_NAME "       Use MSIX interrupt\n"
+"    " RTE_INTR_MODE_LEGACY_NAME "     Use Legacy interrupt\n"
 "\n");
 
 MODULE_DESCRIPTION("UIO driver for Intel IGB PCI cards");
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 08/20] vfio: add support for VFIO in Linuxapp targets
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (6 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 09/20] vfio: add VFIO header Anatoly Burakov
                           ` (13 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add VFIO compilation option to common Linuxapp config.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index b17e37e..2ed4b7e 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 09/20] vfio: add VFIO header
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (7 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
                           ` (12 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is by default), the header
also checks the kernel version: VFIO_PRESENT is defined only when VFIO
is enabled in the config and the kernel version is 3.6 or newer. This
is the macro that code should test to decide whether VFIO support is
being compiled in.
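
The version gate is plain integer arithmetic on the packed value from
<linux/version.h>. The sketch below reproduces it in userspace so the
comparison can be checked without kernel headers. `KVER()` mirrors the
classic KERNEL_VERSION(a, b, c) packing, and
`vfio_supported_by_kernel()` is a hypothetical name used only for
illustration.

```c
#include <stdbool.h>

/* Same packing as the classic KERNEL_VERSION(a, b, c) macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Mirrors the check in eal_vfio.h: VFIO needs kernel 3.6.0 or newer. */
static bool
vfio_supported_by_kernel(unsigned int major, unsigned int minor,
			 unsigned int patch)
{
	return KVER(major, minor, patch) >= KVER(3, 6, 0);
}
```

Because LINUX_VERSION_CODE comes from the headers the EAL is compiled
against, this is a build-time decision; it says nothing about whether
the running kernel actually has vfio-pci loaded.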

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 0000000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 10/20] interrupts: Add support for VFIO interrupts
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (8 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 09/20] vfio: add VFIO header Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
                           ` (11 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add code to handle VFIO interrupts in the EAL interrupt code (legacy
INTx, MSI and MSI-X interrupts are all supported).
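
The six enable/disable helpers in this patch all fill in the same
struct vfio_irq_set and differ only in the IRQ index, the action flag,
the count, and whether an eventfd is attached. A minimal sketch of a
shared helper follows. It assumes a local mirror of struct vfio_irq_set
and its flag/index constants (values as defined in <linux/vfio.h>) so
that it builds without kernel headers; real code should include that
header instead. `demo_fill_irq_set()` and the DEMO_* names are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_irq_set from <linux/vfio.h>. */
struct demo_irq_set {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t start;
	uint32_t count;
	uint8_t data[];
};

/* Flag and index values as defined in <linux/vfio.h>. */
#define DEMO_IRQ_SET_DATA_NONE      (1u << 0)
#define DEMO_IRQ_SET_DATA_EVENTFD   (1u << 2)
#define DEMO_IRQ_SET_ACTION_MASK    (1u << 3)
#define DEMO_IRQ_SET_ACTION_UNMASK  (1u << 4)
#define DEMO_IRQ_SET_ACTION_TRIGGER (1u << 5)
#define DEMO_PCI_INTX_IRQ_INDEX 0u
#define DEMO_PCI_MSI_IRQ_INDEX  1u
#define DEMO_PCI_MSIX_IRQ_INDEX 2u

/* Header plus room for one eventfd, like IRQ_SET_BUF_LEN in the patch. */
#define DEMO_IRQ_SET_BUF_LEN (sizeof(struct demo_irq_set) + sizeof(int))

/*
 * Hypothetical helper: fill buf (at least DEMO_IRQ_SET_BUF_LEN bytes,
 * aligned for the struct) for one VFIO_DEVICE_SET_IRQS call.
 * If efd >= 0 the eventfd is attached (DATA_EVENTFD); otherwise the
 * request carries no payload (DATA_NONE). Returns the argsz used.
 */
static uint32_t
demo_fill_irq_set(void *buf, uint32_t index, uint32_t action,
		  uint32_t count, int efd)
{
	struct demo_irq_set *irq_set = buf;
	uint32_t len = sizeof(*irq_set);

	memset(buf, 0, DEMO_IRQ_SET_BUF_LEN);
	irq_set->index = index;
	irq_set->start = 0;
	irq_set->count = count;
	if (efd >= 0) {
		irq_set->flags = DEMO_IRQ_SET_DATA_EVENTFD | action;
		/* copy the fd bytes instead of casting data[] to int * */
		memcpy(irq_set->data, &efd, sizeof(efd));
		len += sizeof(efd);
	} else {
		irq_set->flags = DEMO_IRQ_SET_DATA_NONE | action;
	}
	irq_set->argsz = len;
	return len;
}
```

Under this sketch, enabling MSI-X would be a call with
DEMO_PCI_MSIX_IRQ_INDEX, DEMO_IRQ_SET_ACTION_TRIGGER, count 1 and
intr_handle->fd, followed by ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS,
buf); disabling it is the same call with count 0 and no eventfd.
Declaring the buffer as a uint32_t array rather than a plain char array
keeps it aligned for the struct.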

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 287 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 286 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index 58e1ddf..664e522 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include <stdlib.h>
 #include <pthread.h>
 #include <sys/queue.h>
-#include <malloc.h>
 #include <stdarg.h>
 #include <unistd.h>
 #include <string.h>
@@ -44,6 +43,7 @@
 #include <inttypes.h>
 #include <sys/epoll.h>
 #include <sys/signalfd.h>
+#include <sys/ioctl.h>
 
 #include <rte_common.h>
 #include <rte_interrupts.h>
@@ -66,6 +66,7 @@
 #include <rte_spinlock.h>
 
 #include "eal_private.h"
+#include "eal_vfio.h"
 
 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
 
@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
 	int uio_intr_count;              /* for uio device */
+#ifdef VFIO_PRESENT
+	uint64_t vfio_intr_count;        /* for vfio device */
+#endif
 	uint64_t timerfd_num;            /* for timerfd */
 	char charbuf[16];                /* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;
 
+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	/* enable INTx */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* unmask INTx after enabling */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	/* mask interrupts before disabling */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* disable INTx*/
+	memset(irq_set, 0, len);
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI interrupts */
+static int
+vfio_disable_msi(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msix(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI-X interrupts */
+static int
+vfio_disable_msix(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI-X interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+#endif
+
 int
 rte_intr_callback_register(struct rte_intr_handle *intr_handle,
 			rte_intr_callback_fn cb, void *cb_arg)
@@ -276,6 +518,20 @@ rte_intr_enable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_enable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_enable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_enable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -300,7 +556,7 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	case RTE_INTR_HANDLE_UIO:
 		if (write(intr_handle->fd, &value, sizeof(value)) < 0){
 			RTE_LOG(ERR, EAL,
-				"Error enabling interrupts for fd %d\n",
+				"Error disabling interrupts for fd %d\n",
 							intr_handle->fd);
 			return -1;
 		}
@@ -308,6 +564,20 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_disable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_disable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_disable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -357,11 +627,18 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		/* set the length to be read dor different handle type */
 		switch (src->intr_handle.type) {
 		case RTE_INTR_HANDLE_UIO:
-			bytes_read = 4;
+			bytes_read = sizeof(buf.uio_intr_count);
 			break;
 		case RTE_INTR_HANDLE_ALARM:
-			bytes_read = sizeof(uint64_t);
+			bytes_read = sizeof(buf.timerfd_num);
+			break;
+#ifdef VFIO_PRESENT
+		case RTE_INTR_HANDLE_VFIO_MSIX:
+		case RTE_INTR_HANDLE_VFIO_MSI:
+		case RTE_INTR_HANDLE_VFIO_LEGACY:
+			bytes_read = sizeof(buf.vfio_intr_count);
 			break;
+#endif
 		default:
 			bytes_read = 1;
 			break;
@@ -397,7 +674,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				active_cb.cb_fn(&src->intr_handle,
 					active_cb.cb_arg);
 
-				/*get the lcok back. */
+				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
 		}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 6733948..e00a343 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -41,12 +41,16 @@
 enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_UNKNOWN = 0,
 	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
+	RTE_INTR_HANDLE_VFIO_LEGACY,  /**< vfio device handle (legacy) */
+	RTE_INTR_HANDLE_VFIO_MSI,     /**< vfio device handle (MSI) */
+	RTE_INTR_HANDLE_VFIO_MSIX,    /**< vfio device handle (MSIX) */
 	RTE_INTR_HANDLE_ALARM,    /**< alarm handle */
 	RTE_INTR_HANDLE_MAX
 };
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	int vfio_dev_fd;                 /**< VFIO device file descriptor */
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 };
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (9 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 12/20] vfio: create mapping code for VFIO Anatoly Burakov
                           ` (10 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index d958014..5f3be5f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 12/20] vfio: create mapping code for VFIO
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (10 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 13/20] vfio: add multiprocess support Anatoly Burakov
                           ` (9 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add code to support VFIO mapping (primary processes only). Most of the
work is done via ioctl() calls on either /dev/vfio/vfio (the container)
or /dev/vfio/$GROUP_NR (an IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to the container
6. sets up DMA mappings (only done once, mapping all of DPDK's hugepage
   memory for DMA, with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (the MSI-X BAR cannot be mmapped, so it is skipped)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
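
Steps 2 and 3 hinge on locating the group. On Linux, a PCI device that
sits behind an IOMMU exposes an iommu_group symlink in sysfs pointing at
/sys/kernel/iommu_groups/<N>, and the matching VFIO group device is
/dev/vfio/<N>. The sketch below covers only the string handling for
that lookup, assuming the caller has already done the readlink();
`demo_group_path()` is a hypothetical name, not the function used in
eal_pci_vfio.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the target of the
 * /sys/bus/pci/devices/<addr>/iommu_group symlink, extract the group
 * number and write "/dev/vfio/<N>" into path.
 * Returns the group number, or -1 if the last path component is not a
 * non-negative decimal number.
 */
static int
demo_group_path(const char *link_target, char *path, size_t path_len)
{
	const char *base = strrchr(link_target, '/');
	char *end;
	long group;

	/* the group number is the last path component */
	base = base ? base + 1 : link_target;
	if (*base == '\0')
		return -1;
	group = strtol(base, &end, 10);
	if (*end != '\0' || group < 0)
		return -1;
	snprintf(path, path_len, "/dev/vfio/%ld", group);
	return (int)group;
}
```

Step 3 would then open() the resulting path; a missing iommu_group link
or a failed open() falls into the "device is not bound to VFIO" cases
listed above rather than being a hard error.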

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   2 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 706 +++++++++++++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   6 +
 6 files changed, 750 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5f3be5f..cf9f026 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 9d2675b..aeb5903 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,8 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	/* if set to NONE, interrupt mode is determined automatically */
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..e1d6973
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,706 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <linux/pci_regs.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to the VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_GET_REGION_ADDR(x) (((uint64_t) (x)) << 40ULL)
+
+/* per-process VFIO config */
+static struct vfio_config vfio_cfg;
+
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int *msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset) {
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+	int i, ret, intr_idx;
+
+	/* default to invalid index */
+	intr_idx = VFIO_PCI_NUM_IRQS;
+
+	/* get interrupt type from internal config (auto-detected by default,
+	 * starting from MSI-X; can be overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_MSI:
+		intr_idx = VFIO_PCI_MSI_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		break;
+	/* don't do anything if we want to automatically determine interrupt type */
+	case RTE_INTR_MODE_NONE:
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "  unknown default interrupt type!\n");
+		return -1;
+	}
+
+	/* start from MSI-X interrupt type */
+	for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE &&
+				i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* if this vector cannot be used with eventfd, fail if we explicitly
+		 * specified interrupt type, otherwise continue */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE) {
+				RTE_LOG(ERR, EAL, "  interrupt vector does not support eventfd!\n");
+				return -1;
+			} else
+				continue;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		switch (i) {
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSIX;
+			break;
+		case VFIO_PCI_MSI_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSI;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSI;
+			break;
+		case VFIO_PCI_INTX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_LEGACY;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_LEGACY;
+			break;
+		default:
+			RTE_LOG(ERR, EAL, "  unknown interrupt type!\n");
+			return -1;
+		}
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				 VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char *pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char *tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			 SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename));
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if (end == group_tok || *end != '\0' || errno != 0) {
+		RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status = {
+			.argsz = sizeof(group_status)
+	};
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0)
+		return -1;
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	} else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container (needs to be done only once, only when
+	 * at least one group is assigned to a container and only in primary process)
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof(*vfio_res), 0))
+				== NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store VFIO mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of registers (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void *bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmapable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* if secondary process, do not set up interrupts */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd) != 0) {
+			RTE_LOG(ERR, EAL, "  %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, "  %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return 0;
+}
+
+int
+pci_vfio_enable(void)
+{
+	/* initialize group list */
+	int i;
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		vfio_cfg.vfio_groups[i].fd = -1;
+		vfio_cfg.vfio_groups[i].group_no = -1;
+	}
+	vfio_cfg.vfio_container_fd = -1;
+
+	/* check if we have VFIO driver enabled */
+	if (access(VFIO_DIR, F_OK) == 0)
+		vfio_cfg.vfio_enabled = 1;
+	else
+		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
+
+	return 0;
+}
+
+int
+pci_vfio_is_enabled(void)
+{
+	return vfio_cfg.vfio_enabled;
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index 92e3065..5468b0a 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include <rte_eal.h>
+#include <rte_pci_dev_feature_defs.h>
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6 +77,8 @@ struct internal_config {
 	volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
 	uintptr_t base_virtaddr;          /**< base address to try and reserve memory from */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
+	/** default interrupt mode for VFIO */
+	volatile enum rte_intr_mode vfio_intr_mode;
 	const char *hugefile_prefix;      /**< the base filename of hugetlbfs files */
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 1292eda..23fb3c3 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -34,6 +34,8 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include "eal_vfio.h"
+
 struct pci_map {
 	void *addr;
 	uint64_t offset;
@@ -63,4 +65,33 @@ void * pci_map_resource(void * requested_addr, int fd, off_t offset,
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
 
+#ifdef VFIO_PRESENT
+
+#define VFIO_MAX_GROUPS 64
+
+int pci_vfio_enable(void);
+int pci_vfio_is_enabled(void);
+
+/* map VFIO resource prototype */
+int pci_vfio_map_resource(struct rte_pci_device *dev);
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+	int group_no;
+	int fd;
+};
+
+struct vfio_config {
+	int vfio_enabled;
+	int vfio_container_fd;
+	int vfio_container_has_dma;
+	int vfio_group_idx;
+	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
+#endif
+
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
index 354e9ca..03e693e 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -42,6 +42,12 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
 #include <linux/vfio.h>
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
+#define RTE_PCI_MSIX_TABLE_BIR 0x7
+#else
+#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR
+#endif
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
1.8.1.4

* [dpdk-dev] [PATCH v5 13/20] vfio: add multiprocess support.
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (11 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 12/20] vfio: create mapping code for VFIO Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 14/20] pci: enable VFIO device binding Anatoly Burakov
                           ` (8 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Since VFIO cannot be used to map the same device twice, secondary
processes receive the group and container fds from the primary process
over a local socket. Only group and container fds need to be sent, as
device fds can be obtained via ioctl() calls on the group fd.
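To pass an open descriptor between processes, the sender attaches it as SCM_RIGHTS ancillary data on an AF_UNIX socket. Below is a minimal sketch of both halves, assuming one fd per message. send_fd() and recv_fd() are hypothetical names and are not the patch's vfio_mp_sync_* functions.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F'; /* at least one byte of real data must carry the fd */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align; /* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The kernel installs the received descriptor under a new fd number in the receiving process, and that number refers to the same open file. This is why a secondary process can use a container or group fd that the primary opened.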

For multiprocess support, the VFIO code distinguishes between groups
that exist but are unused (i.e. groups that aren't bound to the VFIO
driver) and groups that don't exist at all. This lets the primary
process tell whether a secondary process is requesting a valid group or
something that doesn't exist.

VFIO multiprocess sync uses a simple protocol. It defines two
requests: a request for a group fd and a request for a container fd.
Possible replies are SOCKET_OK (success), SOCKET_ERR (error) and
SOCKET_NO_FD (the requested VFIO group is valid, but no fd is present
for it, meaning that the group is simply not bound to the VFIO driver).
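The primary's choice among the three replies can be sketched as a lookup over its group table. The sketch uses the same convention as pci_vfio_map_resource(), where a stored fd of 0 marks a group that exists but is not bound to VFIO. The message codes and struct below are hypothetical stand-ins, not the SOCKET_* values the patch defines.

```c
#include <stddef.h>

/* Hypothetical message codes for this sketch only. */
enum mp_msg {
	MP_OK = 0,
	MP_NO_FD = 1,
	MP_ERR = 2,
	MP_REQ_CONTAINER = 0x10,
	MP_REQ_GROUP = 0x11,
};

/* fd == 0: group exists but is not bound to VFIO; fd < 0: unused slot */
struct group_entry {
	int group_no;
	int fd;
};

/* Decide the primary's reply to a group request. On MP_OK, *fd_out is
 * the descriptor to send right after the status word. */
static enum mp_msg primary_reply_for_group(const struct group_entry *groups,
					   size_t n, int group_no, int *fd_out)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (groups[i].group_no != group_no)
			continue;
		if (groups[i].fd == 0)
			return MP_NO_FD; /* known group, nothing bound to VFIO */
		if (groups[i].fd < 0)
			return MP_ERR;   /* slot without a usable fd */
		*fd_out = groups[i].fd;
		return MP_OK;
	}
	return MP_ERR; /* the primary never saw this group */
}
```

Treating an unknown group as an error, rather than as "no fd", lets the secondary process tell a typo or stale group number apart from a group that simply isn't bound to VFIO.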

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, the client then also sends the group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent and the socket is closed.
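On the wire, each request code, group number and status is a single int. The following sketch covers the secondary's side: steps 1 and 1a, then reading the status word from 2a-2c. All names and codes are hypothetical. The fd that follows an OK status would arrive separately as SCM_RIGHTS ancillary data, which this sketch does not handle.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message codes for this sketch only. */
enum mp_msg {
	MP_OK = 0,
	MP_NO_FD = 1,
	MP_ERR = 2,
	MP_REQ_CONTAINER = 0x10,
	MP_REQ_GROUP = 0x11,
};

/* Transfer exactly one int; short transfers and EOF count as errors. */
static int send_int(int sock, int v)
{
	ssize_t n;

	do
		n = write(sock, &v, sizeof(v));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(v) ? 0 : -1;
}

static int recv_int(int sock, int *v)
{
	ssize_t n;

	do
		n = read(sock, v, sizeof(*v));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(*v) ? 0 : -1;
}

/* Steps 1 and 1a: the secondary asks for a group by number. */
static int send_group_request(int sock, int group_no)
{
	if (send_int(sock, MP_REQ_GROUP) < 0)
		return -1;
	return send_int(sock, group_no);
}

/* Interpret the primary's status word for a group request:
 *  1 = MP_OK (the fd follows),
 *  0 = MP_NO_FD (group exists but is not bound to VFIO),
 * -1 = MP_ERR, an unknown code, or a closed/broken socket. */
static int read_group_status(int sock)
{
	int status;

	if (recv_int(sock, &status) < 0)
		return -1;
	switch (status) {
	case MP_OK:
		return 1;
	case MP_NO_FD:
		return 0;
	default:
		return -1;
	}
}
```

Mapping MP_NO_FD to 0 mirrors the secondary branch of pci_vfio_get_group_fd(), which returns 0 so that the caller can log "not managed by VFIO driver" and skip the device instead of failing.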

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  79 ++++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 492 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index cf9f026..3c05edf 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index e1d6973..f0d4f55 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -303,7 +303,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }
 
 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
 	int ret, vfio_container_fd;
@@ -333,13 +333,36 @@ pci_vfio_get_container_fd(void)
 		}
 
 		return vfio_container_fd;
+	} else {
+		/*
+		 * if we're in a secondary process, request container fd from the
+		 * primary process via our socket
+		 */
+		int socket_fd;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
 	}
 
 	return -1;
 }
 
 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
 	int i;
@@ -375,6 +398,44 @@ pci_vfio_get_group_fd(int iommu_group_no)
 		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
 		return vfio_group_fd;
 	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+		if ((socket_fd = vfio_mp_sync_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_mp_sync_receive_request(socket_fd);
+		switch (ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_mp_sync_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, "  cannot get group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
 	return -1;
 }
 
@@ -602,6 +663,20 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		/* get number of registers (up to BAR5) */
 		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
 				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	} else {
+		/* if we're in a secondary process, just find our tailq entry */
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, "  %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
 	}
 
 	/* map BARs */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
new file mode 100644
index 0000000..26dbaa5
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
@@ -0,0 +1,395 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe and should not be used. */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include <sys/un.h>
+/* make sure we redefine __USE_MISC only if it was previously defined */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd, chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy(CMSG_DATA(&(chdr)), &(fd), sizeof(fd));\
+		} while (0)
+#define CMSGHDR_TO_FD(chdr, fd) \
+			memcpy(&(fd), CMSG_DATA(&(chdr)), sizeof(fd))
+
+static pthread_t socket_thread;
+static int mp_socket_fd;
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char *buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_mp_sync_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_mp_sync_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_mp_sync_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_mp_sync_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_mp_sync_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr *) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+static __attribute__((noreturn)) void *
+pci_vfio_mp_sync_thread(void __rte_unused * arg)
+{
+	int ret, fd, vfio_group_no;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(mp_socket_fd, (struct sockaddr *) &addr,
+				&sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_mp_sync_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			fd = pci_vfio_get_container_fd();
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			else
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_mp_sync_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+
+			fd = pci_vfio_get_group_fd(vfio_group_no);
+
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (fd == 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_mp_sync_send_request(conn_sock, SOCKET_OK);
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			}
+			break;
+		default:
+			vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+static int
+vfio_mp_sync_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr *) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	/* save the socket in local configuration */
+	mp_socket_fd = socket_fd;
+
+	return 0;
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_mp_sync_setup(void)
+{
+	int ret;
+
+	if (vfio_mp_sync_socket_setup() < 0) {
+		RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
+		return -1;
+	}
+
+	ret = pthread_create(&socket_thread, NULL,
+			pci_vfio_mp_sync_thread, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to create thread for communication with "
+				"secondary processes!\n");
+		close(mp_socket_fd);
+		return -1;
+	}
+	return 0;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 23fb3c3..45846cc 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -71,9 +71,28 @@ int pci_uio_map_resource(struct rte_pci_device *dev);
 
 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
+int pci_vfio_mp_sync_setup(void);
 
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
+int pci_vfio_get_group_fd(int iommu_group_no);
+int pci_vfio_get_container_fd(void);
+
+/*
+ * Function prototypes for VFIO multiprocess sync functions
+ */
+int vfio_mp_sync_send_request(int socket, int req);
+int vfio_mp_sync_receive_request(int socket);
+int vfio_mp_sync_send_fd(int socket, int fd);
+int vfio_mp_sync_receive_fd(int socket);
+int vfio_mp_sync_connect_to_primary(void);
+
+/* socket comm protocol definitions */
+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
 
 /*
  * we don't need to store device fd's anywhere since they can be obtained from
-- 
1.8.1.4
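The multiprocess support above hinges on handing an already-open file descriptor from the primary to a secondary process as SCM_RIGHTS ancillary data over a Unix socket. Below is a minimal, self-contained sketch of that exchange, not the patch's code: it uses a socketpair() inside one process instead of the named socket, reuses the SOCKET_OK/SOCKET_ERR values from eal_pci_init.h, and the names send_fd(), recv_fd() and union fd_cmsg_buf are hypothetical. Unlike the patch, which sizes the control buffer with CMSG_LEN() and writes through glibc's internal __cmsg_data member, it uses CMSG_SPACE(), CMSG_FIRSTHDR() and CMSG_DATA() with a suitably aligned buffer, as cmsg(3) recommends.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Status codes, values taken from eal_pci_init.h in this patch. */
#define SOCKET_OK  0x0
#define SOCKET_ERR 0xFF

/* Control buffer for exactly one fd, aligned for struct cmsghdr. */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send SOCKET_OK as the payload and fd as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	int status = SOCKET_OK;
	struct iovec iov = { .iov_base = &status, .iov_len = sizeof(status) };
	union fd_cmsg_buf ctrl;
	struct msghdr hdr;
	struct cmsghdr *chdr;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&hdr, 0, sizeof(hdr));
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	chdr = CMSG_FIRSTHDR(&hdr);
	chdr->cmsg_level = SOL_SOCKET;
	chdr->cmsg_type = SCM_RIGHTS;
	chdr->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(chdr), &fd, sizeof(int));

	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* Receive the status word and the passed fd; return the fd or -1. */
static int recv_fd(int sock)
{
	int status = SOCKET_ERR, fd;
	struct iovec iov = { .iov_base = &status, .iov_len = sizeof(status) };
	union fd_cmsg_buf ctrl;
	struct msghdr hdr;
	struct cmsghdr *chdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	/* 0 bytes means the peer closed the socket; status stays SOCKET_ERR */
	if (recvmsg(sock, &hdr, 0) <= 0 || status != SOCKET_OK)
		return -1;

	chdr = CMSG_FIRSTHDR(&hdr);
	if (chdr == NULL || chdr->cmsg_level != SOL_SOCKET ||
	    chdr->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(chdr), sizeof(int));
	return fd;
}
```

The received descriptor is a new fd number in the receiving process that refers to the same open file, which is how a secondary process can share the primary's VFIO container and group descriptors even though VFIO does not allow multiple open descriptors on them.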

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v5 14/20] pci: enable VFIO device binding
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (12 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 13/20] vfio: add multiprocess support Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
                           ` (7 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for the driver. Mapping through VFIO is tried first; if the device was
not mapped that way, igb_uio is tried as a fallback.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a0abec8..8a9cbf9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,27 @@ error:
 	return -1;
 }
 
+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+	int ret, mapped = 0;
+
+	/* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+	if (pci_vfio_is_enabled()) {
+		if ((ret = pci_vfio_map_resource(dev)) == 0)
+			mapped = 1;
+		else if (ret < 0)
+			return ret;
+	}
+#endif
+	/* map resources for devices that use igb_uio */
+	if (!mapped && (ret = pci_uio_map_resource(dev)) != 0)
+		return ret;
+
+	return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +421,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+	int ret;
 	struct rte_pci_id *id_table;
-	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -436,8 +457,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
-			/* map resources for devices that use igb_uio */
-			if ((ret = pci_uio_map_resource(dev)) != 0)
+			if ((ret = pci_map_device(dev)) != 0)
 				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -473,5 +493,21 @@ rte_eal_pci_init(void)
 		RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
 		return -1;
 	}
+#ifdef VFIO_PRESENT
+	pci_vfio_enable();
+
+	if (pci_vfio_is_enabled()) {
+
+		/* if we are primary process, create a thread to communicate with
+		 * secondary processes. the thread will use a socket to wait for
+		 * requests from secondary process to send open file descriptors,
+		 * because VFIO does not allow multiple open descriptors on a group or
+		 * VFIO container.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY &&
+				pci_vfio_mp_sync_setup() < 0)
+			return -1;
+	}
+#endif
 	return 0;
 }
-- 
1.8.1.4
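The fallback in pci_map_device() relies on a three-way return convention for pci_vfio_map_resource(): 0 means VFIO mapped the device, a negative value is a hard error that aborts probing, and a positive value means VFIO did not handle the device, so igb_uio is tried next. A minimal sketch of that control flow follows; map_with_fallback() and the stub mappers are hypothetical stand-ins for the real EAL functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapper signature: 0 = mapped, < 0 = hard error,
 * > 0 = device not handled by this method (try the next one). */
typedef int (*map_fn)(void *dev);

/* Try VFIO first (if enabled), then fall back to igb_uio. */
static int map_with_fallback(void *dev, bool vfio_enabled,
			     map_fn map_vfio, map_fn map_uio)
{
	if (vfio_enabled) {
		int ret = map_vfio(dev);

		if (ret == 0)
			return 0;   /* mapped through VFIO */
		if (ret < 0)
			return ret; /* real failure: do not fall back */
		/* ret > 0: not a VFIO device, fall through to igb_uio */
	}
	return map_uio(dev);
}

/* Stub mappers, for illustration only. */
static int vfio_mapped(void *dev)    { (void)dev; return 0; }
static int vfio_not_bound(void *dev) { (void)dev; return 1; }
static int vfio_failed(void *dev)    { (void)dev; return -1; }
static int uio_mapped(void *dev)     { (void)dev; return 0; }
static int uio_failed(void *dev)     { (void)dev; return -1; }
```

Keeping "not mine" distinct from "failed" is what lets a VFIO error stop probing instead of silently retrying the device through igb_uio.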

* [dpdk-dev] [PATCH v5 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (13 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 14/20] pci: enable VFIO device binding Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
                           ` (6 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Unlike with igb_uio, the VFIO interrupt type is not set through kernel
module parameters; it is configured at runtime via ioctl() calls. This
warrants a new EAL command-line parameter, --vfio-intr. It has no effect
if VFIO support is not compiled in; otherwise, it sets the VFIO
interrupt type to "legacy", "msi" or "msix". Note that VFIO
initialization will fail if the selected interrupt type is not
supported by the system.

If the interrupt type parameter is not specified, VFIO will try all
interrupt types, starting with MSI-X.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index aeb5903..10c40fa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0    "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR    "vfio-intr"
 
 #define RTE_EAL_BLACKLIST_SIZE	0x100
 
@@ -361,6 +362,8 @@ eal_usage(const char *prgname)
 	       "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of "
 	    		   "native RDTSC\n"
 	       "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+	       "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+	       	   	   "(legacy|msi|msix)\n"
 	       "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -579,6 +582,28 @@ eal_parse_base_virtaddr(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_vfio_intr(const char *mode)
+{
+	unsigned i;
+	static struct {
+		const char *name;
+		enum rte_intr_mode value;
+	} map[] = {
+		{ "legacy", RTE_INTR_MODE_LEGACY },
+		{ "msi", RTE_INTR_MODE_MSI },
+		{ "msix", RTE_INTR_MODE_MSIX },
+	};
+
+	for (i = 0; i < RTE_DIM(map); i++) {
+		if (!strcmp(mode, map[i].name)) {
+			internal_config.vfio_intr_mode = map[i].value;
+			return 0;
+		}
+	}
+	return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -633,6 +658,7 @@ eal_parse_args(int argc, char **argv)
 		{OPT_PCI_BLACKLIST, 1, 0, 0},
 		{OPT_VDEV, 1, 0, 0},
 		{OPT_SYSLOG, 1, NULL, 0},
+		{OPT_VFIO_INTR, 1, NULL, 0},
 		{OPT_BASE_VIRTADDR, 1, 0, 0},
 		{OPT_XEN_DOM0, 0, 0, 0},
 		{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -829,6 +855,14 @@ eal_parse_args(int argc, char **argv)
 					return -1;
 				}
 			}
+			else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+				if (eal_parse_vfio_intr(optarg) < 0) {
+					RTE_LOG(ERR, EAL, "invalid parameters for --"
+							OPT_VFIO_INTR "\n");
+					eal_usage(prgname);
+					return -1;
+				}
+			}
 			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
 				internal_config.create_uio_dev = 1;
 			}
-- 
1.8.1.4
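The option handling added to eal_parse_args() is a name-to-value table lookup driven by getopt_long(). A self-contained sketch of the same pattern follows; enum intr_mode is a hypothetical stand-in for DPDK's enum rte_intr_mode, parse_args() is a hypothetical reduced parser that only knows --vfio-intr, and INTR_MODE_NONE models the "not specified, try every type" case described in the commit message.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's enum rte_intr_mode. */
enum intr_mode {
	INTR_MODE_NONE,   /* not specified: try every type */
	INTR_MODE_LEGACY,
	INTR_MODE_MSI,
	INTR_MODE_MSIX,
};

/* Map a --vfio-intr argument to a mode; 0 on success, -1 if unknown. */
static int parse_vfio_intr(const char *mode, enum intr_mode *out)
{
	static const struct {
		const char *name;
		enum intr_mode value;
	} map[] = {
		{ "legacy", INTR_MODE_LEGACY },
		{ "msi", INTR_MODE_MSI },
		{ "msix", INTR_MODE_MSIX },
	};

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(mode, map[i].name) == 0) {
			*out = map[i].value;
			return 0;
		}
	}
	return -1;
}

/* Scan argv for --vfio-intr=<mode>; return -1 on any bad option. */
static int parse_args(int argc, char **argv, enum intr_mode *out)
{
	static const struct option lgopts[] = {
		{ "vfio-intr", required_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 },
	};
	int opt, idx;

	*out = INTR_MODE_NONE;
	optind = 1;  /* restart scanning so the function can be called again */
	opterr = 0;  /* report errors through the return value only */
	while ((opt = getopt_long(argc, argv, "", lgopts, &idx)) != -1) {
		if (opt != 0)
			return -1;  /* unknown option or missing argument */
		if (strcmp(lgopts[idx].name, "vfio-intr") == 0 &&
		    parse_vfio_intr(optarg, out) < 0)
			return -1;
	}
	return 0;
}
```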

* [dpdk-dev] [PATCH v5 16/20] eal: make --no-huge use mmap instead of malloc
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (14 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
                           ` (5 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc is just fine, but we want to guarantee that the
memory is page-aligned, so we use mmap to be safe.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 8d1edd9..315214b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)
 
 	/* hugetlbfs can be disabled */
 	if (internal_config.no_hugetlbfs) {
-		addr = malloc(internal_config.memory);
+		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (addr == MAP_FAILED) {
+			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+					strerror(errno));
+			return -1;
+		}
 		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4
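The point of switching from malloc() to mmap() is alignment: malloc() only guarantees alignment suitable for fundamental types, while mmap() always returns memory starting on a page boundary, and anonymous mappings come back zero-filled. A minimal sketch, with a hypothetical helper name; note that for MAP_ANONYMOUS the fd argument should be -1, which mmap(2) recommends for portability even though Linux ignores it.

```c
#define _GNU_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve len bytes of zero-filled, page-aligned,
 * private anonymous memory. Returns NULL on failure. Release it with
 * munmap(addr, len). */
static void *alloc_page_aligned(size_t len)
{
	/* fd is -1 and offset 0, as portable MAP_ANONYMOUS usage requires */
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (addr == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %s\n", strerror(errno));
		return NULL;
	}
	return addr;
}
```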

* [dpdk-dev] [PATCH v5 17/20] test app: adding unit tests for VFIO EAL command-line parameter
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (15 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
                           ` (4 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add unit tests for the VFIO interrupt type command-line parameter. The
test app cannot tell whether VFIO support is compiled in (the eal_vfio.h
header is internal to the Linuxapp EAL), so the flag is tested
regardless.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 app/test/test_eal_flags.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 195a1f5..a0ee4e6 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
 	const char *argv11[] = {prgname, "--file-prefix=virtaddr",
 			"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};
 
+	/* try running with --vfio-intr INTx flag */
+	const char *argv12[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+	/* try running with --vfio-intr MSI flag */
+	const char *argv13[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+	/* try running with --vfio-intr MSI-X flag */
+	const char *argv14[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+	/* try running with --vfio-intr invalid flag */
+	const char *argv15[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=invalid"};
+
 
 	if (launch_proc(argv0) == 0) {
 		printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
 		printf("Error - process did not run ok with --base-virtaddr parameter\n");
 		return -1;
 	}
+	if (launch_proc(argv12) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr INTx parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv13) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv14) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI-X parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv15) == 0) {
+		printf("Error - process ran ok with "
+				"--vfio-intr invalid parameter\n");
+		return -1;
+	}
 	return 0;
 }
 #endif
-- 
1.8.1.4
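Each new test case boils down to "run the EAL with these arguments and check the exit status": the valid --vfio-intr values must succeed and the invalid one must fail. launch_proc() is the test app's own helper and its internals are not shown here, so the sketch below illustrates the same pattern with a hypothetical run_and_wait() built on fork(), execv() and waitpid(), plus a hypothetical expect_run() that encodes the pass/fail expectation.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (an absolute path) with argv in a
 * child process and return its exit status (0 = success). Returns -1
 * if fork() fails or the child does not exit normally; an execv()
 * failure shows up as exit status 127. */
static int run_and_wait(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execv(argv[0], argv);
		_exit(127);  /* only reached if execv() failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Return 0 if the run's outcome matches should_succeed, -1 otherwise,
 * i.e. "valid flags must run, invalid flags must fail". */
static int expect_run(char *const argv[], int should_succeed)
{
	int succeeded = (run_and_wait(argv) == 0);

	return succeeded == (should_succeed != 0) ? 0 : -1;
}
```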

* [dpdk-dev] [PATCH v5 18/20] igb_uio: Removed PCI ID table from igb_uio
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (16 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
                           ` (3 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Remove the PCI ID list to make igb_uio more similar to generic drivers
like vfio-pci or uio_pci_generic. This makes it easier for the binding
script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to pci-stub, vfio-pci and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user must first write
the device's vendor and device IDs to the "new_id" file inside the
igb_uio driver directory, and only then write the device's PCI address
to "bind". The PCI binding script is changed accordingly.

sysfs behaves oddly when a new device ID is added to new_id: a
subsequent write to "bind" results in an IOError when the file is
closed. The error is harmless, but it still raises an exception, so to
work around it we check whether the device was actually bound to the
driver before reporting an error.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-----
 tools/igb_uio_bind.py                     | 118 +++++++++++++++---------------
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 7d5e6b4..6362b1c 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include <rte_pci_dev_ids.h>
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)
 
 static struct pci_driver igbuio_pci_driver = {
 	.name = "igb_uio",
-	.id_table = igbuio_pci_ids,
+	.id_table = NULL,
 	.probe = igbuio_pci_probe,
 	.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 824aa2b..33adcf4 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []
 
 def usage():
     '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
                 return path
 
 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that igb_uio is loaded'''
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
@@ -165,41 +161,36 @@ def check_modules():
     if not found:
         print "Error - module %s not loaded" %mod
         sys.exit(1)
-    
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
-        sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16), 
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and 
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False
 
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
 
+def get_pci_device_details(dev_id):
+    '''This function gets additional details for a PCI device'''
+    device = {}
+
+    extra_info = check_output(["lspci", "-vmmks", dev_id]).splitlines()
+
+    # parse lspci details
+    for line in extra_info:
+        if len(line) == 0:
+            continue
+        name, value = line.split("\t", 1)
+        name = name.strip(":") + "_str"
+        device[name] = value
+    # check for a unix interface name
+    sys_path = "/sys/bus/pci/devices/%s/net/" % dev_id
+    if exists(sys_path):
+        device["Interface"] = ",".join(os.listdir(sys_path))
+    else:
+        device["Interface"] = ""
+    # check if a port is used for ssh connection
+    device["Ssh_if"] = False
+    device["Active"] = ""
+
+    return device
+
 def get_nic_details():
     '''This function populates the "devices" dictionary. The keys used are
     the pci addresses (domain:bus:slot.func). The values are themselves
@@ -237,23 +228,10 @@ def get_nic_details():
 
     # based on the basic info, get extended text details            
     for d in devices.keys():
-        extra_info = check_output(["lspci", "-vmmks", d]).splitlines()
-        # parse lspci details
-        for line in extra_info:
-            if len(line) == 0:
-                continue
-            name, value = line.split("\t", 1)
-            name = name.strip(":") + "_str"
-            devices[d][name] = value
-        # check for a unix interface name
-        sys_path = "/sys/bus/pci/devices/%s/net/" % d
-        if exists(sys_path):
-            devices[d]["Interface"] = ",".join(os.listdir(sys_path))
-        else:
-            devices[d]["Interface"] = ""
-        # check if a port is used for ssh connection
-        devices[d]["Ssh_if"] = False
-        devices[d]["Active"] = ""
+        # get additional info and add it to existing data
+        devices[d] = dict(devices[d].items() +
+                          get_pci_device_details(d).items())
+
         for _if in ssh_if: 
             if _if in devices[d]["Interface"].split(","):
                 devices[d]["Ssh_if"] = True
@@ -261,14 +239,12 @@ def get_nic_details():
                 break;
 
         # add igb_uio to list of supporting modules if needed
-        if is_supported_device(d):
-            if "Module_str" in devices[d]:
-                if "igb_uio" not in devices[d]["Module_str"]:
-                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
-            else:
-                devices[d]["Module_str"] = "igb_uio"
-        if "Module_str" not in devices[d]:
-            devices[d]["Module_str"] = "<none>"
+        if "Module_str" in devices[d]:
+            if "igb_uio" not in devices[d]["Module_str"]:
+                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+        else:
+            devices[d]["Module_str"] = "igb_uio"
+
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
             modules = devices[d]["Module_str"].split(",")
@@ -343,6 +319,22 @@ def bind_one(dev_id, driver, force):
             unbind_one(dev_id, force)
             dev["Driver_str"] = "" # clear driver string
 
+    # if we are binding to one of the DPDK drivers, add the device's PCI ID to that driver
+    if driver == "igb_uio":
+        filename = "/sys/bus/pci/drivers/%s/new_id" % driver
+        try:
+            f = open(filename, "w")
+        except:
+            print "Error: bind failed for %s - Cannot open %s" % (dev_id, filename)
+            return
+        try:
+            f.write("%04x %04x" % (dev["Vendor"], dev["Device"]))
+            f.close()
+        except:
+            print "Error: bind failed for %s - Cannot write new PCI ID to " \
+                "driver %s" % (dev_id, driver)
+            return
+
     # do the bind by writing to /sys
     filename = "/sys/bus/pci/drivers/%s/bind" % driver
     try:
@@ -356,6 +348,12 @@ def bind_one(dev_id, driver, force):
         f.write(dev_id)
         f.close()
     except:
+        # for some reason, closing the bind file after adding a new PCI ID
+        # to new_id results in IOError. however, if the device was bound
+        # successfully, the IOError is harmless and can be safely ignored
+        tmp = get_pci_device_details(dev_id)
+        if "Driver_str" in tmp and tmp["Driver_str"] == driver:
+            return
         print "Error: bind failed for %s - Cannot bind to driver %s" % (dev_id, driver)
         if saved_driver is not None: # restore any previous driver
             bind_one(dev_id, saved_driver, force)
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v5 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (17 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
                           ` (2 subsequent siblings)
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Rename the igb_uio_bind script to dpdk_nic_bind to give it a generic
name, since it now supports two drivers (igb_uio and vfio-pci).
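
A minimal sketch of the module check that the reworked check_modules()
performs, assuming the input is the list of lines read from /proc/modules.
loaded_dpdk_drivers() is a hypothetical helper, not part of dpdk_nic_bind.py.
Unlike the patch, which uses startswith(), it compares the first field of
each line exactly (after mapping "_" to "-"), so a module whose name merely
begins with "vfio_pci" is not mistaken for vfio-pci.

```python
def loaded_dpdk_drivers(proc_modules_lines, dpdk_drivers=("igb_uio", "vfio-pci")):
    """Return the entries of dpdk_drivers that appear in /proc/modules output.

    /proc/modules lists module names with underscores ("vfio_pci"), while the
    PCI driver is registered as "vfio-pci", so names are compared after
    mapping "_" to "-" on both sides.
    """
    loaded = set()
    for line in proc_modules_lines:
        fields = line.split()
        if fields:  # skip blank lines
            # The first field of each line is the module name.
            loaded.add(fields[0].replace("_", "-"))
    return [drv for drv in dpdk_drivers if drv.replace("_", "-") in loaded]
```

For example, passing the lines of /proc/modules on a system where only
vfio_pci is loaded returns ["vfio-pci"]; an empty result corresponds to the
"no supported modules are loaded" error in the script.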

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 ++++++++++++++++++++---------
 tools/setup.sh                              | 16 +++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index 33adcf4..1e517e7 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]
 
 def usage():
     '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):
 
 def check_modules():
     '''Checks that igb_uio is loaded'''
+    global dpdk_drivers
     
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]
     
     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)
 
+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
     the pci addresses (domain:bus:slot.func). The values are themselves
     dictionaries - one for each NIC.'''
     global devices
+    global dpdk_drivers
     
     # clear any old data
     devices = {} 
@@ -240,10 +254,11 @@ def get_nic_details():
 
         # add igb_uio to list of supporting modules if needed
         if "Module_str" in devices[d]:
-            if "igb_uio" not in devices[d]["Module_str"]:
-                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
         else:
-            devices[d]["Module_str"] = "igb_uio"
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)
 
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
             dev["Driver_str"] = "" # clear driver string
 
     # if we are binding to one of DPDK drivers, add PCI id's to that driver
-    if driver == "igb_uio":
+    if driver in dpdk_drivers:
         filename = "/sys/bus/pci/drivers/%s/new_id" % driver
         try:
             f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])
 
     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index 39be8fc..e0671b8 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -324,13 +324,13 @@ grep_meminfo()
 }
 
 #
-# Calls igb_uio_bind.py --status to show the NIC and what they
+# Calls dpdk_nic_bind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -338,16 +338,16 @@ show_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with igb_uio
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
 bind_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/igb_uio_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
 	else 
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -355,18 +355,18 @@ bind_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with kernel drivers again
+# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/igb_uio_bind.py --status
+	${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/igb_uio_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v5 20/20] setup script: adding support for VFIO to setup.sh
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (18 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
@ 2014-06-10 11:11         ` Anatoly Burakov
  2014-06-13 14:38         ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Burakov, Anatoly
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  21 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-10 11:11 UTC (permalink / raw)
  To: dev

Add support for loading and unloading the VFIO modules, for binding and
unbinding devices to and from VFIO, and for setting up the userspace
permissions VFIO needs.
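
The memlock check in set_vfio_permissions() can be sketched in Python with
the standard resource module. Note that `ulimit -l` reports KiB while
resource.getrlimit() reports bytes, so the 65536 threshold in setup.sh
corresponds to 64 MiB here. memlock_warning() and current_memlock_warning()
are hypothetical names used only for illustration.

```python
import resource

# setup.sh warns below 65536 KiB (`ulimit -l` units); getrlimit() uses bytes.
MIN_MEMLOCK_BYTES = 64 * 1024 * 1024


def memlock_warning(soft_limit_bytes):
    """Return a warning string if the memlock limit is under 64 MiB, else None."""
    if soft_limit_bytes == resource.RLIM_INFINITY:
        return None  # "unlimited": nothing to warn about
    if soft_limit_bytes < MIN_MEMLOCK_BYTES:
        return ("memlock limit is %d MB (< 64 MB); DPDK with VFIO may not be "
                "able to initialize if run as current user"
                % (soft_limit_bytes // (1024 * 1024)))
    return None


def current_memlock_warning():
    """Check the calling process's own soft RLIMIT_MEMLOCK."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return memlock_warning(soft)
```

As in setup.sh, this only inspects the limit of the current user's process;
raising it is still done through limits.conf.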

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 tools/setup.sh | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 141 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index e0671b8..3991da9 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }
 
 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+	echo "Unloading any existing VFIO module"
+	/sbin/lsmod | grep -s vfio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod vfio-pci
+		sudo /sbin/rmmod vfio_iommu_type1
+		sudo /sbin/rmmod vfio
+	fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+	remove_vfio_module
+
+	VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+	echo "Loading VFIO module"
+	/sbin/lsmod | grep -s vfio_pci > /dev/null
+	if [ $? -ne 0 ] ; then
+		if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+			sudo /sbin/modprobe vfio-pci
+		fi
+	fi
+
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# check if /dev/vfio/vfio exists - that way we
+	# know we either loaded the module, or it was
+	# compiled into the kernel
+	if [ ! -e /dev/vfio/vfio ] ; then
+		echo "## ERROR: VFIO not found!"
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }
 
 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# make sure regular user can access everything inside /dev/vfio
+	echo "chmod /dev/vfio/*"
+	sudo /usr/bin/chmod 0666 /dev/vfio/*
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# since permissions are only to be set when running as
+	# regular user, we only check ulimit here
+	#
+	# warn if regular user is only allowed
+	# to memlock <64M of memory
+	MEMLOCK_AMNT=`ulimit -l`
+
+	if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+		MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+		echo ""
+		echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+		echo ""
+		echo "This is the maximum amount of memory you will be"
+		echo "able to use with DPDK and VFIO if run as current user."
+		echo -n "To change this, please adjust limits.conf memlock "
+		echo "limit for current user."
+
+		if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+			echo ""
+			echo "## WARNING: memlock limit is less than 64MB"
+			echo -n "## DPDK with VFIO may not be able to initialize "
+			echo "if run as current user."
+		fi
+	fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,24 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+	if /sbin/lsmod  | grep -q vfio_pci ; then
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to VFIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH && echo "OK"
+	else
+		echo "# Please load the 'vfio-pci' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then 
 		${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +511,29 @@ step2_func()
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
 
-	TEXT[2]="Insert KNI module"
-	FUNC[2]="load_kni_module"
+	TEXT[2]="Insert VFIO module"
+	FUNC[2]="load_vfio_module"
+
+	TEXT[3]="Insert KNI module"
+	FUNC[3]="load_kni_module"
 
-	TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-	FUNC[3]="set_non_numa_pages"
+	TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+	FUNC[4]="set_non_numa_pages"
 
-	TEXT[4]="Setup hugepage mappings for NUMA systems"
-	FUNC[4]="set_numa_pages"
+	TEXT[5]="Setup hugepage mappings for NUMA systems"
+	FUNC[5]="set_numa_pages"
 
-	TEXT[5]="Display current Ethernet device settings"
-	FUNC[5]="show_nics"
+	TEXT[6]="Display current Ethernet device settings"
+	FUNC[6]="show_nics"
 
-	TEXT[6]="Bind Ethernet device to IGB UIO module"
-	FUNC[6]="bind_nics"
+	TEXT[7]="Bind Ethernet device to IGB UIO module"
+	FUNC[7]="bind_nics_to_igb_uio"
+
+	TEXT[8]="Bind Ethernet device to VFIO module"
+	FUNC[8]="bind_nics_to_vfio"
+
+	TEXT[9]="Setup VFIO permissions"
+	FUNC[9]="set_vfio_permissions"
 }
 
 #
@@ -455,11 +578,14 @@ step5_func()
 	TEXT[3]="Remove IGB UIO module"
 	FUNC[3]="remove_igb_uio_module"
 
-	TEXT[4]="Remove KNI module"
-	FUNC[4]="remove_kni_module"
+	TEXT[4]="Remove VFIO module"
+	FUNC[4]="remove_vfio_module"
+
+	TEXT[5]="Remove KNI module"
+	FUNC[5]="remove_kni_module"
 
-	TEXT[5]="Remove hugepage mappings"
-	FUNC[5]="clear_huge_pages"
+	TEXT[6]="Remove hugepage mappings"
+	FUNC[6]="clear_huge_pages"
 }
 
 STEPS[1]="step1_func"
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (19 preceding siblings ...)
  2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
@ 2014-06-13 14:38         ` Burakov, Anatoly
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  21 siblings, 0 replies; 160+ messages in thread
From: Burakov, Anatoly @ 2014-06-13 14:38 UTC (permalink / raw)
  To: Burakov, Anatoly, dev

> This patchset adds support for using VFIO instead of IGB_UIO to map the
> device BARs.
> 

NAK: there's a small problem with FreeBSD. v6 is coming.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK
  2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
                           ` (20 preceding siblings ...)
  2014-06-13 14:38         ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Burakov, Anatoly
@ 2014-06-13 14:52         ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
                             ` (20 more replies)
  21 siblings, 21 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

This patchset adds support for using VFIO instead of IGB_UIO to
map the device BARs.

VFIO is a Linux kernel driver (kernel 3.6+) that allows secure DMA from
userspace by using the IOMMU, instead of working directly with physical
memory the way igb_uio does.
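
As a rough illustration of the IOMMU dependency: a device that VFIO can use
normally has an iommu_group link in sysfs, and userspace then works with
that group through /dev/vfio/<group>. The helper below is hypothetical and
takes a configurable sysfs root so it can be exercised without real
hardware. It only resolves the link; it does not verify that the group is
viable or that the IOMMU is actually enabled for DMA.

```python
import os


def iommu_group_of(pci_addr, sysfs_root="/sys"):
    """Return the IOMMU group number of a PCI device as a string, or None.

    Reads the <sysfs>/bus/pci/devices/<addr>/iommu_group symlink, which
    points into .../kernel/iommu_groups/<N>.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr,
                        "iommu_group")
    if not os.path.islink(link):
        return None  # no IOMMU group, so vfio-pci cannot take this device
    return os.path.basename(os.readlink(link))
```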

Short summary:
* Adding support for VFIO in EAL PCI code
* Adding new command-line parameter for VFIO interrupt type
* Adding support for VFIO in setup.sh
* Renaming igb_uio_bind to dpdk_nic_bind and adding support for
  VFIO there
* Removing PCI ID list from igb_uio, effectively making it another
  generic PCI driver similar to pci_stub, vfio-pci et al
* Adding autotest for VFIO interrupt types
* Making igb_uio and VFIO compilation optional
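
Because igb_uio no longer carries a PCI ID table, a device's IDs have to be
handed to the driver at bind time, which bind_one() in dpdk_nic_bind.py does
by writing to the driver's new_id file in sysfs. A minimal sketch of that
step, assuming integer vendor/device IDs and a configurable sysfs root
(add_new_id() is a hypothetical helper; on a real /sys it needs root):

```python
import os


def add_new_id(driver, vendor, device, sysfs_root="/sys"):
    """Ask a PCI driver to accept devices with the given vendor/device IDs.

    Writes "VVVV DDDD" (lower-case hex, the same "%04x %04x" format that
    bind_one() uses) to <sysfs>/bus/pci/drivers/<driver>/new_id.
    Returns the path that was written.
    """
    path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver, "new_id")
    with open(path, "w") as f:
        f.write("%04x %04x" % (vendor, device))
    return path
```

For instance, add_new_id("vfio-pci", 0x8086, 0x10fb) writes "8086 10fb";
the actual bind is then a separate write of the PCI address to the driver's
bind file.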

v2 fixes:
* Fixed a couple of resource leaks

v3 fixes:
* Fixed various checkpatch.pl issues
* Added MSI interrupt support
* Added an option to automatically determine interrupt type
* Fixed various issues of commit atomicity

v4 fixes:
* Rebased on top of 5ebbb17281645b23359fbd49133bb639b63ba88c
* Fixed a typo in EAL command-line help text

v5 fixes:
* Fixed missing virtio change to RTE_PCI_DRV_NEED_MAPPING
* Fixed compile issue when VFIO was disabled (introduced in v3)

v6 fixes:
* Rebased on top of 36c248ebc629889fff4e7d9d17e109412ddf9ecf
* Fixed FreeBSD issue with failed unbinds (introduced in v1)
* Fixed a few issues found by checkpatch

Tested-by: Waterman Cao <waterman.cao@intel.com> 

This patch has been tested by Intel.
We tested this patch with the following functions:
* Layer-2 forwarding
* Sample command tests
* Packet forwarding checks
* Binding and unbinding the VFIO driver
* Compiling the igb_uio driver (Linux kernel < 3.6)
* Interrupt model tests with legacy, MSI and MSI-X interrupts
All cases passed.

Test environment:
Fedora 20 x86_64, Linux kernel 3.13.6-200
GCC 4.8.2
CPU: Intel Xeon E5-2680 v2 @ 2.80GHz
NIC: Intel Niantic 82599

Anatoly Burakov (20):
  pci: move open() out of pci_map_resource, rename structs
  pci: move uio mapping code to a separate file
  pci: fixing errors in a previous commit found by checkpatch
  pci: distinguish between legitimate failures and non-fatal errors
  pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  igb_uio: make igb_uio compilation optional
  igb_uio: Moved interrupt type out of igb_uio
  vfio: add support for VFIO in Linuxapp targets
  vfio: add VFIO header
  interrupts: Add support for VFIO interrupts
  eal: remove -Wno-return-type for non-existent eal_hpet.c
  vfio: create mapping code for VFIO
  vfio: add multiprocess support.
  pci: enable VFIO device binding
  eal: added support for selecting VFIO interrupt type from EAL    
    command-line
  eal: make --no-huge use mmap instead of malloc
  test app: adding unit tests for VFIO EAL command-line parameter
  igb_uio: Removed PCI ID table from igb_uio
  binding script: Renamed igb_uio_bind to dpdk_nic_bind
  setup script: adding support for VFIO to setup.sh

 app/test/test_eal_flags.c                          |  36 +
 app/test/test_pci.c                                |   4 +-
 config/common_linuxapp                             |   2 +
 lib/librte_eal/bsdapp/eal/eal_pci.c                |  10 +-
 lib/librte_eal/common/Makefile                     |   1 +
 lib/librte_eal/common/eal_common_pci.c             |  16 +-
 lib/librte_eal/common/include/rte_pci.h            |   5 +-
 .../common/include/rte_pci_dev_feature_defs.h      |  46 ++
 .../common/include/rte_pci_dev_features.h          |  44 ++
 lib/librte_eal/linuxapp/Makefile                   |   2 +
 lib/librte_eal/linuxapp/eal/Makefile               |   5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |  36 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 287 +++++++-
 lib/librte_eal/linuxapp/eal/eal_memory.c           |   8 +-
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 476 ++-----------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 431 +++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 789 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h | 116 +++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |  55 ++
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          |  69 +-
 lib/librte_pmd_e1000/em_ethdev.c                   |   2 +-
 lib/librte_pmd_e1000/igb_ethdev.c                  |   4 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                |   4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c              |   2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c            |   2 +-
 tools/{igb_uio_bind.py => dpdk_nic_bind.py}        | 155 ++--
 tools/setup.sh                                     | 173 ++++-
 30 files changed, 2593 insertions(+), 589 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (83%)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
                             ` (19 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Separate the mapping code from the calls to open(). This is preparatory
work for the VFIO patch: VFIO also needs to map BARs, but it does not use
the path stored in mapped_pci_resource. Also rename the structs to be
more generic.
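
The point of the split is that the caller opens the file, maps it, and can
close the descriptor right away, since an existing mapping does not depend
on the descriptor staying open. A Python sketch of the same shape, assuming
a regular file and an offset aligned to mmap.ALLOCATIONGRANULARITY
(map_resource_fd() and map_from_path() are hypothetical names loosely
mirroring the new pci_map_resource() signature):

```python
import mmap
import os


def map_resource_fd(fd, offset, size):
    """Map size bytes at offset from an already-open fd, shared and read/write.

    Like the reworked pci_map_resource(), this takes a descriptor rather
    than a path; opening and closing the file is the caller's job.
    """
    return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                     prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=offset)


def map_from_path(path, offset, size):
    """Open path, map one region, and close the descriptor immediately."""
    fd = os.open(path, os.O_RDWR)
    try:
        return map_resource_fd(fd, offset, size)
    finally:
        # Safe: CPython's mmap object keeps its own dup() of the descriptor,
        # and in C a mapping likewise survives close() on the original fd.
        os.close(fd)
```

Keeping open() in the caller is what lets a VFIO path supply a descriptor
obtained some other way while reusing the same mapping routine.
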
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 125 ++++++++++++++++------------------
 1 file changed, 58 insertions(+), 67 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index f809574..29f1728 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -31,39 +31,17 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include <ctype.h>
-#include <stdio.h>
-#include <stdlib.h>
 #include <string.h>
-#include <stdarg.h>
-#include <unistd.h>
-#include <inttypes.h>
-#include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
-#include <stdarg.h>
-#include <errno.h>
 #include <dirent.h>
-#include <limits.h>
-#include <sys/queue.h>
 #include <sys/mman.h>
-#include <sys/ioctl.h>
 
-#include <rte_interrupts.h>
 #include <rte_log.h>
 #include <rte_pci.h>
-#include <rte_common.h>
-#include <rte_launch.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
 #include <rte_tailq.h>
-#include <rte_eal.h>
 #include <rte_eal_memconfig.h>
-#include <rte_per_lcore.h>
-#include <rte_lcore.h>
 #include <rte_malloc.h>
-#include <rte_string_fns.h>
-#include <rte_debug.h>
 #include <rte_devargs.h>
 
 #include "rte_pci_dev_ids.h"
@@ -74,15 +52,12 @@
  * @file
  * PCI probing under linux
  *
- * This code is used to simulate a PCI probe by parsing information in
- * sysfs. Moreover, when a registered driver matches a device, the
- * kernel driver currently using it is unloaded and replaced by
- * igb_uio module, which is a very minimal userland driver for Intel
- * network card, only providing access to PCI BAR to applications, and
- * enabling bus master.
+ * This code is used to simulate a PCI probe by parsing information in sysfs.
+ * When a registered device matches a driver, it is then initialized with
+ * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct uio_map {
+struct pci_map {
 	void *addr;
 	uint64_t offset;
 	uint64_t size;
@@ -93,18 +68,18 @@ struct uio_map {
  * For multi-process we need to reproduce all PCI mappings in secondary
  * processes, so save them in a tailq.
  */
-struct uio_resource {
-	TAILQ_ENTRY(uio_resource) next;
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
 
 	struct rte_pci_addr pci_addr;
 	char path[PATH_MAX];
-	size_t nb_maps;
-	struct uio_map maps[PCI_MAX_RESOURCE];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
 };
 
-TAILQ_HEAD(uio_res_list, uio_resource);
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+static struct mapped_pci_res_list *pci_res_list;
 
-static struct uio_res_list *uio_res_list = NULL;
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
 /* unbind kernel driver for this device */
@@ -148,30 +123,17 @@ error:
 
 /* map a particular resource from a file */
 static void *
-pci_map_resource(void *requested_addr, const char *devname, off_t offset,
-		 size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
-	int fd;
 	void *mapaddr;
 
-	/*
-	 * open devname, to mmap it
-	 */
-	fd = open(devname, O_RDWR);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		goto fail;
-	}
-
 	/* Map the PCI memory resource of device */
 	mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
 			MAP_SHARED, fd, offset);
-	close(fd);
 	if (mapaddr == MAP_FAILED ||
 			(requested_addr != NULL && mapaddr != requested_addr)) {
-		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
-			" %s (%p)\n", __func__, devname, fd, requested_addr,
+		RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
+			__func__, fd, requested_addr,
 			(unsigned long)size, (unsigned long)offset,
 			strerror(errno), mapaddr);
 		goto fail;
@@ -186,10 +148,10 @@ fail:
 }
 
 #define OFF_MAX              ((uint64_t)(off_t)-1)
-static ssize_t
-pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 {
-	size_t i;
+	int i;
 	char dirname[PATH_MAX];
 	char filename[PATH_MAX];
 	uint64_t offset, size;
@@ -249,25 +211,37 @@ pci_uio_get_mappings(const char *devname, struct uio_map maps[], size_t nb_maps)
 static int
 pci_uio_map_secondary(struct rte_pci_device *dev)
 {
-        size_t i;
-        struct uio_resource *uio_res;
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
 
-	TAILQ_FOREACH(uio_res, uio_res_list, next) {
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
 
 		/* skip this element if it doesn't match our PCI address */
 		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
 			continue;
 
 		for (i = 0; i != uio_res->nb_maps; i++) {
-			if (pci_map_resource(uio_res->maps[i].addr,
-					     uio_res->path,
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
 					     (off_t)uio_res->maps[i].offset,
 					     (size_t)uio_res->maps[i].size)
 			    != uio_res->maps[i].addr) {
 				RTE_LOG(ERR, EAL,
 					"Cannot mmap device resource\n");
+				close(fd);
 				return (-1);
 			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
 		}
 		return (0);
 	}
@@ -276,7 +250,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 	return -1;
 }
 
-static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
 {
 	FILE *f;
 	char filename[PATH_MAX];
@@ -323,7 +298,8 @@ static int pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
  * sysfs. On error, return a negative value. In this case dstbuf is
  * invalid.
  */
-static int pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 			   unsigned int buflen)
 {
 	struct rte_pci_addr *loc = &dev->addr;
@@ -405,10 +381,10 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	uint64_t phaddr;
 	uint64_t offset;
 	uint64_t pagesz;
-	ssize_t nb_maps;
+	int nb_maps;
 	struct rte_pci_addr *loc = &dev->addr;
-	struct uio_resource *uio_res;
-	struct uio_map *maps;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
 
 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -460,6 +436,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	maps = uio_res->maps;
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
 
 		/* skip empty BAR */
 		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
@@ -473,14 +450,27 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		/* if matching map is found, then use it */
 		if (j != nb_maps) {
 			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(devname, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					devname, strerror(errno));
+				return -1;
+			}
+
 			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, devname,
+			    (mapaddr = pci_map_resource(NULL, fd,
 							(off_t)offset,
 							(size_t)maps[j].size)
 			    ) == NULL) {
 				rte_free(uio_res);
+				close(fd);
 				return (-1);
 			}
+			close(fd);
 
 			maps[j].addr = mapaddr;
 			maps[j].offset = offset;
@@ -488,7 +478,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		}
 	}
 
-	TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
 
 	return (0);
 }
@@ -866,7 +856,8 @@ rte_eal_pci_init(void)
 {
 	TAILQ_INIT(&pci_driver_list);
 	TAILQ_INIT(&pci_device_list);
-	uio_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI, uio_res_list);
+	pci_res_list = RTE_TAILQ_RESERVE_BY_IDX(RTE_TAILQ_PCI,
+			mapped_pci_res_list);
 
 	/* for debug purposes, PCI can be disabled */
 	if (internal_config.no_pci)
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v6 02/20] pci: move uio mapping code to a separate file
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
                             ` (18 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 403 +-------------------
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 421 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  66 ++++
 4 files changed, 492 insertions(+), 399 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_uio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index dad1f79..6e320ec 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -57,6 +57,7 @@ endif
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 29f1728..a422e5f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -32,8 +32,6 @@
  */
 
 #include <string.h>
-#include <sys/stat.h>
-#include <fcntl.h>
 #include <dirent.h>
 #include <sys/mman.h>
 
@@ -47,6 +45,7 @@
 #include "rte_pci_dev_ids.h"
 #include "eal_filesystem.h"
 #include "eal_private.h"
+#include "eal_pci_init.h"
 
 /**
  * @file
@@ -57,30 +56,7 @@
  * IGB_UIO driver (or doesn't initialize, if the device wasn't bound to it).
  */
 
-struct pci_map {
-	void *addr;
-	uint64_t offset;
-	uint64_t size;
-	uint64_t phaddr;
-};
-
-/*
- * For multi-process we need to reproduce all PCI mappings in secondary
- * processes, so save them in a tailq.
- */
-struct mapped_pci_resource {
-	TAILQ_ENTRY(mapped_pci_resource) next;
-
-	struct rte_pci_addr pci_addr;
-	char path[PATH_MAX];
-	int nb_maps;
-	struct pci_map maps[PCI_MAX_RESOURCE];
-};
-
-TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
-static struct mapped_pci_res_list *pci_res_list;
-
-static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+struct mapped_pci_res_list *pci_res_list = NULL;
 
 /* unbind kernel driver for this device */
 static int
@@ -122,8 +98,8 @@ error:
 }
 
 /* map a particular resource from a file */
-static void *
-pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
+void *
+pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;
 
@@ -147,342 +123,6 @@ fail:
 	return NULL;
 }
 
-#define OFF_MAX              ((uint64_t)(off_t)-1)
-static int
-pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
-{
-	int i;
-	char dirname[PATH_MAX];
-	char filename[PATH_MAX];
-	uint64_t offset, size;
-
-	for (i = 0; i != nb_maps; i++) {
-
-		/* check if map directory exists */
-		rte_snprintf(dirname, sizeof(dirname),
-			"%s/maps/map%u", devname, i);
-
-		if (access(dirname, F_OK) != 0)
-			break;
-
-		/* get mapping offset */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/offset", dirname);
-		if (pci_parse_sysfs_value(filename, &offset) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse offset of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		/* get mapping size */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/size", dirname);
-		if (pci_parse_sysfs_value(filename, &size) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse size of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		/* get mapping physical address */
-		rte_snprintf(filename, sizeof(filename),
-			"%s/addr", dirname);
-		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
-			RTE_LOG(ERR, EAL,
-				"%s(): cannot parse addr of %s\n",
-				__func__, dirname);
-			return (-1);
-		}
-
-		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
-			RTE_LOG(ERR, EAL,
-				"%s(): offset/size exceed system max value\n",
-				__func__);
-			return (-1);
-		}
-
-		maps[i].offset = offset;
-		maps[i].size = size;
-        }
-	return (i);
-}
-
-static int
-pci_uio_map_secondary(struct rte_pci_device *dev)
-{
-	int fd, i;
-	struct mapped_pci_resource *uio_res;
-
-	TAILQ_FOREACH(uio_res, pci_res_list, next) {
-
-		/* skip this element if it doesn't match our PCI address */
-		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
-			continue;
-
-		for (i = 0; i != uio_res->nb_maps; i++) {
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(uio_res->path, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					uio_res->path, strerror(errno));
-				return -1;
-			}
-
-			if (pci_map_resource(uio_res->maps[i].addr, fd,
-					     (off_t)uio_res->maps[i].offset,
-					     (size_t)uio_res->maps[i].size)
-			    != uio_res->maps[i].addr) {
-				RTE_LOG(ERR, EAL,
-					"Cannot mmap device resource\n");
-				close(fd);
-				return (-1);
-			}
-			/* fd is not needed in slave process, close it */
-			close(fd);
-		}
-		return (0);
-	}
-
-	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
-}
-
-static int
-pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
-{
-	FILE *f;
-	char filename[PATH_MAX];
-	int ret;
-	unsigned major, minor;
-	dev_t dev;
-
-	/* get the name of the sysfs file that contains the major and minor
-	 * of the uio device and read its content */
-	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
-
-	f = fopen(filename, "r");
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
-			__func__);
-		return -1;
-	}
-
-	ret = fscanf(f, "%d:%d", &major, &minor);
-	if (ret != 2) {
-		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
-			__func__);
-		fclose(f);
-		return -1;
-	}
-	fclose(f);
-
-	/* create the char device "mknod /dev/uioX c major minor" */
-	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
-	dev = makedev(major, minor);
-	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
-	if (f == NULL) {
-		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
-			__func__, strerror(errno));
-		return -1;
-	}
-
-	return ret;
-}
-
-/*
- * Return the uioX char device used for a pci device. On success, return
- * the UIO number and fill dstbuf string with the path of the device in
- * sysfs. On error, return a negative value. In this case dstbuf is
- * invalid.
- */
-static int
-pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
-			   unsigned int buflen)
-{
-	struct rte_pci_addr *loc = &dev->addr;
-	unsigned int uio_num;
-	struct dirent *e;
-	DIR *dir;
-	char dirname[PATH_MAX];
-
-	/* depending on kernel version, uio can be located in uio/uioX
-	 * or uio:uioX */
-
-	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
-
-	dir = opendir(dirname);
-	if (dir == NULL) {
-		/* retry with the parent directory */
-		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
-		dir = opendir(dirname);
-
-		if (dir == NULL) {
-			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
-			return -1;
-		}
-	}
-
-	/* take the first file starting with "uio" */
-	while ((e = readdir(dir)) != NULL) {
-		/* format could be uio%d ...*/
-		int shortprefix_len = sizeof("uio") - 1;
-		/* ... or uio:uio%d */
-		int longprefix_len = sizeof("uio:uio") - 1;
-		char *endptr;
-
-		if (strncmp(e->d_name, "uio", 3) != 0)
-			continue;
-
-		/* first try uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
-			break;
-		}
-
-		/* then try uio:uio%d */
-		errno = 0;
-		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
-			break;
-		}
-	}
-	closedir(dir);
-
-	/* No uio resource found */
-	if (e == NULL)
-		return -1;
-
-	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
-		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
-
-	return uio_num;
-}
-
-/* map the PCI resource of a PCI device in virtual memory */
-static int
-pci_uio_map_resource(struct rte_pci_device *dev)
-{
-	int i, j;
-	char dirname[PATH_MAX];
-	char devname[PATH_MAX]; /* contains the /dev/uioX */
-	void *mapaddr;
-	int uio_num;
-	uint64_t phaddr;
-	uint64_t offset;
-	uint64_t pagesz;
-	int nb_maps;
-	struct rte_pci_addr *loc = &dev->addr;
-	struct mapped_pci_resource *uio_res;
-	struct pci_map *maps;
-
-	dev->intr_handle.fd = -1;
-	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
-
-	/* secondary processes - use already recorded details */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
-
-	/* find uio resource */
-	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
-	if (uio_num < 0) {
-		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
-				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
-	}
-	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
-
-	/* save fd if in primary process */
-	dev->intr_handle.fd = open(devname, O_RDWR);
-	if (dev->intr_handle.fd < 0) {
-		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-			devname, strerror(errno));
-		return -1;
-	}
-	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
-
-	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
-		RTE_LOG(ERR, EAL,
-			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
-	}
-
-	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
-	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
-
-	/* collect info about device mappings */
-	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
-				       RTE_DIM(uio_res->maps));
-	if (nb_maps < 0) {
-		rte_free(uio_res);
-		return (nb_maps);
-	}
-
-	uio_res->nb_maps = nb_maps;
-
-	/* Map all BARs */
-	pagesz = sysconf(_SC_PAGESIZE);
-
-	maps = uio_res->maps;
-	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
-		int fd;
-
-		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
-			continue;
-
-		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
-				dev->mem_resource[i].len != maps[j].size);
-				j++)
-			;
-
-		/* if matching map is found, then use it */
-		if (j != nb_maps) {
-			offset = j * pagesz;
-
-			/*
-			 * open devname, to mmap it
-			 */
-			fd = open(devname, O_RDWR);
-			if (fd < 0) {
-				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
-					devname, strerror(errno));
-				return -1;
-			}
-
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, fd,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
-				rte_free(uio_res);
-				close(fd);
-				return (-1);
-			}
-			close(fd);
-
-			maps[j].addr = mapaddr;
-			maps[j].offset = offset;
-			dev->mem_resource[i].addr = mapaddr;
-		}
-	}
-
-	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
-
-	return (0);
-}
-
 /* parse the "resource" sysfs file */
 #define IORESOURCE_MEM  0x00000200
 
@@ -546,41 +186,6 @@ error:
 	return -1;
 }
 
-/*
- * parse a sysfs file containing one integer value
- * different to the eal version, as it needs to work with 64-bit values
- */
-static int
-pci_parse_sysfs_value(const char *filename, uint64_t *val)
-{
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
-
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
-
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
-}
-
 /* Compare two PCI device addresses. */
 static int
 pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
new file mode 100644
index 0000000..c9a12a1
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -0,0 +1,421 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <sys/stat.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_tailq.h>
+
+#include "rte_pci_dev_ids.h"
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
+
+
+#define OFF_MAX              ((uint64_t)(off_t)-1)
+static int
+pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
+{
+	int i;
+	char dirname[PATH_MAX];
+	char filename[PATH_MAX];
+	uint64_t offset, size;
+
+	for (i = 0; i != nb_maps; i++) {
+
+		/* check if map directory exists */
+		rte_snprintf(dirname, sizeof(dirname),
+			"%s/maps/map%u", devname, i);
+
+		if (access(dirname, F_OK) != 0)
+			break;
+
+		/* get mapping offset */
+		rte_snprintf(filename, sizeof(filename),
+			"%s/offset", dirname);
+		if (pci_parse_sysfs_value(filename, &offset) < 0) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot parse offset of %s\n",
+				__func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping size */
+		rte_snprintf(filename, sizeof(filename),
+			"%s/size", dirname);
+		if (pci_parse_sysfs_value(filename, &size) < 0) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot parse size of %s\n",
+				__func__, dirname);
+			return (-1);
+		}
+
+		/* get mapping physical address */
+		rte_snprintf(filename, sizeof(filename),
+			"%s/addr", dirname);
+		if (pci_parse_sysfs_value(filename, &maps[i].phaddr) < 0) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot parse addr of %s\n",
+				__func__, dirname);
+			return (-1);
+		}
+
+		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
+			RTE_LOG(ERR, EAL,
+				"%s(): offset/size exceed system max value\n",
+				__func__);
+			return (-1);
+		}
+
+		maps[i].offset = offset;
+		maps[i].size = size;
+        }
+	return (i);
+}
+
+static int
+pci_uio_map_secondary(struct rte_pci_device *dev)
+{
+	int fd, i;
+	struct mapped_pci_resource *uio_res;
+
+	TAILQ_FOREACH(uio_res, pci_res_list, next) {
+
+		/* skip this element if it doesn't match our PCI address */
+		if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+			continue;
+
+		for (i = 0; i != uio_res->nb_maps; i++) {
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(uio_res->path, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					uio_res->path, strerror(errno));
+				return -1;
+			}
+
+			if (pci_map_resource(uio_res->maps[i].addr, fd,
+					     (off_t)uio_res->maps[i].offset,
+					     (size_t)uio_res->maps[i].size)
+			    != uio_res->maps[i].addr) {
+				RTE_LOG(ERR, EAL,
+					"Cannot mmap device resource\n");
+				close(fd);
+				return (-1);
+			}
+			/* fd is not needed in slave process, close it */
+			close(fd);
+		}
+		return (0);
+	}
+
+	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
+	return -1;
+}
+
+static int
+pci_mknod_uio_dev(const char *sysfs_uio_path, unsigned uio_num)
+{
+	FILE *f;
+	char filename[PATH_MAX];
+	int ret;
+	unsigned major, minor;
+	dev_t dev;
+
+	/* get the name of the sysfs file that contains the major and minor
+	 * of the uio device and read its content */
+	rte_snprintf(filename, sizeof(filename), "%s/dev", sysfs_uio_path);
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs to get major:minor\n",
+			__func__);
+		return -1;
+	}
+
+	ret = fscanf(f, "%d:%d", &major, &minor);
+	if (ret != 2) {
+		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs to get major:minor\n",
+			__func__);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+
+	/* create the char device "mknod /dev/uioX c major minor" */
+	rte_snprintf(filename, sizeof(filename), "/dev/uio%u", uio_num);
+	dev = makedev(major, minor);
+	ret = mknod(filename, S_IFCHR | S_IRUSR | S_IWUSR, dev);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s(): mknod() failed %s\n",
+			__func__, strerror(errno));
+		return -1;
+	}
+
+	return ret;
+}
+
+/*
+ * Return the uioX char device used for a pci device. On success, return
+ * the UIO number and fill dstbuf string with the path of the device in
+ * sysfs. On error, return a negative value. In this case dstbuf is
+ * invalid.
+ */
+static int
+pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
+			   unsigned int buflen)
+{
+	struct rte_pci_addr *loc = &dev->addr;
+	unsigned int uio_num;
+	struct dirent *e;
+	DIR *dir;
+	char dirname[PATH_MAX];
+
+	/* depending on kernel version, uio can be located in uio/uioX
+	 * or uio:uioX */
+
+	rte_snprintf(dirname, sizeof(dirname),
+	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
+	         loc->domain, loc->bus, loc->devid, loc->function);
+
+	dir = opendir(dirname);
+	if (dir == NULL) {
+		/* retry with the parent directory */
+		rte_snprintf(dirname, sizeof(dirname),
+		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
+		         loc->domain, loc->bus, loc->devid, loc->function);
+		dir = opendir(dirname);
+
+		if (dir == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot opendir %s\n", dirname);
+			return -1;
+		}
+	}
+
+	/* take the first file starting with "uio" */
+	while ((e = readdir(dir)) != NULL) {
+		/* format could be uio%d ...*/
+		int shortprefix_len = sizeof("uio") - 1;
+		/* ... or uio:uio%d */
+		int longprefix_len = sizeof("uio:uio") - 1;
+		char *endptr;
+
+		if (strncmp(e->d_name, "uio", 3) != 0)
+			continue;
+
+		/* first try uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio%u", dirname, uio_num);
+			break;
+		}
+
+		/* then try uio:uio%d */
+		errno = 0;
+		uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
+		if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
+			rte_snprintf(dstbuf, buflen, "%s/uio:uio%u", dirname, uio_num);
+			break;
+		}
+	}
+	closedir(dir);
+
+	/* No uio resource found */
+	if (e == NULL)
+		return -1;
+
+	/* create uio device if we've been asked to */
+	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
+
+	return uio_num;
+}
+
+/* map the PCI resource of a PCI device in virtual memory */
+int
+pci_uio_map_resource(struct rte_pci_device *dev)
+{
+	int i, j;
+	char dirname[PATH_MAX];
+	char devname[PATH_MAX]; /* contains the /dev/uioX */
+	void *mapaddr;
+	int uio_num;
+	uint64_t phaddr;
+	uint64_t offset;
+	uint64_t pagesz;
+	int nb_maps;
+	struct rte_pci_addr *loc = &dev->addr;
+	struct mapped_pci_resource *uio_res;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* secondary processes - use already recorded details */
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return (pci_uio_map_secondary(dev));
+
+	/* find uio resource */
+	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
+	if (uio_num < 0) {
+		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
+				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
+		return -1;
+	}
+	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
+
+	/* save fd if in primary process */
+	dev->intr_handle.fd = open(devname, O_RDWR);
+	if (dev->intr_handle.fd < 0) {
+		RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+			devname, strerror(errno));
+		return -1;
+	}
+	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
+
+	/* allocate the mapping details for secondary processes*/
+	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
+		RTE_LOG(ERR, EAL,
+			"%s(): cannot store uio mmap details\n", __func__);
+		return (-1);
+	}
+
+	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
+	memcpy(&uio_res->pci_addr, &dev->addr, sizeof(uio_res->pci_addr));
+
+	/* collect info about device mappings */
+	nb_maps = pci_uio_get_mappings(dirname, uio_res->maps,
+				       RTE_DIM(uio_res->maps));
+	if (nb_maps < 0) {
+		rte_free(uio_res);
+		return (nb_maps);
+	}
+
+	uio_res->nb_maps = nb_maps;
+
+	/* Map all BARs */
+	pagesz = sysconf(_SC_PAGESIZE);
+
+	maps = uio_res->maps;
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+		int fd;
+
+		/* skip empty BAR */
+		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+			continue;
+
+		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
+				dev->mem_resource[i].len != maps[j].size);
+				j++)
+			;
+
+		/* if matching map is found, then use it */
+		if (j != nb_maps) {
+			offset = j * pagesz;
+
+			/*
+			 * open devname, to mmap it
+			 */
+			fd = open(devname, O_RDWR);
+			if (fd < 0) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
+					devname, strerror(errno));
+				return -1;
+			}
+
+			if (maps[j].addr != NULL ||
+			    (mapaddr = pci_map_resource(NULL, fd,
+							(off_t)offset,
+							(size_t)maps[j].size)
+			    ) == NULL) {
+				rte_free(uio_res);
+				close(fd);
+				return (-1);
+			}
+			close(fd);
+
+			maps[j].addr = mapaddr;
+			maps[j].offset = offset;
+			dev->mem_resource[i].addr = mapaddr;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
+
+	return (0);
+}
+
+/*
+ * parse a sysfs file containing one integer value
+ * different to the eal version, as it needs to work with 64-bit values
+ */
+static int
+pci_parse_sysfs_value(const char *filename, uint64_t *val)
+{
+        FILE *f;
+        char buf[BUFSIZ];
+        char *end = NULL;
+
+        f = fopen(filename, "r");
+        if (f == NULL) {
+                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
+                        __func__, filename);
+                return -1;
+        }
+
+        if (fgets(buf, sizeof(buf), f) == NULL) {
+                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
+                        __func__, filename);
+                fclose(f);
+                return -1;
+        }
+        *val = strtoull(buf, &end, 0);
+        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
+                                __func__, filename);
+                fclose(f);
+                return -1;
+        }
+        fclose(f);
+        return 0;
+}
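
pci_parse_sysfs_value() reads one line and accepts it only if strtoull() (base 0, so both 0x-prefixed hex and decimal work) stops exactly at the trailing newline. The sketch below applies the same rule to an in-memory buffer so it can be exercised without sysfs. parse_sysfs_u64() is a hypothetical name. As an assumption beyond the patch, it also rejects ERANGE overflow and a line with no digits at all, which the patch's check would accept as 0.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one sysfs line such as "0x1000\n" into a
 * 64-bit value. Returns 0 on success, -1 if the line is empty, has no
 * digits, overflows, or has anything between the number and '\n'.
 */
static int
parse_sysfs_u64(const char *buf, uint64_t *val)
{
	char *end = NULL;
	unsigned long long v;

	if (buf[0] == '\0')
		return -1;

	errno = 0;
	v = strtoull(buf, &end, 0);	/* base 0: accepts "0x..." and decimal */
	if (errno != 0 || end == buf || *end != '\n')
		return -1;

	*val = (uint64_t)v;
	return 0;
}
```

In the patch the buffer comes from fgets() on the sysfs file, so only the first line is ever examined.
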
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
new file mode 100644
index 0000000..1292eda
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -0,0 +1,66 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_PCI_INIT_H_
+#define EAL_PCI_INIT_H_
+
+struct pci_map {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t phaddr;
+};
+
+/*
+ * For multi-process we need to reproduce all PCI mappings in secondary
+ * processes, so save them in a tailq.
+ */
+struct mapped_pci_resource {
+	TAILQ_ENTRY(mapped_pci_resource) next;
+
+	struct rte_pci_addr pci_addr;
+	char path[PATH_MAX];
+	int nb_maps;
+	struct pci_map maps[PCI_MAX_RESOURCE];
+};
+
+TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
+extern struct mapped_pci_res_list *pci_res_list;
+
+void * pci_map_resource(void * requested_addr, int fd, off_t offset,
+		size_t size);
+
+/* map IGB_UIO resource prototype */
+int pci_uio_map_resource(struct rte_pci_device *dev);
+
+#endif /* EAL_PCI_INIT_H_ */
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 03/20] pci: fixing errors in a previous commit found by checkpatch
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
                             ` (17 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC
  To: dev

---
 lib/librte_eal/linuxapp/eal/eal_pci.c              |   2 +-
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          | 112 +++++++++++----------
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |   2 +-
 3 files changed, 63 insertions(+), 53 deletions(-)
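
In pci_uio_map_resource(), each non-empty BAR is matched to the UIO map with the same physical address and size, and that map is then selected through the mmap() offset j * pagesz, following the Linux UIO convention that map N is reached at offset N times the page size. This patch restructures that loop for checkpatch without changing the logic. Below is a simplified sketch of the matching and offset computation. demo_uio_map, demo_find_map() and demo_map_offset() are hypothetical and only model the values read from sysfs.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one UIO map as read from sysfs
 * (maps/mapN/addr and maps/mapN/size). */
struct demo_uio_map {
	uint64_t phaddr;
	uint64_t size;
};

/*
 * Find the UIO map whose physical address and size match a BAR, like
 * the inner loop of pci_uio_map_resource(). Returns the map index, or
 * -1 when the BAR is empty (phaddr == 0) or has no matching map.
 */
static int
demo_find_map(const struct demo_uio_map *maps, int nb_maps,
	      uint64_t bar_phaddr, uint64_t bar_len)
{
	int j;

	if (bar_phaddr == 0)
		return -1;
	for (j = 0; j < nb_maps; j++) {
		if (maps[j].phaddr == bar_phaddr && maps[j].size == bar_len)
			return j;
	}
	return -1;
}

/* UIO selects map N through the mmap() offset N * page size. */
static uint64_t
demo_map_offset(int map_index, uint64_t pagesz)
{
	return (uint64_t)map_index * pagesz;
}
```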

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index a422e5f..2066608 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -99,7 +99,7 @@ error:
 
 /* map a particular resource from a file */
 void *
-pci_map_resource(void * requested_addr, int fd, off_t offset, size_t size)
+pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
 {
 	void *mapaddr;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index c9a12a1..7c75593 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -74,7 +74,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 			RTE_LOG(ERR, EAL,
 				"%s(): cannot parse offset of %s\n",
 				__func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping size */
@@ -84,7 +84,7 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 			RTE_LOG(ERR, EAL,
 				"%s(): cannot parse size of %s\n",
 				__func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		/* get mapping physical address */
@@ -94,20 +94,21 @@ pci_uio_get_mappings(const char *devname, struct pci_map maps[], int nb_maps)
 			RTE_LOG(ERR, EAL,
 				"%s(): cannot parse addr of %s\n",
 				__func__, dirname);
-			return (-1);
+			return -1;
 		}
 
 		if ((offset > OFF_MAX) || (size > SIZE_MAX)) {
 			RTE_LOG(ERR, EAL,
 				"%s(): offset/size exceed system max value\n",
 				__func__);
-			return (-1);
+			return -1;
 		}
 
 		maps[i].offset = offset;
 		maps[i].size = size;
-        }
-	return (i);
+	}
+
+	return i;
 }
 
 static int
@@ -140,12 +141,12 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 				RTE_LOG(ERR, EAL,
 					"Cannot mmap device resource\n");
 				close(fd);
-				return (-1);
+				return -1;
 			}
 			/* fd is not needed in slave process, close it */
 			close(fd);
 		}
-		return (0);
+		return 0;
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
@@ -214,15 +215,15 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 	 * or uio:uioX */
 
 	rte_snprintf(dirname, sizeof(dirname),
-	         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-	         loc->domain, loc->bus, loc->devid, loc->function);
+			SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
+			loc->domain, loc->bus, loc->devid, loc->function);
 
 	dir = opendir(dirname);
 	if (dir == NULL) {
 		/* retry with the parent directory */
 		rte_snprintf(dirname, sizeof(dirname),
-		         SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-		         loc->domain, loc->bus, loc->devid, loc->function);
+				SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
+				loc->domain, loc->bus, loc->devid, loc->function);
 		dir = opendir(dirname);
 
 		if (dir == NULL) {
@@ -265,7 +266,8 @@ pci_get_uio_dev(struct rte_pci_device *dev, char *dstbuf,
 		return -1;
 
 	/* create uio device if we've been asked to */
-	if (internal_config.create_uio_dev && pci_mknod_uio_dev(dstbuf, uio_num) < 0)
+	if (internal_config.create_uio_dev &&
+			pci_mknod_uio_dev(dstbuf, uio_num) < 0)
 		RTE_LOG(WARNING, EAL, "Cannot create /dev/uio%u\n", uio_num);
 
 	return uio_num;
@@ -293,7 +295,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	/* secondary processes - use already recorded details */
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return (pci_uio_map_secondary(dev));
+		return pci_uio_map_secondary(dev);
 
 	/* find uio resource */
 	uio_num = pci_get_uio_dev(dev, dirname, sizeof(dirname));
@@ -314,10 +316,11 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	dev->intr_handle.type = RTE_INTR_HANDLE_UIO;
 
 	/* allocate the mapping details for secondary processes*/
-	if ((uio_res = rte_zmalloc("UIO_RES", sizeof (*uio_res), 0)) == NULL) {
+	uio_res = rte_zmalloc("UIO_RES", sizeof(*uio_res), 0);
+	if (uio_res == NULL) {
 		RTE_LOG(ERR, EAL,
 			"%s(): cannot store uio mmap details\n", __func__);
-		return (-1);
+		return -1;
 	}
 
 	rte_snprintf(uio_res->path, sizeof(uio_res->path), "%s", devname);
@@ -328,7 +331,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 				       RTE_DIM(uio_res->maps));
 	if (nb_maps < 0) {
 		rte_free(uio_res);
-		return (nb_maps);
+		return nb_maps;
 	}
 
 	uio_res->nb_maps = nb_maps;
@@ -341,7 +344,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 		int fd;
 
 		/* skip empty BAR */
-		if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
+		phaddr = dev->mem_resource[i].phys_addr;
+		if (phaddr == 0)
 			continue;
 
 		for (j = 0; j != nb_maps && (phaddr != maps[j].phaddr ||
@@ -351,6 +355,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 		/* if matching map is found, then use it */
 		if (j != nb_maps) {
+			int fail = 0;
 			offset = j * pagesz;
 
 			/*
@@ -363,14 +368,19 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 				return -1;
 			}
 
-			if (maps[j].addr != NULL ||
-			    (mapaddr = pci_map_resource(NULL, fd,
-							(off_t)offset,
-							(size_t)maps[j].size)
-			    ) == NULL) {
+			if (maps[j].addr != NULL)
+				fail = 1;
+			else {
+				mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+						(size_t)maps[j].size);
+				if (mapaddr == NULL)
+					fail = 1;
+			}
+
+			if (fail) {
 				rte_free(uio_res);
 				close(fd);
-				return (-1);
+				return -1;
 			}
 			close(fd);
 
@@ -382,7 +392,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 
 	TAILQ_INSERT_TAIL(pci_res_list, uio_res, next);
 
-	return (0);
+	return 0;
 }
 
 /*
@@ -392,30 +402,30 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 static int
 pci_parse_sysfs_value(const char *filename, uint64_t *val)
 {
-        FILE *f;
-        char buf[BUFSIZ];
-        char *end = NULL;
-
-        f = fopen(filename, "r");
-        if (f == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
-                        __func__, filename);
-                return -1;
-        }
-
-        if (fgets(buf, sizeof(buf), f) == NULL) {
-                RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
-                        __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        *val = strtoull(buf, &end, 0);
-        if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-                RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
-                                __func__, filename);
-                fclose(f);
-                return -1;
-        }
-        fclose(f);
-        return 0;
+	FILE *f;
+	char buf[BUFSIZ];
+	char *end = NULL;
+
+	f = fopen(filename, "r");
+	if (f == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot open sysfs value %s\n",
+				__func__, filename);
+		return -1;
+	}
+
+	if (fgets(buf, sizeof(buf), f) == NULL) {
+		RTE_LOG(ERR, EAL, "%s(): cannot read sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	*val = strtoull(buf, &end, 0);
+	if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
+		RTE_LOG(ERR, EAL, "%s(): cannot parse sysfs value %s\n",
+				__func__, filename);
+		fclose(f);
+		return -1;
+	}
+	fclose(f);
+	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 1292eda..87bdfe7 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -57,7 +57,7 @@ struct mapped_pci_resource {
 TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
 extern struct mapped_pci_res_list *pci_res_list;
 
-void * pci_map_resource(void * requested_addr, int fd, off_t offset,
+void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 		size_t size);
 
 /* map IGB_UIO resource prototype */
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 04/20] pci: distinguish between legitimate failures and non-fatal errors
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (2 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
                             ` (16 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Currently, EAL does not distinguish between actual failures and expected
initialization errors. For example, a driver may fail to initialize
simply because it was never supposed to be initialized in the first
place, such as when the device is not managed by that driver.

This patch makes EAL fail on actual initialization errors (negative
return values) while still skipping over expected ones (positive return
values).
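The resulting convention (negative = real failure, zero = device claimed,
positive = driver not applicable) can be illustrated with a minimal,
self-contained sketch of a probe loop in the spirit of
pci_probe_all_drivers(). The probe_fn type, probe_all_drivers() and the
example drivers are hypothetical stand-ins, not EAL APIs:

```c
#include <stddef.h>

/* Hypothetical per-driver probe result, mirroring this patch's convention:
 *   < 0  real initialization failure (abort)
 *   == 0 device claimed successfully
 *   > 0  driver does not apply to this device (skip, not an error) */
typedef int (*probe_fn)(int dev_id);

/* Try every driver in turn. Returns -1 on a real error, 0 if some driver
 * claimed the device, 1 if no driver matched. */
static int probe_all_drivers(const probe_fn *drivers, size_t n, int dev_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = drivers[i](dev_id);

		if (rc < 0)
			return -1;	/* legitimate failure: propagate */
		if (rc > 0)
			continue;	/* not ours: try the next driver */
		return 0;		/* device claimed */
	}
	return 1;			/* no matching driver: non-fatal */
}

/* Example drivers, for illustration only. */
static int drv_not_mine(int dev_id) { (void)dev_id; return 1; }
static int drv_ok(int dev_id) { (void)dev_id; return 0; }
static int drv_broken(int dev_id) { (void)dev_id; return -1; }
```

A caller then aborts only on a negative result, as rte_eal_pci_probe()
does after this patch, and treats a positive result as "nothing to do".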
---
 lib/librte_eal/bsdapp/eal/eal_pci.c       |  8 +++++---
 lib/librte_eal/common/eal_common_pci.c    | 16 +++++++++-------
 lib/librte_eal/linuxapp/eal/eal_pci.c     |  8 +++++---
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  4 ++--
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index b560077..03200f3 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -217,7 +217,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	if (access(devname, O_RDWR) < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 
 	/* save fd if in primary process */
@@ -440,6 +440,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -476,8 +477,9 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			ret = pci_uio_map_resource(dev);
+			if (ret != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
index 4d877ea..af809a8 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -101,8 +101,8 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 
 /*
  * If vendor/device ID match, call the devinit() function of all
- * registered driver for the given device. Return -1 if no driver is
- * found for this device.
+ * registered driver for the given device. Return -1 if initialization
+ * failed, return 1 if no driver is found for this device.
  * For drivers with the RTE_PCI_DRV_MULTIPLE flag enabled, register
  * the same device multiple times until failure to do so.
  * It is required for non-Intel NIC drivers provided by third-parties such
@@ -118,7 +118,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 		rc = rte_eal_pci_probe_one_driver(dr, dev);
 		if (rc < 0)
 			/* negative value is an error */
-			break;
+			return -1;
 		if (rc > 0)
 			/* positive value means driver not found */
 			continue;
@@ -130,7 +130,7 @@ pci_probe_all_drivers(struct rte_pci_device *dev)
 				;
 		return 0;
 	}
-	return -1;
+	return 1;
 }
 
 /*
@@ -144,6 +144,7 @@ rte_eal_pci_probe(void)
 	struct rte_pci_device *dev = NULL;
 	struct rte_devargs *devargs;
 	int probe_all = 0;
+	int ret = 0;
 
 	if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_PCI) == 0)
 		probe_all = 1;
@@ -157,10 +158,11 @@ rte_eal_pci_probe(void)
 
 		/* probe all or only whitelisted devices */
 		if (probe_all)
-			pci_probe_all_drivers(dev);
+			ret = pci_probe_all_drivers(dev);
 		else if (devargs != NULL &&
-			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI &&
-			pci_probe_all_drivers(dev) < 0)
+			devargs->type == RTE_DEVTYPE_WHITELISTED_PCI)
+			ret = pci_probe_all_drivers(dev);
+		if (ret < 0)
 			rte_exit(EXIT_FAILURE, "Requested device " PCI_PRI_FMT
 				 " cannot be used\n", dev->addr.domain, dev->addr.bus,
 				 dev->addr.devid, dev->addr.function);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 2066608..49b2a68 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -401,6 +401,7 @@ int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
 	struct rte_pci_id *id_table;
+	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -431,13 +432,14 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 		if (dev->devargs != NULL &&
 			dev->devargs->type == RTE_DEVTYPE_BLACKLISTED_PCI) {
 			RTE_LOG(DEBUG, EAL, "  Device is blacklisted, not initializing\n");
-			return 0;
+			return 1;
 		}
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
 			/* map resources for devices that use igb_uio */
-			if (pci_uio_map_resource(dev) < 0)
-				return -1;
+			ret = pci_uio_map_resource(dev);
+			if (ret != 0)
+				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
 		           rte_eal_process_type() == RTE_PROC_PRIMARY) {
 			/* unbind current driver */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7c75593..96aa24d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -150,7 +150,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 	}
 
 	RTE_LOG(ERR, EAL, "Cannot find resource for device\n");
-	return -1;
+	return 1;
 }
 
 static int
@@ -302,7 +302,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	if (uio_num < 0) {
 		RTE_LOG(WARNING, EAL, "  "PCI_PRI_FMT" not managed by UIO driver, "
 				"skipping\n", loc->domain, loc->bus, loc->devid, loc->function);
-		return -1;
+		return 1;
 	}
 	rte_snprintf(devname, sizeof(devname), "/dev/uio%u", uio_num);
 
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (3 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
                             ` (15 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING, a more
generic name, since BAR mapping may be done by either igb_uio or VFIO.
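Out-of-tree drivers that still reference the old name will stop
compiling against the new rte_pci.h. A minimal sketch of a compatibility
shim such a driver could carry, assuming it must build against headers
from both before and after this rename (the shim and needs_bar_mapping()
are hypothetical, not part of this patch):

```c
#include <stdint.h>

/* Stand-in for a pre-rename header, so the sketch is self-contained.
 * Drop this define when building against real DPDK headers. */
#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001

/* Hypothetical shim: prefer the new name, fall back to the old one. */
#ifndef RTE_PCI_DRV_NEED_MAPPING
#define RTE_PCI_DRV_NEED_MAPPING RTE_PCI_DRV_NEED_IGB_UIO
#endif

/* Returns nonzero if the driver flags request BAR mapping. */
static int needs_bar_mapping(uint32_t drv_flags)
{
	return (drv_flags & RTE_PCI_DRV_NEED_MAPPING) != 0;
}
```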
---
 app/test/test_pci.c                     | 4 ++--
 lib/librte_eal/bsdapp/eal/eal_pci.c     | 2 +-
 lib/librte_eal/common/include/rte_pci.h | 4 ++--
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 2 +-
 lib/librte_pmd_e1000/em_ethdev.c        | 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c       | 4 ++--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c     | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/app/test/test_pci.c b/app/test/test_pci.c
index 680a095..40095c6 100644
--- a/app/test/test_pci.c
+++ b/app/test/test_pci.c
@@ -63,7 +63,7 @@ static int my_driver_init(struct rte_pci_driver *dr,
 			  struct rte_pci_device *dev);
 
 /*
- * To test cases where RTE_PCI_DRV_NEED_IGB_UIO is set, and isn't set, two
+ * To test cases where RTE_PCI_DRV_NEED_MAPPING is set, and isn't set, two
  * drivers are created (one with IGB devices, the other with IXGBE devices).
  */
 
@@ -90,7 +90,7 @@ struct rte_pci_driver my_driver = {
 	.name = "test_driver",
 	.devinit = my_driver_init,
 	.id_table = my_driver_id,
-	.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 };
 
 struct rte_pci_driver my_driver2 = {
diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 03200f3..dad5418 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -475,7 +475,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 0;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			ret = pci_uio_map_resource(dev);
 			if (ret != 0)
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index b56d7d3..3857584 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -190,8 +190,8 @@ struct rte_pci_driver {
 	uint32_t drv_flags;                     /**< Flags contolling handling of device. */
 };
 
-/** Device needs igb_uio kernel module */
-#define RTE_PCI_DRV_NEED_IGB_UIO 0x0001
+/** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
+#define RTE_PCI_DRV_NEED_MAPPING 0x0001
 /** Device driver must be registered several times until failure */
 #define RTE_PCI_DRV_MULTIPLE 0x0002
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 49b2a68..c7cd38e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -435,7 +435,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 			return 1;
 		}
 
-		if (dr->drv_flags & RTE_PCI_DRV_NEED_IGB_UIO) {
+		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
 			ret = pci_uio_map_resource(dev);
 			if (ret != 0)
diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 398838f..f025338 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -280,7 +280,7 @@ static struct eth_driver rte_em_pmd = {
 	{
 		.name = "rte_em_pmd",
 		.id_table = pci_id_em_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_em_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c
index 6e835c3..58ba5d3 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -603,7 +603,7 @@ static struct eth_driver rte_igb_pmd = {
 	{
 		.name = "rte_igb_pmd",
 		.id_table = pci_id_igb_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igb_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
@@ -616,7 +616,7 @@ static struct eth_driver rte_igbvf_pmd = {
 	{
 		.name = "rte_igbvf_pmd",
 		.id_table = pci_id_igbvf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_igbvf_dev_init,
 	.dev_private_size = sizeof(struct e1000_adapter),
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 10e5633..255615b 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1012,7 +1012,7 @@ static struct eth_driver rte_ixgbe_pmd = {
 	{
 		.name = "rte_ixgbe_pmd",
 		.id_table = pci_id_ixgbe_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbe_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
@@ -1025,7 +1025,7 @@ static struct eth_driver rte_ixgbevf_pmd = {
 	{
 		.name = "rte_ixgbevf_pmd",
 		.id_table = pci_id_ixgbevf_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_ixgbevf_dev_init,
 	.dev_private_size = sizeof(struct ixgbe_adapter),
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c b/lib/librte_pmd_virtio/virtio_ethdev.c
index d0b419d..9661358 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -816,7 +816,7 @@ static struct eth_driver rte_virtio_pmd = {
 	{
 		.name = "rte_virtio_pmd",
 		.id_table = pci_id_virtio_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_virtio_dev_init,
 	.dev_private_size = sizeof(struct virtio_adapter),
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index b955314..2411d26 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
@@ -268,7 +268,7 @@ static struct eth_driver rte_vmxnet3_pmd = {
 	{
 		.name = "rte_vmxnet3_pmd",
 		.id_table = pci_id_vmxnet3_map,
-		.drv_flags = RTE_PCI_DRV_NEED_IGB_UIO,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
 	},
 	.eth_dev_init = eth_vmxnet3_dev_init,
 	.dev_private_size = sizeof(struct vmxnet3_adapter),
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 06/20] igb_uio: make igb_uio compilation optional
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (4 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
                             ` (14 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Currently, igb_uio is always compiled. Some Linux distributions may not
want to ship igb_uio with DPDK, so make igb_uio compilation optional for
Linuxapp targets via the new CONFIG_RTE_EAL_IGB_UIO option (enabled by
default).
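Since igb_uio may now be absent from a given installation, tools that
depend on it may want to check at runtime whether the module is loaded
rather than assume it. A minimal sketch, assuming the usual Linux
convention that loaded modules appear as directories under /sys/module
(kmod_present() is a hypothetical helper, not a DPDK function):

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if "<base>/<name>" exists, 0 otherwise.
 * kmod_present("/sys/module", "igb_uio") is then a cheap runtime hint
 * that the igb_uio module is loaded. */
static int kmod_present(const char *base, const char *name)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s", base, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path would be truncated: treat as absent */
	return access(path, F_OK) == 0;
}
```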
---
 config/common_linuxapp           | 1 +
 lib/librte_eal/linuxapp/Makefile | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7c143eb..5f6b8f0 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -123,6 +123,7 @@ CONFIG_RTE_LOG_HISTORY=256
 CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
+CONFIG_RTE_EAL_IGB_UIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index 9ff167c..8fcfdf6 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -31,7 +31,9 @@
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
+ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
+endif
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 07/20] igb_uio: Moved interrupt type out of igb_uio
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (5 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
                             ` (13 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Move the interrupt type enum out of igb_uio and rename it to be more
generic. The somewhat unusual header naming and split is done mostly to
make the upcoming virtio patches easier to port to the dpdk.org tree.
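With the enum and its string names in shared headers, user space can
parse the same mode names that the igb_uio intr_mode parameter accepts.
A minimal sketch of such a lookup, with the enum and name macros copied
from the new headers in this patch; intr_mode_from_name() itself is
hypothetical:

```c
#include <string.h>

/* Copied from rte_pci_dev_feature_defs.h / rte_pci_dev_features.h. */
enum rte_intr_mode {
	RTE_INTR_MODE_NONE = 0,
	RTE_INTR_MODE_LEGACY,
	RTE_INTR_MODE_MSI,
	RTE_INTR_MODE_MSIX,
	RTE_INTR_MODE_MAX
};

#define RTE_INTR_MODE_NONE_NAME "none"
#define RTE_INTR_MODE_LEGACY_NAME "legacy"
#define RTE_INTR_MODE_MSI_NAME "msi"
#define RTE_INTR_MODE_MSIX_NAME "msix"

/* Hypothetical helper: map a mode name to the enum.
 * Returns RTE_INTR_MODE_MAX for NULL or unknown names. */
static enum rte_intr_mode intr_mode_from_name(const char *name)
{
	static const struct {
		const char *name;
		enum rte_intr_mode mode;
	} table[] = {
		{ RTE_INTR_MODE_NONE_NAME,   RTE_INTR_MODE_NONE },
		{ RTE_INTR_MODE_LEGACY_NAME, RTE_INTR_MODE_LEGACY },
		{ RTE_INTR_MODE_MSI_NAME,    RTE_INTR_MODE_MSI },
		{ RTE_INTR_MODE_MSIX_NAME,   RTE_INTR_MODE_MSIX },
	};
	size_t i;

	if (name == NULL)
		return RTE_INTR_MODE_MAX;
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(name, table[i].name) == 0)
			return table[i].mode;
	return RTE_INTR_MODE_MAX;
}
```

Note that igb_uio itself only acts on "msix" and "legacy"; the table
simply covers every name the new header defines.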
---
 lib/librte_eal/common/Makefile                     |  1 +
 lib/librte_eal/common/include/rte_pci.h            |  1 +
 .../common/include/rte_pci_dev_feature_defs.h      | 46 +++++++++++++++++++++
 .../common/include/rte_pci_dev_features.h          | 44 ++++++++++++++++++++
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c          | 48 +++++++++-------------
 5 files changed, 112 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
 create mode 100644 lib/librte_eal/common/include/rte_pci_dev_features.h

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 915cef1..7f27966 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -40,6 +40,7 @@ INC += rte_string_fns.h rte_cpuflags.h rte_version.h rte_tailq_elem.h
 INC += rte_eal_memconfig.h rte_malloc_heap.h
 INC += rte_hexdump.h rte_devargs.h rte_dev.h
 INC += rte_common_vect.h
+INC += rte_pci_dev_feature_defs.h rte_pci_dev_features.h
 
 ifeq ($(CONFIG_RTE_INSECURE_FUNCTION_WARNING),y)
 INC += rte_warnings.h
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
index 3857584..3608ee0 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -80,6 +80,7 @@ extern "C" {
 #include <sys/queue.h>
 #include <stdint.h>
 #include <inttypes.h>
+
 #include <rte_interrupts.h>
 
 TAILQ_HEAD(pci_device_list, rte_pci_device); /**< PCI devices in D-linked Q. */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
new file mode 100644
index 0000000..82f2c00
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_feature_defs.h
@@ -0,0 +1,46 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_DEFS_H_
+#define _RTE_PCI_DEV_DEFS_H_
+
+/* interrupt mode */
+enum rte_intr_mode {
+	RTE_INTR_MODE_NONE = 0,
+	RTE_INTR_MODE_LEGACY,
+	RTE_INTR_MODE_MSI,
+	RTE_INTR_MODE_MSIX,
+	RTE_INTR_MODE_MAX
+};
+
+#endif /* _RTE_PCI_DEV_DEFS_H_ */
diff --git a/lib/librte_eal/common/include/rte_pci_dev_features.h b/lib/librte_eal/common/include/rte_pci_dev_features.h
new file mode 100644
index 0000000..01200de
--- /dev/null
+++ b/lib/librte_eal/common/include/rte_pci_dev_features.h
@@ -0,0 +1,44 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_PCI_DEV_FEATURES_H
+#define _RTE_PCI_DEV_FEATURES_H
+
+#include <rte_pci_dev_feature_defs.h>
+
+#define RTE_INTR_MODE_NONE_NAME "none"
+#define RTE_INTR_MODE_LEGACY_NAME "legacy"
+#define RTE_INTR_MODE_MSI_NAME "msi"
+#define RTE_INTR_MODE_MSIX_NAME "msix"
+
+#endif
diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 6fa7396..8e467a2 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -33,6 +33,7 @@
 #ifdef CONFIG_XEN_DOM0
 #include <xen/xen.h>
 #endif
+#include <rte_pci_dev_features.h>
 
 /**
  * MSI-X related macros, copy from linux/pci_regs.h in kernel 2.6.39,
@@ -49,14 +50,6 @@
 
 #define IGBUIO_NUM_MSI_VECTORS 1
 
-/* interrupt mode */
-enum igbuio_intr_mode {
-	IGBUIO_LEGACY_INTR_MODE = 0,
-	IGBUIO_MSI_INTR_MODE,
-	IGBUIO_MSIX_INTR_MODE,
-	IGBUIO_INTR_MODE_MAX
-};
-
 /**
  * A structure describing the private information for a uio device.
  */
@@ -64,13 +57,13 @@ struct rte_uio_pci_dev {
 	struct uio_info info;
 	struct pci_dev *pdev;
 	spinlock_t lock; /* spinlock for accessing PCI config space or msix data in multi tasks/isr */
-	enum igbuio_intr_mode mode;
+	enum rte_intr_mode mode;
 	struct msix_entry \
 		msix_entries[IGBUIO_NUM_MSI_VECTORS]; /* pointer to the msix vectors to be allocated later */
 };
 
 static char *intr_mode = NULL;
-static enum igbuio_intr_mode igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
 /* PCI device id table */
 static struct pci_device_id igbuio_pci_ids[] = {
@@ -222,14 +215,13 @@ igbuio_set_interrupt_mask(struct rte_uio_pci_dev *udev, int32_t state)
 {
 	struct pci_dev *pdev = udev->pdev;
 
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_MSIX) {
 		struct msi_desc *desc;
 
 		list_for_each_entry(desc, &pdev->msi_list, list) {
 			igbuio_msix_mask_irq(desc, state);
 		}
-	}
-	else if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	} else if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		uint32_t status;
 		uint16_t old, new;
 
@@ -301,7 +293,7 @@ igbuio_pci_irqhandler(int irq, struct uio_info *info)
 		goto spin_unlock;
 
 	/* for legacy mode, interrupt maybe shared */
-	if (udev->mode == IGBUIO_LEGACY_INTR_MODE) {
+	if (udev->mode == RTE_INTR_MODE_LEGACY) {
 		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status_dword);
 		status = cmd_status_dword >> 16;
 		/* interrupt is not ours, goes to out */
@@ -520,18 +512,18 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 #endif
 	udev->info.priv = udev;
 	udev->pdev = dev;
-	udev->mode = 0; /* set the default value for interrupt mode */
+	udev->mode = RTE_INTR_MODE_LEGACY;
 	spin_lock_init(&udev->lock);
 
 	/* check if it need to try msix first */
-	if (igbuio_intr_mode_preferred == IGBUIO_MSIX_INTR_MODE) {
+	if (igbuio_intr_mode_preferred == RTE_INTR_MODE_MSIX) {
 		int vector;
 
 		for (vector = 0; vector < IGBUIO_NUM_MSI_VECTORS; vector ++)
 			udev->msix_entries[vector].entry = vector;
 
 		if (pci_enable_msix(udev->pdev, udev->msix_entries, IGBUIO_NUM_MSI_VECTORS) == 0) {
-			udev->mode = IGBUIO_MSIX_INTR_MODE;
+			udev->mode = RTE_INTR_MODE_MSIX;
 		}
 		else {
 			pci_disable_msix(udev->pdev);
@@ -539,13 +531,13 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		}
 	}
 	switch (udev->mode) {
-	case IGBUIO_MSIX_INTR_MODE:
+	case RTE_INTR_MODE_MSIX:
 		udev->info.irq_flags = 0;
 		udev->info.irq = udev->msix_entries[0].vector;
 		break;
-	case IGBUIO_MSI_INTR_MODE:
+	case RTE_INTR_MODE_MSI:
 		break;
-	case IGBUIO_LEGACY_INTR_MODE:
+	case RTE_INTR_MODE_LEGACY:
 		udev->info.irq_flags = IRQF_SHARED;
 		udev->info.irq = dev->irq;
 		break;
@@ -570,7 +562,7 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 fail_release_iomem:
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	igbuio_pci_release_iomem(&udev->info);
-	if (udev->mode == IGBUIO_MSIX_INTR_MODE)
+	if (udev->mode == RTE_INTR_MODE_MSIX)
 		pci_disable_msix(udev->pdev);
 	pci_release_regions(dev);
 fail_disable:
@@ -595,7 +587,7 @@ igbuio_pci_remove(struct pci_dev *dev)
 	uio_unregister_device(info);
 	igbuio_pci_release_iomem(info);
 	if (((struct rte_uio_pci_dev *)info->priv)->mode ==
-					IGBUIO_MSIX_INTR_MODE)
+			RTE_INTR_MODE_MSIX)
 		pci_disable_msix(dev);
 	pci_release_regions(dev);
 	pci_disable_device(dev);
@@ -611,11 +603,11 @@ igbuio_config_intr_mode(char *intr_str)
 		return 0;
 	}
 
-	if (!strcmp(intr_str, "msix")) {
-		igbuio_intr_mode_preferred = IGBUIO_MSIX_INTR_MODE;
+	if (!strcmp(intr_str, RTE_INTR_MODE_MSIX_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 		printk(KERN_INFO "Use MSIX interrupt\n");
-	} else if (!strcmp(intr_str, "legacy")) {
-		igbuio_intr_mode_preferred = IGBUIO_LEGACY_INTR_MODE;
+	} else if (!strcmp(intr_str, RTE_INTR_MODE_LEGACY_NAME)) {
+		igbuio_intr_mode_preferred = RTE_INTR_MODE_LEGACY;
 		printk(KERN_INFO "Use legacy interrupt\n");
 	} else {
 		printk(KERN_INFO "Error: bad parameter - %s\n", intr_str);
@@ -656,8 +648,8 @@ module_exit(igbuio_pci_exit_module);
 module_param(intr_mode, charp, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(intr_mode,
 "igb_uio interrupt mode (default=msix):\n"
-"    msix       Use MSIX interrupt\n"
-"    legacy     Use Legacy interrupt\n"
+"    " RTE_INTR_MODE_MSIX_NAME "       Use MSIX interrupt\n"
+"    " RTE_INTR_MODE_LEGACY_NAME "     Use Legacy interrupt\n"
 "\n");
 
 MODULE_DESCRIPTION("UIO driver for Intel IGB PCI cards");
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 08/20] vfio: add support for VFIO in Linuxapp targets
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (6 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 09/20] vfio: add VFIO header Anatoly Burakov
                             ` (12 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add a VFIO compilation option (CONFIG_RTE_EAL_VFIO, enabled by default)
to the common Linuxapp config.
---
 config/common_linuxapp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 5f6b8f0..63ae903 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -124,6 +124,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
 CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
 CONFIG_RTE_EAL_IGB_UIO=y
+CONFIG_RTE_EAL_VFIO=y
 
 #
 # Compile Environment Abstraction Layer for linux
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 09/20] vfio: add VFIO header
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (7 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
                             ` (11 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add a header that determines whether VFIO support should be compiled
in. If VFIO is enabled in the config (it is enabled by default), the
header also checks the kernel version: VFIO_PRESENT is defined only
when VFIO is enabled in the config and the kernel headers are version
3.6 or later. Code should use this macro to determine whether VFIO
support is being compiled in.
---
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h | 49 ++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/include/eal_vfio.h

diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
new file mode 100644
index 0000000..354e9ca
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -0,0 +1,49 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef EAL_VFIO_H_
+#define EAL_VFIO_H_
+
+/*
+ * determine if VFIO is present on the system
+ */
+#ifdef RTE_EAL_VFIO
+#include <linux/version.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
+#include <linux/vfio.h>
+
+#define VFIO_PRESENT
+#endif /* kernel version */
+#endif /* RTE_EAL_VFIO */
+
+#endif /* EAL_VFIO_H_ */
-- 
1.8.1.4


* [dpdk-dev] [PATCH v6 10/20] interrupts: Add support for VFIO interrupts
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (8 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 09/20] vfio: add VFIO header Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
                             ` (10 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add code to handle VFIO interrupts in the EAL interrupt handling code
(supports all three interrupt types: legacy INTx, MSI and MSI-X).
---
 lib/librte_eal/linuxapp/eal/eal_interrupts.c       | 287 ++++++++++++++++++++-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h |   4 +
 2 files changed, 286 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index bd9fc5f..dc2668a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -36,7 +36,6 @@
 #include <stdlib.h>
 #include <pthread.h>
 #include <sys/queue.h>
-#include <malloc.h>
 #include <stdarg.h>
 #include <unistd.h>
 #include <string.h>
@@ -44,6 +43,7 @@
 #include <inttypes.h>
 #include <sys/epoll.h>
 #include <sys/signalfd.h>
+#include <sys/ioctl.h>
 
 #include <rte_common.h>
 #include <rte_interrupts.h>
@@ -66,6 +66,7 @@
 #include <rte_spinlock.h>
 
 #include "eal_private.h"
+#include "eal_vfio.h"
 
 #define EAL_INTR_EPOLL_WAIT_FOREVER (-1)
 
@@ -87,6 +88,9 @@ union intr_pipefds{
  */
 union rte_intr_read_buffer {
 	int uio_intr_count;              /* for uio device */
+#ifdef VFIO_PRESENT
+	uint64_t vfio_intr_count;        /* for vfio device */
+#endif
 	uint64_t timerfd_num;            /* for timerfd */
 	char charbuf[16];                /* for others */
 };
@@ -119,6 +123,244 @@ static struct rte_intr_source_list intr_sources;
 /* interrupt handling thread */
 static pthread_t intr_thread;
 
+/* VFIO interrupts */
+#ifdef VFIO_PRESENT
+
+#define IRQ_SET_BUF_LEN  (sizeof(struct vfio_irq_set) + sizeof(int))
+
+/* enable legacy (INTx) interrupts */
+static int
+vfio_enable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	/* enable INTx */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* unmask INTx after enabling */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error unmasking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable legacy (INTx) interrupts */
+static int
+vfio_disable_intx(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	/* mask interrupts before disabling */
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error masking INTx interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* disable INTx*/
+	memset(irq_set, 0, len);
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_INTX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Error disabling INTx interrupts for fd %d\n", intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* enable MSI interrupts */
+static int
+vfio_enable_msi(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI interrupts */
+static int
+vfio_disable_msi(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSI_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+
+/* enable MSI-X interrupts */
+static int
+vfio_enable_msix(struct rte_intr_handle *intr_handle) {
+	int len, ret;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	struct vfio_irq_set *irq_set;
+	int *fd_ptr;
+
+	len = sizeof(irq_set_buf);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+	fd_ptr = (int *) &irq_set->data;
+	*fd_ptr = intr_handle->fd;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error enabling MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+
+	/* manually trigger interrupt to enable it */
+	memset(irq_set, 0, len);
+	len = sizeof(struct vfio_irq_set);
+	irq_set->argsz = len;
+	irq_set->count = 1;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Error triggering MSI-X interrupts for fd %d\n",
+						intr_handle->fd);
+		return -1;
+	}
+	return 0;
+}
+
+/* disable MSI-X interrupts */
+static int
+vfio_disable_msix(struct rte_intr_handle *intr_handle) {
+	struct vfio_irq_set *irq_set;
+	char irq_set_buf[IRQ_SET_BUF_LEN];
+	int len, ret;
+
+	len = sizeof(struct vfio_irq_set);
+
+	irq_set = (struct vfio_irq_set *) irq_set_buf;
+	irq_set->argsz = len;
+	irq_set->count = 0;
+	irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
+	irq_set->start = 0;
+
+	ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
+
+	if (ret)
+		RTE_LOG(ERR, EAL,
+			"Error disabling MSI-X interrupts for fd %d\n", intr_handle->fd);
+
+	return ret;
+}
+#endif
+
 int
 rte_intr_callback_register(struct rte_intr_handle *intr_handle,
 			rte_intr_callback_fn cb, void *cb_arg)
@@ -276,6 +518,20 @@ rte_intr_enable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_enable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_enable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_enable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -300,7 +556,7 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	case RTE_INTR_HANDLE_UIO:
 		if (write(intr_handle->fd, &value, sizeof(value)) < 0){
 			RTE_LOG(ERR, EAL,
-				"Error enabling interrupts for fd %d\n",
+				"Error disabling interrupts for fd %d\n",
 							intr_handle->fd);
 			return -1;
 		}
@@ -308,6 +564,20 @@ rte_intr_disable(struct rte_intr_handle *intr_handle)
 	/* not used at this moment */
 	case RTE_INTR_HANDLE_ALARM:
 		return -1;
+#ifdef VFIO_PRESENT
+	case RTE_INTR_HANDLE_VFIO_MSIX:
+		if (vfio_disable_msix(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_MSI:
+		if (vfio_disable_msi(intr_handle))
+			return -1;
+		break;
+	case RTE_INTR_HANDLE_VFIO_LEGACY:
+		if (vfio_disable_intx(intr_handle))
+			return -1;
+		break;
+#endif
 	/* unknown handle type */
 	default:
 		RTE_LOG(ERR, EAL,
@@ -357,11 +627,18 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 		/* set the length to be read dor different handle type */
 		switch (src->intr_handle.type) {
 		case RTE_INTR_HANDLE_UIO:
-			bytes_read = 4;
+			bytes_read = sizeof(buf.uio_intr_count);
 			break;
 		case RTE_INTR_HANDLE_ALARM:
-			bytes_read = sizeof(uint64_t);
+			bytes_read = sizeof(buf.timerfd_num);
+			break;
+#ifdef VFIO_PRESENT
+		case RTE_INTR_HANDLE_VFIO_MSIX:
+		case RTE_INTR_HANDLE_VFIO_MSI:
+		case RTE_INTR_HANDLE_VFIO_LEGACY:
+			bytes_read = sizeof(buf.vfio_intr_count);
 			break;
+#endif
 		default:
 			bytes_read = 1;
 			break;
@@ -397,7 +674,7 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds)
 				active_cb.cb_fn(&src->intr_handle,
 					active_cb.cb_arg);
 
-				/*get the lcok back. */
+				/*get the lock back. */
 				rte_spinlock_lock(&intr_lock);
 			}
 		}
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
index 87a9cf6..23eafd9 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_interrupts.h
@@ -41,12 +41,16 @@
 enum rte_intr_handle_type {
 	RTE_INTR_HANDLE_UNKNOWN = 0,
 	RTE_INTR_HANDLE_UIO,      /**< uio device handle */
+	RTE_INTR_HANDLE_VFIO_LEGACY,  /**< vfio device handle (legacy) */
+	RTE_INTR_HANDLE_VFIO_MSI,     /**< vfio device handle (MSI) */
+	RTE_INTR_HANDLE_VFIO_MSIX,    /**< vfio device handle (MSIX) */
 	RTE_INTR_HANDLE_ALARM,    /**< alarm handle */
 	RTE_INTR_HANDLE_MAX
 };
 
 /** Handle for interrupts. */
 struct rte_intr_handle {
+	int vfio_dev_fd;                 /**< VFIO device file descriptor */
 	int fd;                          /**< file descriptor */
 	enum rte_intr_handle_type type;  /**< handle type */
 };
-- 
1.8.1.4


* [dpdk-dev] [PATCH v6 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (9 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 12/20] vfio: create mapping code for VFIO Anatoly Burakov
                             ` (9 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

eal_hpet.c was renamed to eal_timer.c and, thanks to code changes, does
not need the -Wno-return-type any more.
---
 lib/librte_eal/linuxapp/eal/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 6e320ec..00a2115 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -93,7 +93,6 @@ CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
-- 
1.8.1.4


* [dpdk-dev] [PATCH v6 12/20] vfio: create mapping code for VFIO
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (10 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 13/20] vfio: add multiprocess support Anatoly Burakov
                             ` (8 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add code to support VFIO mapping (primary processes only). Most of
the work is done via ioctl() calls on either /dev/vfio/vfio (the
container) or /dev/vfio/$GROUP_NR (an IOMMU group).

In a nutshell, the code does the following:
1. creates a VFIO container (an entity that allows sharing IOMMU DMA
   mappings between devices)
2. checks if a given PCI device is a member of an IOMMU group (if it's
   not, this indicates that the device isn't bound to VFIO)
3. calls open() on the group file to obtain a group fd
4. checks if the group is viable (that is, if all the devices in the
   same IOMMU group are either bound to VFIO or not bound to anything)
5. adds the group to the container
6. sets up DMA mappings (done only once, mapping all DPDK hugepage
   memory for DMA with a 1:1 correspondence of IOVA to PA)
7. gets the actual PCI device fd from the group fd (can fail, which
   simply means that this particular device is not bound to VFIO)
8. maps BARs (the MSI-X BAR cannot be mmap()ed, so it is skipped)
9. sets up interrupt structures (but does not enable them!)
10. enables PCI bus mastering
---
 lib/librte_eal/linuxapp/eal/Makefile               |   2 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 709 +++++++++++++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |   3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  31 +
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |   6 +
 6 files changed, 753 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 00a2115..91012fc 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -87,6 +88,7 @@ CFLAGS_eal_log.o := -D_GNU_SOURCE
 CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 070bdc9..faa4c93 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -649,6 +649,8 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	/* if set to NONE, interrupt mode is determined automatically */
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_NONE;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..867467b
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,709 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <linux/pci_regs.h>
+#include <sys/eventfd.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * This code is only compiled in when VFIO_PRESENT is set (see eal_vfio.h).
+ */
+
+#ifdef VFIO_PRESENT
+
+#define VFIO_DIR "/dev/vfio"
+#define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
+#define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_GET_REGION_ADDR(x) ((uint64_t)(x) << 40ULL)
+
+/* per-process VFIO config */
+static struct vfio_config vfio_cfg;
+
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int *msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset) {
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
+{
+	int i, ret, intr_idx;
+
+	/* default to invalid index */
+	intr_idx = VFIO_PCI_NUM_IRQS;
+
+	/* get interrupt type from internal config (MSI-X by default, can be
+	 * overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_MSI:
+		intr_idx = VFIO_PCI_MSI_IRQ_INDEX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		break;
+	/* don't do anything if we want to automatically determine interrupt type */
+	case RTE_INTR_MODE_NONE:
+		break;
+	default:
+		RTE_LOG(ERR, EAL, "  unknown default interrupt type!\n");
+		return -1;
+	}
+
+	/* start from MSI-X interrupt type */
+	for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE &&
+				i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* if this vector cannot be used with eventfd, fail if we explicitly
+		 * specified interrupt type, otherwise continue */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			if (internal_config.vfio_intr_mode != RTE_INTR_MODE_NONE) {
+				RTE_LOG(ERR, EAL,
+						"  interrupt vector does not support eventfd!\n");
+				return -1;
+			} else
+				continue;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		switch (i) {
+		case VFIO_PCI_MSIX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSIX;
+			break;
+		case VFIO_PCI_MSI_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_MSI;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSI;
+			break;
+		case VFIO_PCI_INTX_IRQ_INDEX:
+			internal_config.vfio_intr_mode = RTE_INTR_MODE_LEGACY;
+			dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_LEGACY;
+			break;
+		default:
+			RTE_LOG(ERR, EAL, "  unknown interrupt type!\n");
+			return -1;
+		}
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, "  unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, "  unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				 VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char *pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char *tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			 SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename));
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if (end == group_tok || *end != '\0' || errno != 0) {
+		RTE_LOG(ERR, EAL, "  %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status = {
+			.argsz = sizeof(group_status)
+	};
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0)
+		return -1;
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	} else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, "  %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container
+	 *
+	 * needs to be done only once, only when at least one group is assigned to
+	 * a container and only in primary process
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, "  %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "  %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "  %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_res = rte_zmalloc("VFIO_RES", sizeof(*vfio_res), 0);
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store vfio mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of registers (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void *bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmapable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* if secondary process, do not set up interrupts */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd) != 0) {
+			RTE_LOG(ERR, EAL, "  %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, "  %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return 0;
+}
+
+int
+pci_vfio_enable(void)
+{
+	/* initialize group list */
+	int i;
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		vfio_cfg.vfio_groups[i].fd = -1;
+		vfio_cfg.vfio_groups[i].group_no = -1;
+	}
+	vfio_cfg.vfio_container_fd = -1;
+
+	/* check if we have VFIO driver enabled */
+	if (access(VFIO_DIR, F_OK) == 0)
+		vfio_cfg.vfio_enabled = 1;
+	else
+		RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n");
+
+	return 0;
+}
+
+int
+pci_vfio_is_enabled(void)
+{
+	return vfio_cfg.vfio_enabled;
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index dd17df2..498ade2 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include <rte_eal.h>
+#include <rte_pci_dev_feature_defs.h>
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6 +77,8 @@ struct internal_config {
 	volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
 	uintptr_t base_virtaddr;          /**< base address to try and reserve memory from */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
+	/** default interrupt mode for VFIO */
+	volatile enum rte_intr_mode vfio_intr_mode;
 	const char *hugefile_prefix;      /**< the base filename of hugetlbfs files */
 	const char *hugepage_dir;         /**< specific hugetlbfs directory to use */
 
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 87bdfe7..59a7e79 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -34,6 +34,8 @@
 #ifndef EAL_PCI_INIT_H_
 #define EAL_PCI_INIT_H_
 
+#include "eal_vfio.h"
+
 struct pci_map {
 	void *addr;
 	uint64_t offset;
@@ -63,4 +65,33 @@ void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 /* map IGB_UIO resource prototype */
 int pci_uio_map_resource(struct rte_pci_device *dev);
 
+#ifdef VFIO_PRESENT
+
+#define VFIO_MAX_GROUPS 64
+
+int pci_vfio_enable(void);
+int pci_vfio_is_enabled(void);
+
+/* map VFIO resource prototype */
+int pci_vfio_map_resource(struct rte_pci_device *dev);
+
+/*
+ * we don't need to store device fd's anywhere since they can be obtained from
+ * the group fd via an ioctl() call.
+ */
+struct vfio_group {
+	int group_no;
+	int fd;
+};
+
+struct vfio_config {
+	int vfio_enabled;
+	int vfio_container_fd;
+	int vfio_container_has_dma;
+	int vfio_group_idx;
+	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
+};
+
+#endif
+
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
index 354e9ca..03e693e 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h
@@ -42,6 +42,12 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 6, 0)
 #include <linux/vfio.h>
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(3, 10, 0)
+#define RTE_PCI_MSIX_TABLE_BIR 0x7
+#else
+#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR
+#endif
+
 #define VFIO_PRESENT
 #endif /* kernel version */
 #endif /* RTE_EAL_VFIO */
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [dpdk-dev] [PATCH v6 13/20] vfio: add multiprocess support.
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (11 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 12/20] vfio: create mapping code for VFIO Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 14/20] pci: enable VFIO device binding Anatoly Burakov
                             ` (7 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Since VFIO cannot be used to map the same device twice, secondary
processes receive the container and group fds from the primary process
over a local socket. Only group and container fds need to be sent, as
device fds can be obtained via ioctl() calls on the group fd.
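The fd hand-off relies on passing descriptors as SCM_RIGHTS ancillary data over a UNIX domain socket. A minimal sketch of that mechanism follows. send_fd() and recv_fd() are hypothetical helper names, not the functions added by this patch, and the sketch sizes the control buffer with CMSG_SPACE() and accesses it through CMSG_FIRSTHDR()/CMSG_DATA() rather than touching struct cmsghdr internals directly.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* control buffer sized and aligned for exactly one passed descriptor */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* send an int status word, with fd attached as SCM_RIGHTS ancillary data */
static int send_fd(int sock, int status, int fd)
{
	union fd_cmsg_buf ctrl;
	struct msghdr hdr;
	struct iovec iov;
	struct cmsghdr *chdr;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&hdr, 0, sizeof(hdr));

	iov.iov_base = &status;
	iov.iov_len = sizeof(status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	chdr = CMSG_FIRSTHDR(&hdr);
	chdr->cmsg_level = SOL_SOCKET;
	chdr->cmsg_type = SCM_RIGHTS;
	chdr->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(chdr), &fd, sizeof(int));

	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* receive the status word; return the new descriptor, or -1 if none came */
static int recv_fd(int sock, int *status)
{
	union fd_cmsg_buf ctrl;
	struct msghdr hdr;
	struct iovec iov;
	struct cmsghdr *chdr;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&hdr, 0, sizeof(hdr));

	iov.iov_base = status;
	iov.iov_len = sizeof(*status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctrl.buf;
	hdr.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &hdr, 0) <= 0)
		return -1;

	/* the kernel installs a new descriptor and reports it here */
	chdr = CMSG_FIRSTHDR(&hdr);
	if (chdr == NULL || chdr->cmsg_level != SOL_SOCKET ||
			chdr->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(chdr), sizeof(int));
	return fd;
}
```

For example, a pipe's write end sent over one end of socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) arrives on the other end as a new descriptor number referring to the same pipe. That is the property which lets a secondary process use the primary's container and group fds without opening them a second time.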

For multiprocess support, VFIO distinguishes between existing but unused
groups (i.e. groups that aren't bound to the VFIO driver) and non-existent
groups, so that it can tell whether a secondary process has requested a
valid group or something that doesn't exist.

VFIO multiprocess sync communicates over a simple protocol. It defines
two requests: a request for a group fd and a request for a container fd.
Possible replies are SOCKET_OK (an OK signal), SOCKET_ERR (an error
signal) and SOCKET_NO_FD (a signal indicating that the requested VFIO
group is valid but has no fd, i.e. the group is simply not bound to the
VFIO driver).

Here is the logic in a nutshell:

1. secondary process sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
1a. in case of SOCKET_REQ_GROUP, client also then sends group number
2. primary process receives message
2a. in case of invalid group, SOCKET_ERR is sent back to secondary
2b. in case of unbound group, SOCKET_NO_FD is sent back to secondary
2c. in case of valid group, SOCKET_OK is sent and followed by fd
3. socket is closed

In case of any error, SOCKET_ERR is sent back where possible and the
socket is closed.
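The reply decision in steps 2a-2c, and how the secondary maps a reply back to the return convention of pci_vfio_get_group_fd() (negative on error, 0 for an existing but unbound group, a positive fd otherwise), can be sketched as two pure functions. The helper names are hypothetical; only the SOCKET_* values are taken from eal_pci_init.h as modified by this patch.

```c
#include <assert.h>

/* reply codes, values as defined in eal_pci_init.h by this patch */
#define SOCKET_OK 0x0
#define SOCKET_NO_FD 0x1
#define SOCKET_ERR 0xFF

/*
 * Primary side: map the result of a group fd lookup to a reply code.
 * lookup < 0: error; lookup == 0: group exists but is not bound to VFIO;
 * lookup > 0: a valid group fd, which is sent right after SOCKET_OK.
 */
static int reply_for_group_lookup(int lookup)
{
	if (lookup < 0)
		return SOCKET_ERR;
	if (lookup == 0)
		return SOCKET_NO_FD;
	return SOCKET_OK;
}

/*
 * Secondary side: turn a reply code (and the fd received after SOCKET_OK)
 * back into the same tri-state convention as pci_vfio_get_group_fd().
 */
static int group_fd_from_reply(int reply, int received_fd)
{
	switch (reply) {
	case SOCKET_NO_FD:
		return 0;
	case SOCKET_OK:
		return received_fd > 0 ? received_fd : -1;
	default:
		return -1;
	}
}
```

Keeping the tri-state convention identical on both sides means the secondary process can treat a group fd obtained over the socket exactly like one it opened itself.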
---
 lib/librte_eal/linuxapp/eal/Makefile               |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  84 ++++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c | 395 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  19 +
 4 files changed, 497 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 91012fc..756d6b0 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 867467b..4de6061 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -304,7 +304,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)
 }
 
 /* open container fd or get an existing one */
-static int
+int
 pci_vfio_get_container_fd(void)
 {
 	int ret, vfio_container_fd;
@@ -334,13 +334,38 @@ pci_vfio_get_container_fd(void)
 		}
 
 		return vfio_container_fd;
+	} else {
+		/*
+		 * if we're in a secondary process, request container fd from the
+		 * primary process via our socket
+		 */
+		int socket_fd;
+
+		socket_fd = vfio_mp_sync_connect_to_primary();
+		if (socket_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_mp_sync_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
 	}
 
 	return -1;
 }
 
 /* open group fd or get an existing one */
-static int
+int
 pci_vfio_get_group_fd(int iommu_group_no)
 {
 	int i;
@@ -376,6 +401,47 @@ pci_vfio_get_group_fd(int iommu_group_no)
 		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
 		return vfio_group_fd;
 	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+
+		socket_fd = vfio_mp_sync_connect_to_primary();
+
+		if (socket_fd < 0) {
+			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot request group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_mp_sync_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, "  cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_mp_sync_receive_request(socket_fd);
+		switch (ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_mp_sync_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, "  cannot get group fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
 	return -1;
 }
 
@@ -605,6 +671,20 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		/* get number of registers (up to BAR5) */
 		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
 				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	} else {
+		/* if we're in a secondary process, just find our tailq entry */
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, "  %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
 	}
 
 	/* map BARs */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
new file mode 100644
index 0000000..add2c3e
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_mp_sync.c
@@ -0,0 +1,395 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include <sys/un.h>
+/* make sure we redefine __USE_MISC only if we undefined it above */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include <rte_log.h>
+#include <rte_pci.h>
+#include <rte_tailq.h>
+#include <rte_eal_memconfig.h>
+#include <rte_malloc.h>
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * The contents of this file are only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd, chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy(CMSG_DATA(&(chdr)), &(fd), sizeof(fd));\
+		} while (0)
+#define CMSGHDR_TO_FD(chdr, fd) \
+			memcpy(&(fd), CMSG_DATA(&(chdr)), sizeof(fd))
+
+static pthread_t socket_thread;
+static int mp_socket_fd;
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char *buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_mp_sync_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_mp_sync_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_mp_sync_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_mp_sync_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr *chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char *) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_mp_sync_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr *) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+static __attribute__((noreturn)) void *
+pci_vfio_mp_sync_thread(void __rte_unused * arg)
+{
+	int ret, fd, vfio_group_no;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(mp_socket_fd, (struct sockaddr *) &addr,
+				&sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_mp_sync_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			fd = pci_vfio_get_container_fd();
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			else
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_mp_sync_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+
+			fd = pci_vfio_get_group_fd(vfio_group_no);
+
+			if (fd < 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (fd == 0)
+				vfio_mp_sync_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_mp_sync_send_request(conn_sock, SOCKET_OK);
+				vfio_mp_sync_send_fd(conn_sock, fd);
+			}
+			break;
+		default:
+			vfio_mp_sync_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+static int
+vfio_mp_sync_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr *) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	/* save the socket in local configuration */
+	mp_socket_fd = socket_fd;
+
+	return 0;
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_mp_sync_setup(void)
+{
+	int ret;
+
+	if (vfio_mp_sync_socket_setup() < 0) {
+		RTE_LOG(ERR, EAL, "Failed to set up local socket!\n");
+		return -1;
+	}
+
+	ret = pthread_create(&socket_thread, NULL,
+			pci_vfio_mp_sync_thread, NULL);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to create thread for communication with "
+				"secondary processes!\n");
+		close(mp_socket_fd);
+		return -1;
+	}
+	return 0;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index 59a7e79..d758bee 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -71,9 +71,28 @@ int pci_uio_map_resource(struct rte_pci_device *dev);
 
 int pci_vfio_enable(void);
 int pci_vfio_is_enabled(void);
+int pci_vfio_mp_sync_setup(void);
 
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
+int pci_vfio_get_group_fd(int iommu_group_no);
+int pci_vfio_get_container_fd(void);
+
+/*
+ * Function prototypes for VFIO multiprocess sync functions
+ */
+int vfio_mp_sync_send_request(int socket, int req);
+int vfio_mp_sync_receive_request(int socket);
+int vfio_mp_sync_send_fd(int socket, int fd);
+int vfio_mp_sync_receive_fd(int socket);
+int vfio_mp_sync_connect_to_primary(void);
+
+/* socket comm protocol definitions */
+#define SOCKET_REQ_CONTAINER 0x100
+#define SOCKET_REQ_GROUP 0x200
+#define SOCKET_OK 0x0
+#define SOCKET_NO_FD 0x1
+#define SOCKET_ERR 0xFF
 
 /*
  * we don't need to store device fd's anywhere since they can be obtained from
-- 
1.8.1.4


* [dpdk-dev] [PATCH v6 14/20] pci: enable VFIO device binding
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (12 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 13/20] vfio: add multiprocess support Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
                             ` (6 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add support for binding VFIO devices if RTE_PCI_DRV_NEED_MAPPING is set
for this driver. Try VFIO first; if the device is not mapped through
VFIO, fall back to igb_uio.
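The fallback relies on a three-way return convention from the VFIO mapping code: 0 when the device was mapped, a positive value when the device is not managed by VFIO, and a negative value on a real error. A sketch of that decision, with a hypothetical map_with_fallback() and hypothetical stub mappers standing in for pci_vfio_map_resource() and pci_uio_map_resource():

```c
#include <assert.h>
#include <stddef.h>

/* mapper convention: 0 = mapped, > 0 = not managed here, < 0 = error */
typedef int (*map_fn)(void *dev);

static int map_with_fallback(void *dev, int vfio_enabled,
		map_fn vfio_map, map_fn uio_map)
{
	if (vfio_enabled) {
		int ret = vfio_map(dev);

		if (ret == 0)
			return 0;	/* mapped through VFIO */
		if (ret < 0)
			return ret;	/* real failure: do not fall back */
		/* ret > 0: device not bound to VFIO, try igb_uio below */
	}
	return uio_map(dev);
}

/* hypothetical stubs, for illustration only */
static int uio_calls;
static int stub_uio(void *dev) { (void)dev; uio_calls++; return 0; }
static int stub_vfio_mapped(void *dev) { (void)dev; return 0; }
static int stub_vfio_unbound(void *dev) { (void)dev; return 1; }
static int stub_vfio_error(void *dev) { (void)dev; return -1; }
```

Only a positive return falls through to igb_uio. A negative return from the VFIO path is propagated as-is, so a genuine VFIO failure is not masked by a second mapping attempt.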
---
 lib/librte_eal/linuxapp/eal/eal_pci.c | 44 +++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index c7cd38e..3b94b6f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -393,6 +393,30 @@ error:
 	return -1;
 }
 
+static int
+pci_map_device(struct rte_pci_device *dev)
+{
+	int ret, mapped = 0;
+
+	/* try mapping the NIC resources using VFIO if it exists */
+#ifdef VFIO_PRESENT
+	if (pci_vfio_is_enabled()) {
+		ret = pci_vfio_map_resource(dev);
+		if (ret == 0)
+			mapped = 1;
+		else if (ret < 0)
+			return ret;
+	}
+#endif
+	/* map resources for devices that use igb_uio */
+	if (!mapped) {
+		ret = pci_uio_map_resource(dev);
+		if (ret != 0)
+			return ret;
+	}
+	return 0;
+}
+
 /*
  * If vendor/device ID match, call the devinit() function of the
  * driver.
@@ -400,8 +424,8 @@ error:
 int
 rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *dev)
 {
+	int ret;
 	struct rte_pci_id *id_table;
-	int ret = 0;
 
 	for (id_table = dr->id_table ; id_table->vendor_id != 0; id_table++) {
 
@@ -437,7 +461,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d
 
 		if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) {
 			/* map resources for devices that use igb_uio */
-			ret = pci_uio_map_resource(dev);
+			ret = pci_map_device(dev);
 			if (ret != 0)
 				return ret;
 		} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -474,5 +498,21 @@ rte_eal_pci_init(void)
 		RTE_LOG(ERR, EAL, "%s(): Cannot scan PCI bus\n", __func__);
 		return -1;
 	}
+#ifdef VFIO_PRESENT
+	pci_vfio_enable();
+
+	if (pci_vfio_is_enabled()) {
+
+		/* if we are primary process, create a thread to communicate with
+		 * secondary processes. the thread will use a socket to wait for
+		 * requests from secondary process to send open file descriptors,
+		 * because VFIO does not allow multiple open descriptors on a group or
+		 * VFIO container.
+		 */
+		if (internal_config.process_type == RTE_PROC_PRIMARY &&
+				pci_vfio_mp_sync_setup() < 0)
+			return -1;
+	}
+#endif
 	return 0;
 }
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (13 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 14/20] pci: enable VFIO device binding Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
                             ` (5 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Unlike with igb_uio, the VFIO interrupt type is not set through kernel
module parameters but configured via ioctl() calls at runtime, which
warrants a new EAL command-line parameter. The parameter has no effect
if VFIO support is not compiled in; otherwise it sets the VFIO interrupt
type to "legacy", "msi" or "msix". Note that VFIO initialization will
fail if the selected interrupt type is not supported by the system.

If the interrupt type parameter is not specified, VFIO will try all
interrupt types, starting with MSI-X.
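A minimal sketch of how the option and the default fallback could fit
together, assuming a hypothetical intr_mode enum that mirrors rte_intr_mode.
Only the "start with MSI-X" rule comes from the description above; the
MSI-then-legacy order after it is an assumption, and parse_vfio_intr(),
select_intr_mode() and the no_msix() predicate are illustrative names, not
EAL functions.

```c
#include <string.h>

/* Hypothetical mirror of rte_intr_mode, for illustration only. */
enum intr_mode { INTR_NONE = 0, INTR_LEGACY, INTR_MSI, INTR_MSIX };

/* Map a --vfio-intr argument to a mode; returns -1 on unknown strings. */
static int
parse_vfio_intr(const char *arg, enum intr_mode *out)
{
	static const struct { const char *name; enum intr_mode mode; } map[] = {
		{ "legacy", INTR_LEGACY },
		{ "msi",    INTR_MSI },
		{ "msix",   INTR_MSIX },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (strcmp(arg, map[i].name) == 0) {
			*out = map[i].mode;
			return 0;
		}
	}
	return -1;
}

/* An explicitly requested mode is used only if supported (otherwise
 * initialization fails). With no request, try MSI-X first, then
 * (assumed order) MSI, then legacy INTx. */
static int
select_intr_mode(enum intr_mode requested, int (*supported)(enum intr_mode),
		 enum intr_mode *out)
{
	static const enum intr_mode order[] = { INTR_MSIX, INTR_MSI, INTR_LEGACY };
	size_t i;

	if (requested != INTR_NONE) {
		if (!supported(requested))
			return -1;
		*out = requested;
		return 0;
	}
	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (supported(order[i])) {
			*out = order[i];
			return 0;
		}
	}
	return -1;
}

/* Example predicate: a device that supports MSI and legacy INTx only. */
static int no_msix(enum intr_mode m) { return m != INTR_MSIX; }
```

For such a device, omitting --vfio-intr would select MSI, while
--vfio-intr=msix would make initialization fail.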
---
 lib/librte_eal/linuxapp/eal/eal.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index faa4c93..6994303 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -99,6 +99,7 @@
 #define OPT_BASE_VIRTADDR   "base-virtaddr"
 #define OPT_XEN_DOM0    "xen-dom0"
 #define OPT_CREATE_UIO_DEV "create-uio-dev"
+#define OPT_VFIO_INTR    "vfio-intr"
 
 #define RTE_EAL_BLACKLIST_SIZE	0x100
 
@@ -360,6 +361,8 @@ eal_usage(const char *prgname)
 	       "               (ex: --vdev=eth_pcap0,iface=eth2).\n"
 	       "  --"OPT_VMWARE_TSC_MAP": use VMware TSC map instead of native RDTSC\n"
 	       "  --"OPT_BASE_VIRTADDR": specify base virtual address\n"
+	       "  --"OPT_VFIO_INTR": specify desired interrupt mode for VFIO "
+			   "(legacy|msi|msix)\n"
 	       "  --"OPT_CREATE_UIO_DEV": create /dev/uioX (usually done by hotplug)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_NO_HUGE"  : use malloc instead of hugetlbfs\n"
@@ -578,6 +581,28 @@ eal_parse_base_virtaddr(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_vfio_intr(const char *mode)
+{
+	unsigned i;
+	static struct {
+		const char *name;
+		enum rte_intr_mode value;
+	} map[] = {
+		{ "legacy", RTE_INTR_MODE_LEGACY },
+		{ "msi", RTE_INTR_MODE_MSI },
+		{ "msix", RTE_INTR_MODE_MSIX },
+	};
+
+	for (i = 0; i < RTE_DIM(map); i++) {
+		if (!strcmp(mode, map[i].name)) {
+			internal_config.vfio_intr_mode = map[i].value;
+			return 0;
+		}
+	}
+	return -1;
+}
+
 static inline size_t
 eal_get_hugepage_mem_size(void)
 {
@@ -632,6 +657,7 @@ eal_parse_args(int argc, char **argv)
 		{OPT_PCI_BLACKLIST, 1, 0, 0},
 		{OPT_VDEV, 1, 0, 0},
 		{OPT_SYSLOG, 1, NULL, 0},
+		{OPT_VFIO_INTR, 1, NULL, 0},
 		{OPT_BASE_VIRTADDR, 1, 0, 0},
 		{OPT_XEN_DOM0, 0, 0, 0},
 		{OPT_CREATE_UIO_DEV, 1, NULL, 0},
@@ -828,6 +854,14 @@ eal_parse_args(int argc, char **argv)
 					return -1;
 				}
 			}
+			else if (!strcmp(lgopts[option_index].name, OPT_VFIO_INTR)) {
+				if (eal_parse_vfio_intr(optarg) < 0) {
+					RTE_LOG(ERR, EAL, "invalid parameters for --"
+							OPT_VFIO_INTR "\n");
+					eal_usage(prgname);
+					return -1;
+				}
+			}
 			else if (!strcmp(lgopts[option_index].name, OPT_CREATE_UIO_DEV)) {
 				internal_config.create_uio_dev = 1;
 			}
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 16/20] eal: make --no-huge use mmap instead of malloc
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (14 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
                             ` (4 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

This makes it possible to run DPDK without hugepage memory when VFIO
is used, as VFIO uses virtual addresses to set up DMA mappings.

Technically, malloc would work, but we want to guarantee that the
memory is page-aligned, so use mmap to be safe.
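The alignment argument can be checked with a small sketch. mmap() always
returns page-aligned addresses, whereas malloc() only guarantees alignment
suitable for fundamental object types. alloc_page_aligned() and
is_page_aligned() are hypothetical helpers, not DPDK functions. Unlike the
patch, the sketch passes -1 as the file descriptor, the conventional value
for MAP_ANONYMOUS (Linux ignores it either way).

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict C modes */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate len bytes of anonymous, page-aligned memory, roughly as the
 * --no-huge path does after this patch. Returns NULL on failure. */
static void *
alloc_page_aligned(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return addr == MAP_FAILED ? NULL : addr;
}

/* Return 1 if p starts on a page boundary. */
static int
is_page_aligned(const void *p)
{
	long page = sysconf(_SC_PAGESIZE);

	return page > 0 && ((uintptr_t)p % (uintptr_t)page) == 0;
}
```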
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index d9cfb09..ae43f9e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1031,7 +1031,13 @@ rte_eal_hugepage_init(void)
 
 	/* hugetlbfs can be disabled */
 	if (internal_config.no_hugetlbfs) {
-		addr = malloc(internal_config.memory);
+		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+		if (addr == MAP_FAILED) {
+			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
+					strerror(errno));
+			return -1;
+		}
 		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
 		mcfg->memseg[0].addr = addr;
 		mcfg->memseg[0].len = internal_config.memory;
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 17/20] test app: adding unit tests for VFIO EAL command-line parameter
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (15 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
                             ` (3 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add unit tests for the VFIO interrupt type command-line parameter. The
test app cannot tell whether VFIO support was compiled in (the
eal_vfio.h header is internal to the Linuxapp EAL), so the flag is
tested regardless.
---
 app/test/test_eal_flags.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 298c11a..ea4a567 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -768,6 +768,22 @@ test_misc_flags(void)
 	const char *argv11[] = {prgname, "--file-prefix=virtaddr",
 			"-c", "1", "-n", "2", "--base-virtaddr=0x12345678"};
 
+	/* try running with --vfio-intr INTx flag */
+	const char *argv12[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=legacy"};
+
+	/* try running with --vfio-intr MSI flag */
+	const char *argv13[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msi"};
+
+	/* try running with --vfio-intr MSI-X flag */
+	const char *argv14[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=msix"};
+
+	/* try running with --vfio-intr invalid flag */
+	const char *argv15[] = {prgname, "--file-prefix=intr",
+			"-c", "1", "-n", "2", "--vfio-intr=invalid"};
+
 
 	if (launch_proc(argv0) == 0) {
 		printf("Error - process ran ok with invalid flag\n");
@@ -820,6 +836,26 @@ test_misc_flags(void)
 		printf("Error - process did not run ok with --base-virtaddr parameter\n");
 		return -1;
 	}
+	if (launch_proc(argv12) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr INTx parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv13) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv14) != 0) {
+		printf("Error - process did not run ok with "
+				"--vfio-intr MSI-X parameter\n");
+		return -1;
+	}
+	if (launch_proc(argv15) == 0) {
+		printf("Error - process ran ok with "
+				"--vfio-intr invalid parameter\n");
+		return -1;
+	}
 	return 0;
 }
 #endif
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 18/20] igb_uio: Removed PCI ID table from igb_uio
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (16 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
                             ` (2 subsequent siblings)
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Remove the PCI ID list to make igb_uio more similar to generic drivers
like vfio-pci or uio_pci_generic. This makes it easier for the binding
script to support multiple drivers.

Note that since igb_uio no longer has a PCI ID list, it can now be
bound to any device, not just those explicitly supported by DPDK. In
other words, it now behaves similarly to pci-stub, vfio-pci and other
generic PCI drivers.

Therefore, to bind a new device to igb_uio, the user now has to first
write its vendor and device IDs to the "new_id" file inside the igb_uio
driver directory, and only then write the device's PCI address to
"bind". The PCI binding script is changed accordingly.

Sysfs behaves oddly when a new device ID is added to new_id: a
subsequent write to "bind" results in an IOError when the file is
closed. This error is harmless but still raises an exception, so to
work around it we check whether the device was actually bound to the
driver before reporting an error.
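The new_id-then-bind sequence, including the workaround for the spurious
error on close, could look roughly like this in C. The patch itself does
this in Python in bind_one(); every function name below is hypothetical,
and the sysfs_root parameter (normally "/sys") is an assumption added only
so the logic can be exercised outside a real sysfs.

```c
#define _DEFAULT_SOURCE /* for readlink() under strict C modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SYSFS_PATH_LEN 4096

/* Format "vvvv dddd", the same "%04x %04x" string the script writes. */
static int
format_new_id(char *buf, size_t len, unsigned vendor, unsigned device)
{
	int n = snprintf(buf, len, "%04x %04x", vendor, device);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute; an error on close counts too,
 * since sysfs write errors are often only reported there. */
static int
write_attr(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (f == NULL)
		return -1;
	if (fputs(val, f) == EOF)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Return 1 if <root>/bus/pci/devices/<addr>/driver links to drv. */
static int
bound_to(const char *root, const char *addr, const char *drv)
{
	char link[SYSFS_PATH_LEN], target[SYSFS_PATH_LEN];
	const char *base;
	ssize_t n;

	snprintf(link, sizeof(link), "%s/bus/pci/devices/%s/driver", root, addr);
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return 0;
	target[n] = '\0';
	base = strrchr(target, '/');
	base = base ? base + 1 : target;
	return strcmp(base, drv) == 0;
}

/* Add the vendor/device ID to new_id, then bind by PCI address. A bind
 * error is ignored if the device ended up bound to drv anyway. */
static int
bind_generic(const char *root, const char *addr, const char *drv,
	     unsigned vendor, unsigned device)
{
	char path[SYSFS_PATH_LEN], id[16];

	if (format_new_id(id, sizeof(id), vendor, device) < 0)
		return -1;
	snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/new_id", root, drv);
	if (write_attr(path, id) < 0)
		return -1;
	snprintf(path, sizeof(path), "%s/bus/pci/drivers/%s/bind", root, drv);
	if (write_attr(path, addr) < 0 && !bound_to(root, addr, drv))
		return -1;
	return 0;
}
```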
---
 lib/librte_eal/linuxapp/igb_uio/igb_uio.c |  21 +-----
 tools/igb_uio_bind.py                     | 118 +++++++++++++++---------------
 2 files changed, 59 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
index 8e467a2..60b8ca4 100644
--- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
+++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
@@ -65,25 +65,6 @@ struct rte_uio_pci_dev {
 static char *intr_mode = NULL;
 static enum rte_intr_mode igbuio_intr_mode_preferred = RTE_INTR_MODE_MSIX;
 
-/* PCI device id table */
-static struct pci_device_id igbuio_pci_ids[] = {
-#define RTE_PCI_DEV_ID_DECL_EM(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGB(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IGBVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBE(vend, dev) {PCI_DEVICE(vend, dev)},
-#define RTE_PCI_DEV_ID_DECL_IXGBEVF(vend, dev) {PCI_DEVICE(vend, dev)},
-#ifdef RTE_LIBRTE_VIRTIO_PMD
-#define RTE_PCI_DEV_ID_DECL_VIRTIO(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#ifdef RTE_LIBRTE_VMXNET3_PMD
-#define RTE_PCI_DEV_ID_DECL_VMXNET3(vend, dev) {PCI_DEVICE(vend, dev)},
-#endif
-#include <rte_pci_dev_ids.h>
-{ 0, },
-};
-
-MODULE_DEVICE_TABLE(pci, igbuio_pci_ids);
-
 static inline struct rte_uio_pci_dev *
 igbuio_get_uio_pci_dev(struct uio_info *info)
 {
@@ -619,7 +600,7 @@ igbuio_config_intr_mode(char *intr_str)
 
 static struct pci_driver igbuio_pci_driver = {
 	.name = "igb_uio",
-	.id_table = igbuio_pci_ids,
+	.id_table = NULL,
 	.probe = igbuio_pci_probe,
 	.remove = igbuio_pci_remove,
 };
diff --git a/tools/igb_uio_bind.py b/tools/igb_uio_bind.py
index 18dbeda..e87a05e 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/igb_uio_bind.py
@@ -42,8 +42,6 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
-# list of vendor:device pairs (again stored as dictionary) supported by igb_uio
-module_dev_ids = []
 
 def usage():
     '''Print usage information for the program'''
@@ -147,9 +145,7 @@ def find_module(mod):
                 return path
 
 def check_modules():
-    '''Checks that the needed modules (igb_uio) is loaded, and then
-    determine from the .ko file, what its supported device ids are'''
-    global module_dev_ids
+    '''Checks that igb_uio is loaded'''
 
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
@@ -166,40 +162,35 @@ def check_modules():
         print "Error - module %s not loaded" %mod
         sys.exit(1)
 
-    # now find the .ko and get list of supported vendor/dev-ids
-    modpath = find_module(mod)
-    if modpath is None:
-        print "Cannot find module file %s" % (mod + ".ko")
-        sys.exit(1)
-    depmod_output = check_output(["depmod", "-n", modpath]).splitlines()
-    for line in depmod_output:
-        if not line.startswith("alias"):
-            continue
-        if not line.endswith(mod):
-            continue
-        lineparts = line.split()
-        if not(lineparts[1].startswith("pci:")):
-            continue;
-        else:
-            lineparts[1] = lineparts[1][4:]
-        vendor = lineparts[1][:9]
-        device = lineparts[1][9:18]
-        if vendor.startswith("v") and device.startswith("d"):
-            module_dev_ids.append({"Vendor": int(vendor[1:],16),
-                                   "Device": int(device[1:],16)})
-
-def is_supported_device(dev_id):
-    '''return true if device is supported by igb_uio, false otherwise'''
-    for dev in module_dev_ids:
-        if (dev["Vendor"] == devices[dev_id]["Vendor"] and
-            dev["Device"] == devices[dev_id]["Device"]):
-            return True
-    return False
-
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
 
+def get_pci_device_details(dev_id):
+    '''This function gets additional details for a PCI device'''
+    device = {}
+
+    extra_info = check_output(["lspci", "-vmmks", dev_id]).splitlines()
+
+    # parse lspci details
+    for line in extra_info:
+        if len(line) == 0:
+            continue
+        name, value = line.split("\t", 1)
+        name = name.strip(":") + "_str"
+        device[name] = value
+    # check for a unix interface name
+    sys_path = "/sys/bus/pci/devices/%s/net/" % dev_id
+    if exists(sys_path):
+        device["Interface"] = ",".join(os.listdir(sys_path))
+    else:
+        device["Interface"] = ""
+    # check if a port is used for ssh connection
+    device["Ssh_if"] = False
+    device["Active"] = ""
+
+    return device
+
 def get_nic_details():
     '''This function populates the "devices" dictionary. The keys used are
     the pci addresses (domain:bus:slot.func). The values are themselves
@@ -237,23 +228,10 @@ def get_nic_details():
 
     # based on the basic info, get extended text details
     for d in devices.keys():
-        extra_info = check_output(["lspci", "-vmmks", d]).splitlines()
-        # parse lspci details
-        for line in extra_info:
-            if len(line) == 0:
-                continue
-            name, value = line.split("\t", 1)
-            name = name.strip(":") + "_str"
-            devices[d][name] = value
-        # check for a unix interface name
-        sys_path = "/sys/bus/pci/devices/%s/net/" % d
-        if exists(sys_path):
-            devices[d]["Interface"] = ",".join(os.listdir(sys_path))
-        else:
-            devices[d]["Interface"] = ""
-        # check if a port is used for ssh connection
-        devices[d]["Ssh_if"] = False
-        devices[d]["Active"] = ""
+        # get additional info and add it to existing data
+        devices[d] = dict(devices[d].items() +
+                          get_pci_device_details(d).items())
+
         for _if in ssh_if:
             if _if in devices[d]["Interface"].split(","):
                 devices[d]["Ssh_if"] = True
@@ -261,14 +239,12 @@ def get_nic_details():
                 break;
 
         # add igb_uio to list of supporting modules if needed
-        if is_supported_device(d):
-            if "Module_str" in devices[d]:
-                if "igb_uio" not in devices[d]["Module_str"]:
-                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
-            else:
-                devices[d]["Module_str"] = "igb_uio"
-        if "Module_str" not in devices[d]:
-            devices[d]["Module_str"] = "<none>"
+        if "Module_str" in devices[d]:
+            if "igb_uio" not in devices[d]["Module_str"]:
+                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+        else:
+            devices[d]["Module_str"] = "igb_uio"
+
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
             modules = devices[d]["Module_str"].split(",")
@@ -343,6 +319,22 @@ def bind_one(dev_id, driver, force):
             unbind_one(dev_id, force)
             dev["Driver_str"] = "" # clear driver string
 
+    # if we are binding to one of DPDK drivers, add PCI id's to that driver
+    if driver == "igb_uio":
+        filename = "/sys/bus/pci/drivers/%s/new_id" % driver
+        try:
+            f = open(filename, "w")
+        except:
+            print "Error: bind failed for %s - Cannot open %s" % (dev_id, filename)
+            return
+        try:
+            f.write("%04x %04x" % (dev["Vendor"], dev["Device"]))
+            f.close()
+        except:
+            print "Error: bind failed for %s - Cannot write new PCI ID to " \
+                "driver %s" % (dev_id, driver)
+            return
+
     # do the bind by writing to /sys
     filename = "/sys/bus/pci/drivers/%s/bind" % driver
     try:
@@ -356,6 +348,12 @@ def bind_one(dev_id, driver, force):
         f.write(dev_id)
         f.close()
     except:
+        # for some reason, closing dev_id after adding a new PCI ID to new_id
+        # results in IOError. however, if the device was successfully bound,
+        # we don't care for any errors and can safely ignore IOError
+        tmp = get_pci_device_details(dev_id)
+        if "Driver_str" in tmp and tmp["Driver_str"] == driver:
+            return
         print "Error: bind failed for %s - Cannot bind to driver %s" % (dev_id, driver)
         if saved_driver is not None: # restore any previous driver
             bind_one(dev_id, saved_driver, force)
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (17 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
  2014-06-16  9:08           ` [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK Thomas Monjalon
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Rename the igb_uio_bind script to dpdk_nic_bind to give it a generic
name, since it now supports two drivers.
---
 tools/{igb_uio_bind.py => dpdk_nic_bind.py} | 47 ++++++++++++++++++++---------
 tools/setup.sh                              | 16 +++++-----
 2 files changed, 40 insertions(+), 23 deletions(-)
 rename tools/{igb_uio_bind.py => dpdk_nic_bind.py} (92%)

diff --git a/tools/igb_uio_bind.py b/tools/dpdk_nic_bind.py
similarity index 92%
rename from tools/igb_uio_bind.py
rename to tools/dpdk_nic_bind.py
index e87a05e..42e845f 100755
--- a/tools/igb_uio_bind.py
+++ b/tools/dpdk_nic_bind.py
@@ -42,6 +42,8 @@ ETHERNET_CLASS = "0200"
 # global dict ethernet devices present. Dictionary indexed by PCI address.
 # Each device within this is itself a dictionary of device properties
 devices = {}
+# list of supported DPDK drivers
+dpdk_drivers = [ "igb_uio", "vfio-pci" ]
 
 def usage():
     '''Print usage information for the program'''
@@ -146,22 +148,33 @@ def find_module(mod):
 
 def check_modules():
     '''Checks that igb_uio is loaded'''
+    global dpdk_drivers
 
     fd = file("/proc/modules")
     loaded_mods = fd.readlines()
     fd.close()
-    mod = "igb_uio"
+
+    # list of supported modules
+    mods =  [{"Name" : driver, "Found" : False} for driver in dpdk_drivers]
 
     # first check if module is loaded
-    found = False
     for line in loaded_mods:
-        if line.startswith(mod):
-            found = True
-            break
-    if not found:
-        print "Error - module %s not loaded" %mod
+        for mod in mods:
+            if line.startswith(mod["Name"]):
+                mod["Found"] = True
+            # special case for vfio_pci (module is named vfio-pci,
+            # but its .ko is named vfio_pci)
+            elif line.replace("_", "-").startswith(mod["Name"]):
+                mod["Found"] = True
+
+    # check if we have at least one loaded module
+    if True not in [mod["Found"] for mod in mods]:
+        print "Error - no supported modules are loaded"
         sys.exit(1)
 
+    # change DPDK driver list to only contain drivers that are loaded
+    dpdk_drivers = [mod["Name"] for mod in mods if mod["Found"]]
+
 def has_driver(dev_id):
     '''return true if a device is assigned to a driver. False otherwise'''
     return "Driver_str" in devices[dev_id]
@@ -196,6 +209,7 @@ def get_nic_details():
     the pci addresses (domain:bus:slot.func). The values are themselves
     dictionaries - one for each NIC.'''
     global devices
+    global dpdk_drivers
 
     # clear any old data
     devices = {}
@@ -240,10 +254,11 @@ def get_nic_details():
 
         # add igb_uio to list of supporting modules if needed
         if "Module_str" in devices[d]:
-            if "igb_uio" not in devices[d]["Module_str"]:
-                devices[d]["Module_str"] = devices[d]["Module_str"] + ",igb_uio"
+            for driver in dpdk_drivers:
+                if driver not in devices[d]["Module_str"]:
+                    devices[d]["Module_str"] = devices[d]["Module_str"] + ",%s" % driver
         else:
-            devices[d]["Module_str"] = "igb_uio"
+            devices[d]["Module_str"] = ",".join(dpdk_drivers)
 
         # make sure the driver and module strings do not have any duplicates
         if has_driver(d):
@@ -320,7 +335,7 @@ def bind_one(dev_id, driver, force):
             dev["Driver_str"] = "" # clear driver string
 
     # if we are binding to one of DPDK drivers, add PCI id's to that driver
-    if driver == "igb_uio":
+    if driver in dpdk_drivers:
         filename = "/sys/bus/pci/drivers/%s/new_id" % driver
         try:
             f = open(filename, "w")
@@ -397,21 +412,23 @@ def show_status():
     '''Function called when the script is passed the "--status" option. Displays
     to the user what devices are bound to the igb_uio driver, the kernel driver
     or to no driver'''
+    global dpdk_drivers
     kernel_drv = []
-    uio_drv = []
+    dpdk_drv = []
     no_drv = []
+
     # split our list of devices into the three categories above
     for d in devices.keys():
         if not has_driver(d):
             no_drv.append(devices[d])
             continue
-        if devices[d]["Driver_str"] == "igb_uio":
-            uio_drv.append(devices[d])
+        if devices[d]["Driver_str"] in dpdk_drivers:
+            dpdk_drv.append(devices[d])
         else:
             kernel_drv.append(devices[d])
 
     # print each category separately, so we can clearly see what's used by DPDK
-    display_devices("Network devices using IGB_UIO driver", uio_drv, \
+    display_devices("Network devices using DPDK-compatible driver", dpdk_drv, \
                     "drv=%(Driver_str)s unused=%(Module_str)s")
     display_devices("Network devices using kernel driver", kernel_drv,
                     "if=%(Interface)s drv=%(Driver_str)s unused=%(Module_str)s %(Active)s")
diff --git a/tools/setup.sh b/tools/setup.sh
index c3fbd4d..a54f65d 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -324,13 +324,13 @@ grep_meminfo()
 }
 
 #
-# Calls igb_uio_bind.py --status to show the NIC and what they
+# Calls dpdk_nic_bind.py --status to show the NIC and what they
 # are all bound to, in terms of drivers.
 #
 show_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	else
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -338,16 +338,16 @@ show_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with igb_uio
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
 bind_nics()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then
-		${RTE_SDK}/tools/igb_uio_bind.py --status
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
 		echo ""
 		echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 		read PCI_PATH
-		sudo ${RTE_SDK}/tools/igb_uio_bind.py -b igb_uio $PCI_PATH && echo "OK"
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b igb_uio $PCI_PATH && echo "OK"
 	else
 		echo "# Please load the 'igb_uio' kernel module before querying or "
 		echo "# adjusting NIC device bindings"
@@ -355,18 +355,18 @@ bind_nics()
 }
 
 #
-# Uses igb_uio_bind.py to move devices to work with kernel drivers again
+# Uses dpdk_nic_bind.py to move devices to work with kernel drivers again
 #
 unbind_nics()
 {
-	${RTE_SDK}/tools/igb_uio_bind.py --status
+	${RTE_SDK}/tools/dpdk_nic_bind.py --status
 	echo ""
 	echo -n "Enter PCI address of device to bind to IGB UIO driver: "
 	read PCI_PATH
 	echo ""
 	echo -n "Enter name of kernel driver to bind the device to: "
 	read DRV
-	sudo ${RTE_SDK}/tools/igb_uio_bind.py -b $DRV $PCI_PATH && echo "OK"
+	sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b $DRV $PCI_PATH && echo "OK"
 }
 
 #
-- 
1.8.1.4

* [dpdk-dev] [PATCH v6 20/20] setup script: adding support for VFIO to setup.sh
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (18 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
@ 2014-06-13 14:52           ` Anatoly Burakov
  2014-06-16  9:08           ` [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK Thomas Monjalon
  20 siblings, 0 replies; 160+ messages in thread
From: Anatoly Burakov @ 2014-06-13 14:52 UTC (permalink / raw)
  To: dev

Add support for loading/unloading the VFIO modules, binding/unbinding
devices to/from VFIO, and setting up correct userspace permissions.
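The memlock check that set_vfio_permissions performs with "ulimit -l" can
also be done from C via getrlimit(RLIMIT_MEMLOCK); as the script's own
message puts it, this limit caps the memory DPDK can use with VFIO when run
as a regular user. Note that getrlimit() reports bytes while "ulimit -l"
reports kilobytes, so the 64 MB threshold below corresponds to the script's
65536 KB check. memlock_too_low() and check_memlock() are hypothetical
helper names used only for this sketch.

```c
#define _DEFAULT_SOURCE /* for RLIMIT_MEMLOCK under strict C modes */
#include <stdio.h>
#include <sys/resource.h>

/* Same threshold as setup.sh: warn below 64 MB. */
#define MEMLOCK_WARN_BYTES (64ULL * 1024 * 1024)

/* Return 1 if the soft limit should trigger the "less than 64MB" warning. */
static int
memlock_too_low(rlim_t soft_limit)
{
	if (soft_limit == RLIM_INFINITY)
		return 0;
	return (unsigned long long)soft_limit < MEMLOCK_WARN_BYTES;
}

/* Print the current memlock limit and warn like the setup script does.
 * Returns -1 if the limit cannot be read. */
static int
check_memlock(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY) {
		printf("memlock limit: unlimited\n");
		return 0;
	}
	printf("memlock limit: %llu MB\n",
	       (unsigned long long)rl.rlim_cur / (1024 * 1024));
	if (memlock_too_low(rl.rlim_cur))
		printf("WARNING: memlock limit is less than 64MB; DPDK with "
		       "VFIO may not be able to initialize as this user\n");
	return 0;
}
```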
---
 tools/setup.sh | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 142 insertions(+), 15 deletions(-)

diff --git a/tools/setup.sh b/tools/setup.sh
index a54f65d..369e09e 100755
--- a/tools/setup.sh
+++ b/tools/setup.sh
@@ -187,6 +187,54 @@ load_igb_uio_module()
 }
 
 #
+# Unloads VFIO modules.
+#
+remove_vfio_module()
+{
+	echo "Unloading any existing VFIO module"
+	/sbin/lsmod | grep -s vfio > /dev/null
+	if [ $? -eq 0 ] ; then
+		sudo /sbin/rmmod vfio-pci
+		sudo /sbin/rmmod vfio_iommu_type1
+		sudo /sbin/rmmod vfio
+	fi
+}
+
+#
+# Loads new vfio-pci (and vfio module if needed).
+#
+load_vfio_module()
+{
+	remove_vfio_module
+
+	VFIO_PATH="kernel/drivers/vfio/pci/vfio-pci.ko"
+
+	echo "Loading VFIO module"
+	/sbin/lsmod | grep -s vfio_pci > /dev/null
+	if [ $? -ne 0 ] ; then
+		if [ -f /lib/modules/$(uname -r)/$VFIO_PATH ] ; then
+			sudo /sbin/modprobe vfio-pci
+		fi
+	fi
+
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# check if /dev/vfio/vfio exists - that way we
+	# know we either loaded the module, or it was
+	# compiled into the kernel
+	if [ ! -e /dev/vfio/vfio ] ; then
+		echo "## ERROR: VFIO not found!"
+	fi
+}
+
+#
 # Unloads the rte_kni.ko module.
 #
 remove_kni_module()
@@ -223,6 +271,55 @@ load_kni_module()
 }
 
 #
+# Sets appropriate permissions on /dev/vfio/* files
+#
+set_vfio_permissions()
+{
+	# make sure regular users can read /dev/vfio
+	echo "chmod /dev/vfio"
+	sudo /usr/bin/chmod a+x /dev/vfio
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# make sure regular user can access everything inside /dev/vfio
+	echo "chmod /dev/vfio/*"
+	sudo /usr/bin/chmod 0666 /dev/vfio/*
+	if [ $? -ne 0 ] ; then
+		echo "FAIL"
+		quit
+	fi
+	echo "OK"
+
+	# since permissions are only to be set when running as
+	# regular user, we only check ulimit here
+	#
+	# warn if regular user is only allowed
+	# to memlock <64M of memory
+	MEMLOCK_AMNT=`ulimit -l`
+
+	if [ "$MEMLOCK_AMNT" != "unlimited" ] ; then
+		MEMLOCK_MB=`expr $MEMLOCK_AMNT / 1024`
+		echo ""
+		echo "Current user memlock limit: ${MEMLOCK_MB} MB"
+		echo ""
+		echo "This is the maximum amount of memory you will be"
+		echo "able to use with DPDK and VFIO if run as current user."
+		echo -n "To change this, please adjust limits.conf memlock "
+		echo "limit for current user."
+
+		if [ $MEMLOCK_AMNT -lt 65536 ] ; then
+			echo ""
+			echo "## WARNING: memlock limit is less than 64MB"
+			echo -n "## DPDK with VFIO may not be able to initialize "
+			echo "if run as current user."
+		fi
+	fi
+}
+
+#
 # Removes all reserved hugepages.
 #
 clear_huge_pages()
@@ -340,7 +437,25 @@ show_nics()
 #
 # Uses dpdk_nic_bind.py to move devices to work with igb_uio
 #
-bind_nics()
+bind_nics_to_vfio()
+{
+	if /sbin/lsmod  | grep -q vfio_pci ; then
+		${RTE_SDK}/tools/dpdk_nic_bind.py --status
+		echo ""
+		echo -n "Enter PCI address of device to bind to VFIO driver: "
+		read PCI_PATH
+		sudo ${RTE_SDK}/tools/dpdk_nic_bind.py -b vfio-pci $PCI_PATH &&
+			echo "OK"
+	else
+		echo "# Please load the 'vfio-pci' kernel module before querying or "
+		echo "# adjusting NIC device bindings"
+	fi
+}
+
+#
+# Uses dpdk_nic_bind.py to move devices to work with igb_uio
+#
+bind_nics_to_igb_uio()
 {
 	if  /sbin/lsmod  | grep -q igb_uio ; then
 		${RTE_SDK}/tools/dpdk_nic_bind.py --status
@@ -397,20 +512,29 @@ step2_func()
 	TEXT[1]="Insert IGB UIO module"
 	FUNC[1]="load_igb_uio_module"
 
-	TEXT[2]="Insert KNI module"
-	FUNC[2]="load_kni_module"
+	TEXT[2]="Insert VFIO module"
+	FUNC[2]="load_vfio_module"
+
+	TEXT[3]="Insert KNI module"
+	FUNC[3]="load_kni_module"
 
-	TEXT[3]="Setup hugepage mappings for non-NUMA systems"
-	FUNC[3]="set_non_numa_pages"
+	TEXT[4]="Setup hugepage mappings for non-NUMA systems"
+	FUNC[4]="set_non_numa_pages"
 
-	TEXT[4]="Setup hugepage mappings for NUMA systems"
-	FUNC[4]="set_numa_pages"
+	TEXT[5]="Setup hugepage mappings for NUMA systems"
+	FUNC[5]="set_numa_pages"
 
-	TEXT[5]="Display current Ethernet device settings"
-	FUNC[5]="show_nics"
+	TEXT[6]="Display current Ethernet device settings"
+	FUNC[6]="show_nics"
 
-	TEXT[6]="Bind Ethernet device to IGB UIO module"
-	FUNC[6]="bind_nics"
+	TEXT[7]="Bind Ethernet device to IGB UIO module"
+	FUNC[7]="bind_nics_to_igb_uio"
+
+	TEXT[8]="Bind Ethernet device to VFIO module"
+	FUNC[8]="bind_nics_to_vfio"
+
+	TEXT[9]="Setup VFIO permissions"
+	FUNC[9]="set_vfio_permissions"
 }
 
 #
@@ -455,11 +579,14 @@ step5_func()
 	TEXT[3]="Remove IGB UIO module"
 	FUNC[3]="remove_igb_uio_module"
 
-	TEXT[4]="Remove KNI module"
-	FUNC[4]="remove_kni_module"
+	TEXT[4]="Remove VFIO module"
+	FUNC[4]="remove_vfio_module"
+
+	TEXT[5]="Remove KNI module"
+	FUNC[5]="remove_kni_module"
 
-	TEXT[5]="Remove hugepage mappings"
-	FUNC[5]="clear_huge_pages"
+	TEXT[6]="Remove hugepage mappings"
+	FUNC[6]="clear_huge_pages"
 }
 
 STEPS[1]="step1_func"
-- 
1.8.1.4

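The setup.sh hunk above warns when the current user's memlock limit is below 65536. That number comes from "ulimit -l", which reports kilobytes, so the threshold is 64MB. VFIO's type1 IOMMU backend pins the memory it maps for DMA, and for users without CAP_IPC_LOCK those pinned pages count against this limit. The same check can be sketched as a small reusable function. check_memlock and its output strings are hypothetical and not part of setup.sh; the sketch assumes its input is either a number of kilobytes or the string "unlimited", which an integer -lt test cannot compare.

```bash
#!/bin/bash
# Hypothetical helper, not part of setup.sh: classify a memlock limit as
# printed by `ulimit -l` (a number of kilobytes, or "unlimited").
MIN_MEMLOCK_KB=65536	# 64MB in kilobytes, the threshold used in the hunk

check_memlock()
{
	local amnt="$1"

	# "unlimited" is not a number, so test for it before using -lt
	if [ "$amnt" = "unlimited" ] ; then
		echo "ok"
	elif [ "$amnt" -lt "$MIN_MEMLOCK_KB" ] ; then
		echo "low: $((amnt / 1024)) MB"
	else
		echo "ok"
	fi
}

# Usage: classify the current user's limit
check_memlock "$(ulimit -l)"
```

Raising the limit itself is still done through limits.conf, as the script's message says; the function only reports.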
^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK
  2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (19 preceding siblings ...)
  2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
@ 2014-06-16  9:08           ` Thomas Monjalon
  2014-06-16  9:28             ` Burakov, Anatoly
  20 siblings, 1 reply; 160+ messages in thread
From: Thomas Monjalon @ 2014-06-16  9:08 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

Hi Anatoly,

The signed-off-by line disappeared from v6 patches.
I assume to be
	Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Please confirm.

-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK
  2014-06-16  9:08           ` [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK Thomas Monjalon
@ 2014-06-16  9:28             ` Burakov, Anatoly
  2014-06-16 13:07               ` Thomas Monjalon
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-06-16  9:28 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

Hi Thomas,

> The signed-off-by line disappeared from v6 patches.
> I assume to be
> 	Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Please confirm.

Yes, sorry about that, was having a rather long day :-( Both VFIO and tailq patches assume signoff.

Best regards,
Anatoly Burakov
DPDK SW Engineer

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK
  2014-06-16  9:28             ` Burakov, Anatoly
@ 2014-06-16 13:07               ` Thomas Monjalon
  0 siblings, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2014-06-16 13:07 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

2014-06-16 09:28, Burakov, Anatoly:
> 2014-06-16 11:08, Thomas Monjalon:
> > The signed-off-by line disappeared from v6 patches.
> > I assume to be
> > 	Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > 	Please confirm.
> 
> Yes, sorry about that, was having a rather long day :-(
> Both VFIO and tailq patches assume signoff.

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
I've just reworded some titles and split or merged some patches.

Applied for version 1.7.0.

Thanks a lot
-- 
Thomas

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
  2014-05-02  8:58   ` Burakov, Anatoly
@ 2014-09-08  8:20     ` Sujith Sankar (ssujith)
  2014-09-08  8:21       ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Sujith Sankar (ssujith) @ 2014-09-08  8:20 UTC (permalink / raw)
  To: Burakov, Anatoly, Stephen Hemminger; +Cc: dev

Hi Anatoly,

Has anything happened on this front?  Do you expect DPDK to run in a guest
OS on KVM, with a physical NIC passed through to it, any time soon?

Thanks,
-Sujith 

On 02/05/14 2:28 pm, "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

>Hi Stephen,
>
>> Will this work in guest? or only on bare metal?
>
>VFIO is Linux-only, and in theory will be able to work in the guest, but
>not at the moment, since it requires an IOMMU. There was a GSoC proposal
>for KVM to implement an emulated IOMMU, and there were a few AMD
>IOMMU-emulation patches floating around the KVM lists for some time, but
>nothing has made it into a release yet.
>
>Best regards,
>Anatoly Burakov
>DPDK SW Engineer
>
>--------------------------------------------------------------
>Intel Shannon Limited
>Registered in Ireland
>Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
>Registered Number: 308263
>Business address: Dromore House, East Park, Shannon, Co. Clare
>
>
>

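Whether VFIO can work on a given machine, host or guest, depends on an IOMMU being available, as explained above. On Linux, an active IOMMU exposes its groups as numbered directories under /sys/kernel/iommu_groups, so a quick pre-check can look there before choosing between vfio-pci and igb_uio. A minimal sketch, assuming that sysfs layout; has_iommu_groups is a hypothetical helper, and the directory is a parameter only so it can be pointed at a test directory.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the kernel exposes at least one IOMMU
# group. With an IOMMU enabled, each group is a numbered directory under
# /sys/kernel/iommu_groups; VFIO assigns devices by group, so no groups
# generally means VFIO cannot be used on this machine.
has_iommu_groups()
{
	local dir="${1:-/sys/kernel/iommu_groups}"

	[ -d "$dir" ] || return 1
	# ls -A lists every entry except . and ..; empty output means no groups
	[ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage: pick a driver suggestion for this machine
if has_iommu_groups ; then
	echo "IOMMU groups found: vfio-pci may be usable"
else
	echo "no IOMMU groups: consider igb_uio instead"
fi
```

In a 2014-era KVM guest without IOMMU emulation, this check would be expected to fail, which matches the advice later in the thread to use igb_uio there.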
^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
  2014-09-08  8:20     ` Sujith Sankar (ssujith)
@ 2014-09-08  8:21       ` Burakov, Anatoly
  2014-09-08  8:27         ` Sujith Sankar (ssujith)
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-09-08  8:21 UTC (permalink / raw)
  To: Sujith Sankar (ssujith), Stephen Hemminger; +Cc: dev

Hi Sujith

Not that I know of, no. There are other ways to run physical NICs in a VM, though; you don't need VFIO for that.

Thanks,
Anatoly

-----Original Message-----
From: Sujith Sankar (ssujith) [mailto:ssujith@cisco.com] 
Sent: Monday, September 8, 2014 9:20 AM
To: Burakov, Anatoly; Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK

Hi Anatoly,

Has anything happened on this front?  Do you expect DPDK to run in a guest OS on KVM, with a physical NIC passed through to it, any time soon?

Thanks,
-Sujith 

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
  2014-09-08  8:21       ` Burakov, Anatoly
@ 2014-09-08  8:27         ` Sujith Sankar (ssujith)
  2014-09-08  8:30           ` Burakov, Anatoly
  0 siblings, 1 reply; 160+ messages in thread
From: Sujith Sankar (ssujith) @ 2014-09-08  8:27 UTC (permalink / raw)
  To: Burakov, Anatoly, Stephen Hemminger; +Cc: dev

Anatoly,
Thanks for the quick response!

I am able to do PCI passthrough and use the NIC in the guest OS.
What I'm trying to do is run DPDK in the guest and make use of the
passed-through NIC.  Without using VFIO, could I achieve this?

Thanks,
-Sujith

On 08/09/14 1:51 pm, "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

>Hi Sujith
>
>Not that I know of, no. There are other ways to run physical NICs in a VM,
>though; you don't need VFIO for that.
>
>Thanks,
>Anatoly

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
  2014-09-08  8:27         ` Sujith Sankar (ssujith)
@ 2014-09-08  8:30           ` Burakov, Anatoly
  2014-09-08  8:33             ` Sujith Sankar (ssujith)
  0 siblings, 1 reply; 160+ messages in thread
From: Burakov, Anatoly @ 2014-09-08  8:30 UTC (permalink / raw)
  To: Sujith Sankar (ssujith), Stephen Hemminger; +Cc: dev

Hi Sujith

Of course you can. Just use the igb_uio driver instead. Refer to the Getting Started Guide from Intel; it will walk you through the steps, which are basically the same as for VFIO.

Thanks,
Anatoly

-----Original Message-----
From: Sujith Sankar (ssujith) [mailto:ssujith@cisco.com] 
Sent: Monday, September 8, 2014 9:28 AM
To: Burakov, Anatoly; Stephen Hemminger
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK

Anatoly,
Thanks for the quick response!

I am able to do PCI passthrough and use the NIC in the guest OS.
What I'm trying to do is run DPDK in the guest and make use of the passed-through NIC.  Without using VFIO, could I achieve this?

Thanks,
-Sujith

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK
  2014-09-08  8:30           ` Burakov, Anatoly
@ 2014-09-08  8:33             ` Sujith Sankar (ssujith)
  0 siblings, 0 replies; 160+ messages in thread
From: Sujith Sankar (ssujith) @ 2014-09-08  8:33 UTC (permalink / raw)
  To: Burakov, Anatoly, Stephen Hemminger; +Cc: dev

Thank you, Anatoly!
I’ll do that and get back in case of questions.

Regards,
-Sujith

On 08/09/14 2:00 pm, "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

>Hi Sujith
>
>Of course you can. Just use the igb_uio driver instead. Refer to the
>Getting Started Guide from Intel; it will walk you through the steps,
>which are basically the same as for VFIO.
>
>Thanks,
>Anatoly

^ permalink raw reply	[flat|nested] 160+ messages in thread

Thread overview: 160+ messages
2014-05-01 11:05 [dpdk-dev] [PATCH 00/16] [RFC] [VFIO] Add VFIO support to DPDK Burakov, Anatoly
2014-05-01 16:12 ` Stephen Hemminger
2014-05-01 17:00   ` Chris Wright
2014-05-02  9:00     ` Burakov, Anatoly
2014-05-05 14:44       ` Vincent JARDIN
2014-05-06  8:41         ` Burakov, Anatoly
2014-05-02  8:58   ` Burakov, Anatoly
2014-09-08  8:20     ` Sujith Sankar (ssujith)
2014-09-08  8:21       ` Burakov, Anatoly
2014-09-08  8:27         ` Sujith Sankar (ssujith)
2014-09-08  8:30           ` Burakov, Anatoly
2014-09-08  8:33             ` Sujith Sankar (ssujith)
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 00/16] " Anatoly Burakov
2014-05-28 14:37   ` [dpdk-dev] [PATCH v3 00/20] " Anatoly Burakov
2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
2014-05-28 14:37     ` [dpdk-dev] [PATCH v3 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 09/20] vfio: add VFIO header Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 12/20] vfio: create mapping code for VFIO Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 13/20] vfio: add multiprocess support Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 14/20] pci: enable VFIO device binding Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
2014-05-28 14:38     ` [dpdk-dev] [PATCH v3 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
2014-06-03 10:17     ` [dpdk-dev] [PATCH v4 00/20] Add VFIO support to DPDK Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
2014-06-04  9:03         ` Burakov, Anatoly
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 09/20] vfio: add VFIO header Anatoly Burakov
2014-06-03 10:17       ` [dpdk-dev] [PATCH v4 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 12/20] vfio: create mapping code for VFIO Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 13/20] vfio: add multiprocess support Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 14/20] pci: enable VFIO device binding Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
2014-06-03 10:18       ` [dpdk-dev] [PATCH v4 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
2014-06-10 11:11       ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 09/20] vfio: add VFIO header Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 12/20] vfio: create mapping code for VFIO Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 13/20] vfio: add multiprocess support Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 14/20] pci: enable VFIO device binding Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
2014-06-10 11:11         ` [dpdk-dev] [PATCH v5 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
2014-06-13 14:38         ` [dpdk-dev] [PATCH v5 00/20] Add VFIO support to DPDK Burakov, Anatoly
2014-06-13 14:52         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 01/20] pci: move open() out of pci_map_resource, rename structs Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 02/20] pci: move uio mapping code to a separate file Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 03/20] pci: fixing errors in a previous commit found by checkpatch Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 04/20] pci: distinguish between legitimate failures and non-fatal errors Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 05/20] pci: Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 06/20] igb_uio: make igb_uio compilation optional Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 07/20] igb_uio: Moved interrupt type out of igb_uio Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 08/20] vfio: add support for VFIO in Linuxapp targets Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 09/20] vfio: add VFIO header Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 10/20] interrupts: Add support for VFIO interrupts Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 11/20] eal: remove -Wno-return-type for non-existent eal_hpet.c Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 12/20] vfio: create mapping code for VFIO Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 13/20] vfio: add multiprocess support Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 14/20] pci: enable VFIO device binding Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 15/20] eal: added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 16/20] eal: make --no-huge use mmap instead of malloc Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 17/20] test app: adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 18/20] igb_uio: Removed PCI ID table from igb_uio Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 19/20] binding script: Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
2014-06-13 14:52           ` [dpdk-dev] [PATCH v6 20/20] setup script: adding support for VFIO to setup.sh Anatoly Burakov
2014-06-16  9:08           ` [dpdk-dev] [PATCH v6 00/20] Add VFIO support to DPDK Thomas Monjalon
2014-06-16  9:28             ` Burakov, Anatoly
2014-06-16 13:07               ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 01/16] Separate igb_uio mapping into a separate file Anatoly Burakov
2014-05-21 12:42   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 02/16] Distinguish between legitimate failures and non-fatal errors Anatoly Burakov
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 03/16] Rename RTE_PCI_DRV_NEED_IGB_UIO to RTE_PCI_DRV_NEED_MAPPING Anatoly Burakov
2014-05-21 12:55   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 04/16] Make igb_uio compilation optional Anatoly Burakov
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 05/16] Moved interrupt type out of igb_uio Anatoly Burakov
2014-05-21 13:38   ` Thomas Monjalon
2014-05-21 13:44     ` Burakov, Anatoly
2014-05-21 13:46   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 06/16] Add support for VFIO in Linuxapp targets Anatoly Burakov
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 07/16] Add support for VFIO interrupts, add VFIO header Anatoly Burakov
2014-05-21 16:07   ` Thomas Monjalon
2014-05-22 12:45     ` Burakov, Anatoly
2014-05-22 12:49       ` Thomas Monjalon
2014-05-22 12:54         ` Burakov, Anatoly
2014-05-27 14:29           ` Burakov, Anatoly
2014-05-27 14:38             ` Thomas Monjalon
2014-05-27 14:40               ` Burakov, Anatoly
2014-05-27 14:46                 ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO Anatoly Burakov
2014-05-22 11:53   ` Thomas Monjalon
2014-05-22 12:06     ` Burakov, Anatoly
2014-05-22 12:28       ` Thomas Monjalon
2014-05-22 12:37         ` Burakov, Anatoly
2014-05-22 12:46           ` Thomas Monjalon
2014-05-22 12:54             ` Burakov, Anatoly
2014-05-27 16:21     ` Burakov, Anatoly
2014-05-27 16:36       ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 09/16] Enable VFIO device binding Anatoly Burakov
2014-05-22 12:03   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 10/16] Added support for selecting VFIO interrupt type from EAL command-line Anatoly Burakov
2014-05-20  7:40   ` Stephen Hemminger
2014-05-20  8:33     ` Burakov, Anatoly
2014-05-20 11:23       ` Stephen Hemminger
2014-05-20 11:26         ` Burakov, Anatoly
2014-05-20 21:39           ` Stephen Hemminger
2014-05-22 12:34   ` Thomas Monjalon
2014-05-28 10:35     ` Burakov, Anatoly
2014-05-28 11:24       ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 11/16] Make --no-huge use mmap instead of malloc Anatoly Burakov
2014-05-22 13:04   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 12/16] Adding unit tests for VFIO EAL command-line parameter Anatoly Burakov
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 13/16] Removed PCI ID table from igb_uio Anatoly Burakov
2014-05-22 13:13   ` Thomas Monjalon
2014-05-22 13:24     ` Burakov, Anatoly
2014-05-22 13:28       ` Thomas Monjalon
2014-05-22 23:11     ` Stephen Hemminger
2014-05-23  7:48       ` Thomas Monjalon
2014-05-23  0:10     ` Antti Kantee
2014-05-28 13:45       ` Thomas Monjalon
2014-05-28 14:50         ` Antti Kantee
2014-05-28 16:24         ` Stephen Hemminger
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 14/16] Renamed igb_uio_bind to dpdk_nic_bind Anatoly Burakov
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 15/16] Added support for VFIO drivers in dpdk_nic_bind.py Anatoly Burakov
2014-05-22 13:23   ` Thomas Monjalon
2014-05-19 15:51 ` [dpdk-dev] [PATCH v2 16/16] Adding support for VFIO to setup.sh Anatoly Burakov
2014-05-22 13:25   ` Thomas Monjalon
