From: "Xu, HuilongX"
To: "dev@dpdk.org", "Burakov, Anatoly"
Date: Tue, 27 May 2014 03:19:15 +0000
Subject: Re: [dpdk-dev] [PATCH v2 08/16] Add support for mapping devices through VFIO.
List-Id: patches and discussions about DPDK

VFIO is kernel 3.6+ only, and so is only compiled when DPDK config option
CONFIG_RTE_EAL_VFIO is enabled, and kernel 3.6 or higher is detected, thus
preventing compile failures on older kernels if VFIO is enabled in config
(and it is, by default).

Since VFIO cannot be used to map the same device twice, secondary processes
receive the device/group fd's by means of communicating over a local socket.
Only group and container fd's should be sent, as device fd's can be obtained
via ioctl() calls on the group fd.

For multiprocess, VFIO distinguishes between existing but unused groups (e.g.
groups that aren't bound to VFIO driver) and non-existing groups in order to
know if the secondary process requests a valid group, or if secondary process
requests something that doesn't exist.

Signed-off-by: Anatoly Burakov
Test-by: HuilongX Xu
Compile pass
>>Compile OS: FC20 x86_64
>>Kernel version: 3.13.6-200
>>GCC version: 4.8.2
>>Server: Crownpass
---
 lib/librte_eal/linuxapp/eal/Makefile               |    5 +-
 lib/librte_eal/linuxapp/eal/eal.c                  |    1 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         |  719 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c  |  367 ++++++++++
 .../linuxapp/eal/include/eal_internal_cfg.h        |    3 +
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |   55 ++
 lib/librte_eal/linuxapp/eal/include/eal_vfio.h     |    6 +
 7 files changed, 1155 insertions(+), 1 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 527fa2a..3a39cca 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -58,6 +58,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_socket.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_timer.c
@@ -88,12 +90,13 @@ CFLAGS_eal_common_log.o := -D_GNU_SOURCE
 CFLAGS_eal_hugepage_info.o := -D_GNU_SOURCE
 CFLAGS_eal_pci.o := -D_GNU_SOURCE
 CFLAGS_eal_common_whitelist.o := -D_GNU_SOURCE
+CFLAGS_eal_pci_vfio.o := -D_GNU_SOURCE
 
 # workaround for a gcc bug with noreturn attribute
 # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
 CFLAGS_eal_thread.o += -Wno-return-type
-CFLAGS_eal_hpet.o += -Wno-return-type
+CFLAGS_eal_pci_vfio_socket.o += -Wno-return-type
 endif
 
 INC := rte_per_lcore.h rte_lcore.h rte_interrupts.h rte_kni_common.h rte_dom0_common.h
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index de182e1..01bfd6c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -650,6 +650,7 @@ eal_parse_args(int argc, char **argv)
 	internal_config.force_sockets = 0;
 	internal_config.syslog_facility = LOG_DAEMON;
 	internal_config.xen_dom0_support = 0;
+	internal_config.vfio_intr_mode = RTE_INTR_MODE_MSIX;
 #ifdef RTE_LIBEAL_USE_HPET
 	internal_config.no_hpet = 0;
 #else
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
new file mode 100644
index 0000000..0a6f95c
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -0,0 +1,719 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+#include "eal_vfio.h"
+
+/**
+ * @file
+ * PCI probing under linux (VFIO version)
+ *
+ * This code tries to determine if the PCI device is bound to VFIO driver,
+ * and initialize it (map BARs, set up interrupts) if that's the case.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+/* get PCI BAR number where MSI-X interrupts are */
+static int
+pci_vfio_get_msix_bar(int fd, int * msix_bar)
+{
+	int ret;
+	uint32_t reg;
+	uint8_t cap_id, cap_offset;
+
+	/* read PCI capability pointer from config space */
+	ret = pread64(fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_CAPABILITY_LIST);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+				"config space!\n");
+		return -1;
+	}
+
+	/* we need first byte */
+	cap_offset = reg & 0xFF;
+
+	while (cap_offset){
+
+		/* read PCI capability ID */
+		ret = pread64(fd, &reg, sizeof(reg),
+				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+				cap_offset);
+		if (ret != sizeof(reg)) {
+			RTE_LOG(ERR, EAL, "Cannot read capability ID from PCI "
+					"config space!\n");
+			return -1;
+		}
+
+		/* we need first byte */
+		cap_id = reg & 0xFF;
+
+		/* if we haven't reached MSI-X, check next capability */
+		if (cap_id != PCI_CAP_ID_MSIX) {
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read capability pointer from PCI "
+						"config space!\n");
+				return -1;
+			}
+
+			/* we need second byte */
+			cap_offset = (reg & 0xFF00) >> 8;
+
+			continue;
+		}
+		/* else, read table offset */
+		else {
+			/* table offset resides in the next 4 bytes */
+			ret = pread64(fd, &reg, sizeof(reg),
+					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+					cap_offset + 4);
+			if (ret != sizeof(reg)) {
+				RTE_LOG(ERR, EAL, "Cannot read table offset from PCI config "
+						"space!\n");
+				return -1;
+			}
+
+			*msix_bar = reg & RTE_PCI_MSIX_TABLE_BIR;
+
+			return 0;
+		}
+	}
+	return 0;
+}
+
+/* set PCI bus mastering */
+static int
+pci_vfio_set_bus_master(int dev_fd)
+{
+	uint16_t reg;
+	int ret;
+
+	ret = pread64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
+		return -1;
+	}
+
+	/* set the master bit */
+	reg |= PCI_COMMAND_MASTER;
+
+	ret = pwrite64(dev_fd, &reg, sizeof(reg),
+			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
+			PCI_COMMAND);
+
+	if (ret != sizeof(reg)) {
+		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+/* set up DMA mappings */
+static int
+pci_vfio_setup_dma_maps(int vfio_container_fd)
+{
+	const struct rte_memseg * ms = rte_eal_get_physmem_layout();
+	int i, ret;
+
+	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+			VFIO_TYPE1_IOMMU);
+	if (ret) {
+		RTE_LOG(ERR, EAL, " cannot set IOMMU type!\n");
+		return -1;
+	}
+
+	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		struct vfio_iommu_type1_dma_map dma_map;
+
+		if (ms[i].addr == NULL)
+			break;
+
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = ms[i].addr_64;
+		dma_map.size = ms[i].len;
+		dma_map.iova = ms[i].phys_addr;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, " cannot set up DMA remapping!\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/* set up interrupt support (but not enable interrupts) */
+static int
+pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd,
+		int num_irqs)
+{
+	int i, ret, intr_idx;
+	enum rte_intr_handle_type handle_type;
+
+	/* get interrupt type from internal config (MSI-X by default, can be
+	 * overridden from the command line)
+	 */
+	switch (internal_config.vfio_intr_mode) {
+	case RTE_INTR_MODE_MSIX:
+		intr_idx = VFIO_PCI_MSIX_IRQ_INDEX;
+		handle_type = RTE_INTR_HANDLE_VFIO_MSIX;
+		break;
+	case RTE_INTR_MODE_LEGACY:
+		intr_idx = VFIO_PCI_INTX_IRQ_INDEX;
+		handle_type = RTE_INTR_HANDLE_VFIO_LEGACY;
+		break;
+	default:
+		RTE_LOG(ERR, EAL, " unknown default interrupt type!\n");
+		return -1;
+	}
+
+	for (i = 0; i < num_irqs; i++) {
+		struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+		int fd = -1;
+
+		/* skip interrupt modes we don't want */
+		if (i != intr_idx)
+			continue;
+
+		irq.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
+		if (ret < 0) {
+			RTE_LOG(ERR, EAL, " cannot get IRQ info!\n");
+			return -1;
+		}
+
+		/* fail if this vector cannot be used with eventfd */
+		if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
+			RTE_LOG(ERR, EAL, " interrupt vector does not support eventfd!\n");
+			return -1;
+		}
+
+		/* set up an eventfd for interrupts */
+		fd = eventfd(0, 0);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, " cannot set up eventfd!\n");
+			return -1;
+		}
+
+		dev->intr_handle.type = handle_type;
+		dev->intr_handle.fd = fd;
+		dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
+
+		return 0;
+	}
+
+	/* if we're here, we haven't found a suitable interrupt vector */
+	return -1;
+}
+
+/* open container fd or get an existing one */
+static int
+pci_vfio_get_container_fd(void)
+{
+	int ret, vfio_container_fd;
+
+	/* if we're in a primary process, try to open the container */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, " cannot open VFIO container!\n");
+			return -1;
+		}
+
+		/* check VFIO API version */
+		ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);
+		if (ret != VFIO_API_VERSION) {
+			RTE_LOG(ERR, EAL, " unknown VFIO API version!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		/* check if we support IOMMU type 1 */
+		ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU);
+		if (!ret) {
+			RTE_LOG(ERR, EAL, " unknown IOMMU driver!\n");
+			close(vfio_container_fd);
+			return -1;
+		}
+
+		return vfio_container_fd;
+	}
+	/* if we're in a secondary process, request container fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd;
+		if ((socket_fd = vfio_socket_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, " cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, SOCKET_REQ_CONTAINER) < 0) {
+			RTE_LOG(ERR, EAL, " cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		vfio_container_fd = vfio_socket_receive_fd(socket_fd);
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, " cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		close(socket_fd);
+		return vfio_container_fd;
+	}
+
+	return -1;
+}
+
+/* open group fd or get an existing one */
+static int
+pci_vfio_get_group_fd(int iommu_group_no)
+{
+	int i;
+	int vfio_group_fd;
+	char filename[PATH_MAX];
+
+	/* check if we already have the group descriptor open */
+	for (i = 0; i < vfio_cfg.vfio_group_idx; i++)
+		if (vfio_cfg.vfio_groups[i].group_no == iommu_group_no)
+			return vfio_cfg.vfio_groups[i].fd;
+
+	/* if primary, try to open the group */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		rte_snprintf(filename, sizeof(filename),
+				VFIO_GROUP_FMT, iommu_group_no);
+		vfio_group_fd = open(filename, O_RDWR);
+		if (vfio_group_fd < 0) {
+			/* if file not found, it's not an error */
+			if (errno != ENOENT) {
+				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
+						strerror(errno));
+				return -1;
+			}
+			return 0;
+		}
+
+		/* if the fd is valid, create a new group for it */
+		if (vfio_cfg.vfio_group_idx == VFIO_MAX_GROUPS) {
+			RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+			return -1;
+		}
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+		vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+		return vfio_group_fd;
+	}
+	/* if we're in a secondary process, request group fd from the primary
+	 * process via our socket
+	 */
+	else {
+		int socket_fd, ret;
+		if ((socket_fd = vfio_socket_connect_to_primary()) < 0) {
+			RTE_LOG(ERR, EAL, " cannot connect to primary process!\n");
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
+			RTE_LOG(ERR, EAL, " cannot request container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+		if (vfio_socket_send_request(socket_fd, iommu_group_no) < 0) {
+			RTE_LOG(ERR, EAL, " cannot send group number!\n");
+			close(socket_fd);
+			return -1;
+		}
+		ret = vfio_socket_receive_request(socket_fd);
+		switch(ret) {
+		case SOCKET_NO_FD:
+			close(socket_fd);
+			return 0;
+		case SOCKET_OK:
+			vfio_group_fd = vfio_socket_receive_fd(socket_fd);
+			/* if we got the fd, return it */
+			if (vfio_group_fd > 0) {
+				close(socket_fd);
+				return vfio_group_fd;
+			}
+			/* fall-through on error */
+		default:
+			RTE_LOG(ERR, EAL, " cannot get container fd!\n");
+			close(socket_fd);
+			return -1;
+		}
+	}
+	return -1;
+}
+
+/* parse IOMMU group number for a PCI device
+ * returns -1 for errors, 0 for non-existent group */
+static int
+pci_vfio_get_group_no(const char * pci_addr)
+{
+	char linkname[PATH_MAX];
+	char filename[PATH_MAX];
+	char * tok[16], *group_tok, *end;
+	int ret, iommu_group_no;
+
+	memset(linkname, 0, sizeof(linkname));
+	memset(filename, 0, sizeof(filename));
+
+	/* try to find out IOMMU group for this device */
+	rte_snprintf(linkname, sizeof(linkname),
+			SYSFS_PCI_DEVICES "/%s/iommu_group", pci_addr);
+
+	ret = readlink(linkname, filename, sizeof(filename));
+
+	/* if the link doesn't exist, no VFIO for us */
+	if (ret < 0)
+		return 0;
+
+	ret = rte_strsplit(filename, sizeof(filename),
+			tok, RTE_DIM(tok), '/');
+
+	if (ret <= 0) {
+		RTE_LOG(ERR, EAL, " %s cannot get IOMMU group\n", pci_addr);
+		return -1;
+	}
+
+	/* IOMMU group is always the last token */
+	errno = 0;
+	group_tok = tok[ret - 1];
+	end = group_tok;
+	iommu_group_no = strtol(group_tok, &end, 10);
+	if ((end != group_tok && *end != '\0') || errno != 0) {
+		RTE_LOG(ERR, EAL, " %s error parsing IOMMU number!\n", pci_addr);
+		return -1;
+	}
+
+	return iommu_group_no;
+}
+
+static void
+clear_current_group(void)
+{
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = 0;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = -1;
+}
+
+
+/*
+ * map the PCI resources of a PCI device in virtual memory (VFIO version).
+ * primary and secondary processes follow almost exactly the same path
+ */
+int
+pci_vfio_map_resource(struct rte_pci_device *dev)
+{
+	struct vfio_group_status group_status =
+			{ .argsz = sizeof(group_status) };
+	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	int vfio_group_fd, vfio_dev_fd;
+	int iommu_group_no;
+	char pci_addr[PATH_MAX] = {0};
+	struct rte_pci_addr *loc = &dev->addr;
+	int i, ret, msix_bar;
+	struct mapped_pci_resource *vfio_res = NULL;
+	struct pci_map *maps;
+
+	dev->intr_handle.fd = -1;
+	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
+
+	/* store PCI address string */
+	rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT,
+			loc->domain, loc->bus, loc->devid, loc->function);
+
+	/* get container fd (needs to be done only once per initialization) */
+	if (vfio_cfg.vfio_container_fd == -1) {
+		int vfio_container_fd = pci_vfio_get_container_fd();
+		if (vfio_container_fd < 0) {
+			RTE_LOG(ERR, EAL, " %s cannot open VFIO container!\n", pci_addr);
+			return -1;
+		}
+
+		vfio_cfg.vfio_container_fd = vfio_container_fd;
+	}
+
+	/* get group number */
+	iommu_group_no = pci_vfio_get_group_no(pci_addr);
+
+	/* if 0, group doesn't exist */
+	if (iommu_group_no == 0) {
+		RTE_LOG(WARNING, EAL, " %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+	/* if negative, something failed */
+	else if (iommu_group_no < 0)
+		return -1;
+
+	/* get the actual group fd */
+	vfio_group_fd = pci_vfio_get_group_fd(iommu_group_no);
+	if (vfio_group_fd < 0) {
+		return -1;
+	}
+
+	/* store group fd */
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].group_no = iommu_group_no;
+	vfio_cfg.vfio_groups[vfio_cfg.vfio_group_idx].fd = vfio_group_fd;
+
+	/* if group_fd == 0, that means the device isn't managed by VFIO */
+	if (vfio_group_fd == 0) {
+		RTE_LOG(WARNING, EAL, " %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		/* we store 0 as group fd to distinguish between existing but
+		 * unbound VFIO groups, and groups that don't exist at all.
+		 */
+		vfio_cfg.vfio_group_idx++;
+		return 1;
+	}
+
+	/*
+	 * at this point, we know at least one port on this device is bound to VFIO,
+	 * so we can proceed to try and set this particular port up
+	 */
+
+	/* check if the group is viable */
+	ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status);
+	if (ret) {
+		RTE_LOG(ERR, EAL, " %s cannot get group status!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+	else if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		RTE_LOG(ERR, EAL, " %s VFIO group is not viable!\n", pci_addr);
+		close(vfio_group_fd);
+		clear_current_group();
+		return -1;
+	}
+
+	/*
+	 * at this point, we know that this group is viable (meaning, all devices
+	 * are either bound to VFIO or not bound to anything)
+	 */
+
+	/* check if group does not have a container yet */
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_CONTAINER_SET)) {
+
+		/* add group to a container */
+		ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
+				&vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, " %s cannot add VFIO group to container!\n",
+					pci_addr);
+			close(vfio_group_fd);
+			clear_current_group();
+			return -1;
+		}
+		/*
+		 * at this point we know that this group has been successfully
+		 * initialized, so we increment vfio_group_idx to indicate that we can
+		 * add new groups.
+		 */
+		vfio_cfg.vfio_group_idx++;
+	}
+
+	/*
+	 * set up DMA mappings for container (needs to be done only once, only when
+	 * at least one group is assigned to a container and only in primary process)
+	 */
+	if (internal_config.process_type == RTE_PROC_PRIMARY &&
+			vfio_cfg.vfio_container_has_dma == 0) {
+		ret = pci_vfio_setup_dma_maps(vfio_cfg.vfio_container_fd);
+		if (ret) {
+			RTE_LOG(ERR, EAL, " %s DMA remapping failed!\n", pci_addr);
+			return -1;
+		}
+		vfio_cfg.vfio_container_has_dma = 1;
+	}
+
+	/* get a file descriptor for the device */
+	vfio_dev_fd = ioctl(vfio_group_fd, VFIO_GROUP_GET_DEVICE_FD, pci_addr);
+	if (vfio_dev_fd < 0) {
+		/* if we cannot get a device fd, this simply means that this
+		 * particular port is not bound to VFIO
+		 */
+		RTE_LOG(WARNING, EAL, " %s not managed by VFIO driver, skipping\n",
+				pci_addr);
+		return 1;
+	}
+
+	/* test and setup the device */
+	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_INFO, &device_info);
+	if (ret) {
+		RTE_LOG(ERR, EAL, " %s cannot get device info!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* get MSI-X BAR, if any (we have to know where it is because we can't
+	 * mmap it when using VFIO) */
+	msix_bar = -1;
+	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &msix_bar);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, " %s cannot get MSI-X BAR number!\n", pci_addr);
+		close(vfio_dev_fd);
+		return -1;
+	}
+
+	/* if we're in a primary process, allocate vfio_res and get region info */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if ((vfio_res = rte_zmalloc("VFIO_RES", sizeof (*vfio_res), 0)) == NULL) {
+			RTE_LOG(ERR, EAL,
+				"%s(): cannot store uio mmap details\n", __func__);
+			close(vfio_dev_fd);
+			return -1;
+		}
+		memcpy(&vfio_res->pci_addr, &dev->addr, sizeof(vfio_res->pci_addr));
+
+		/* get number of registers (up to BAR5) */
+		vfio_res->nb_maps = RTE_MIN((int) device_info.num_regions,
+				VFIO_PCI_BAR5_REGION_INDEX + 1);
+	}
+	/* if we're in a secondary process, just find our tailq entry and use that */
+	else {
+		TAILQ_FOREACH(vfio_res, pci_res_list, next) {
+			if (memcmp(&vfio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
+				continue;
+			break;
+		}
+		/* if we haven't found our tailq entry, something's wrong */
+		if (vfio_res == NULL) {
+			RTE_LOG(ERR, EAL, " %s cannot find TAILQ entry for PCI device!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			return -1;
+		}
+	}
+
+	/* map BARs */
+	maps = vfio_res->maps;
+
+	for (i = 0; i < (int) vfio_res->nb_maps; i++) {
+		struct vfio_region_info reg = { .argsz = sizeof(reg) };
+		void * bar_addr;
+
+		reg.index = i;
+
+		ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg);
+
+		if (ret) {
+			RTE_LOG(ERR, EAL, " %s cannot get device region info!\n",
+					pci_addr);
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		/* skip non-mmapable BARs */
+		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
+			continue;
+
+		/* skip MSI-X BAR */
+		if (i == msix_bar)
+			continue;
+
+		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+				reg.size);
+
+		if (bar_addr == NULL) {
+			RTE_LOG(ERR, EAL, " %s mapping BAR%i failed: %s\n", pci_addr, i,
+					strerror(errno));
+			close(vfio_dev_fd);
+			if (internal_config.process_type == RTE_PROC_PRIMARY)
+				rte_free(vfio_res);
+			return -1;
+		}
+
+		maps[i].addr = bar_addr;
+		maps[i].offset = reg.offset;
+		maps[i].size = reg.size;
+		dev->mem_resource[i].addr = bar_addr;
+	}
+
+	/* if secondary process, do not set up interrupts */
+	if (internal_config.process_type == RTE_PROC_PRIMARY) {
+		if (pci_vfio_setup_interrupts(dev, vfio_dev_fd,
+				(int) device_info.num_irqs) != 0) {
+			RTE_LOG(ERR, EAL, " %s error setting up interrupts!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* set bus mastering for the device */
+		if (pci_vfio_set_bus_master(vfio_dev_fd)) {
+			RTE_LOG(ERR, EAL, " %s cannot set up bus mastering!\n", pci_addr);
+			close(vfio_dev_fd);
+			rte_free(vfio_res);
+			return -1;
+		}
+
+		/* Reset the device */
+		ioctl(vfio_dev_fd, VFIO_DEVICE_RESET);
+	}
+
+	if (internal_config.process_type == RTE_PROC_PRIMARY)
+		TAILQ_INSERT_TAIL(pci_res_list, vfio_res, next);
+
+	return (0);
+}
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
new file mode 100644
index 0000000..1605fce
--- /dev/null
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio_socket.c
@@ -0,0 +1,367 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include
+#include
+#include
+
+/* sys/un.h with __USE_MISC uses strlen, which is unsafe and should not be used. */
+#ifdef __USE_MISC
+#define REMOVED_USE_MISC
+#undef __USE_MISC
+#endif
+#include
+/* make sure we redefine __USE_MISC only if it was previously undefined */
+#ifdef REMOVED_USE_MISC
+#define __USE_MISC
+#undef REMOVED_USE_MISC
+#endif
+
+#include
+#include
+#include
+#include
+#include
+
+#include "eal_filesystem.h"
+#include "eal_pci_init.h"
+
+/**
+ * @file
+ * VFIO socket for communication between primary and secondary processes.
+ *
+ * This file is only compiled if CONFIG_RTE_EAL_VFIO is set to "y".
+ */
+
+#ifdef VFIO_PRESENT
+#define SOCKET_PATH_FMT "%s/.%s_mp_socket"
+#define CMSGLEN (CMSG_LEN(sizeof(int)))
+#define FD_TO_CMSGHDR(fd,chdr) \
+		do {\
+			(chdr).cmsg_len = CMSGLEN;\
+			(chdr).cmsg_level = SOL_SOCKET;\
+			(chdr).cmsg_type = SCM_RIGHTS;\
+			memcpy((chdr).__cmsg_data, &(fd), sizeof(fd));\
+		} while(0)
+#define CMSGHDR_TO_FD(chdr,fd) \
+		do {\
+			memcpy(&(fd), (chdr).__cmsg_data, sizeof(fd));\
+		} while (0)
+
+
+/* get socket path (/var/run if root, $HOME otherwise) */
+static void
+get_socket_path(char * buffer, int bufsz)
+{
+	const char *dir = "/var/run";
+	const char *home_dir = getenv("HOME");
+
+	if (getuid() != 0 && home_dir != NULL)
+		dir = home_dir;
+
+	/* use current prefix as file path */
+	rte_snprintf(buffer, bufsz, SOCKET_PATH_FMT, dir,
+			internal_config.hugefile_prefix);
+}
+
+
+
+/*
+ * data flow for socket comm protocol:
+ * 1. client sends SOCKET_REQ_CONTAINER or SOCKET_REQ_GROUP
+ * 1a. in case of SOCKET_REQ_GROUP, client also then sends group number
+ * 2. server receives message
+ * 2a. in case of invalid group, SOCKET_ERR is sent back to client
+ * 2b. in case of unbound group, SOCKET_NO_FD is sent back to client
+ * 2c. in case of valid group, SOCKET_OK is sent and immediately followed by fd
+ *
+ * in case of any error, socket is closed.
+ */
+
+/* send a request, return -1 on error */
+int
+vfio_socket_send_request(int socket, int req)
+{
+	struct msghdr hdr;
+	struct iovec iov;
+	int buf;
+	int ret;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = req;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive a request and return it */
+int
+vfio_socket_receive_request(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct iovec iov;
+	int ret, req;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	buf = SOCKET_ERR;
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	return req;
+}
+
+/* send OK in message, fd in control message */
+int
+vfio_socket_send_fd(int socket, int fd)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr * chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	buf = SOCKET_OK;
+	FD_TO_CMSGHDR(fd, *chdr);
+
+	ret = sendmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+	return 0;
+}
+
+/* receive OK in message, fd in control message */
+int
+vfio_socket_receive_fd(int socket)
+{
+	int buf;
+	struct msghdr hdr;
+	struct cmsghdr * chdr;
+	char chdr_buf[CMSGLEN];
+	struct iovec iov;
+	int ret, req, fd;
+
+	buf = SOCKET_ERR;
+
+	chdr = (struct cmsghdr *) chdr_buf;
+	memset(chdr, 0, sizeof(chdr_buf));
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.msg_iov = &iov;
+	hdr.msg_iovlen = 1;
+	iov.iov_base = (char*) &buf;
+	iov.iov_len = sizeof(buf);
+	hdr.msg_control = chdr;
+	hdr.msg_controllen = CMSGLEN;
+
+	ret = recvmsg(socket, &hdr, 0);
+	if (ret < 0)
+		return -1;
+
+	req = buf;
+
+	if (req != SOCKET_OK)
+		return -1;
+
+	CMSGHDR_TO_FD(*chdr, fd);
+
+	return fd;
+}
+
+/* connect socket_fd in secondary process to the primary process's socket */
+int
+vfio_socket_connect_to_primary(void)
+{
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+	int socket_fd;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	if (connect(socket_fd, (struct sockaddr*) &addr, sockaddr_len) == 0)
+		return socket_fd;
+
+	/* if connect failed */
+	close(socket_fd);
+	return -1;
+}
+
+
+
+/*
+ * socket listening thread for primary process
+ */
+__attribute__((noreturn)) void *
+pci_vfio_socket_thread(void *arg)
+{
+	int ret, i, vfio_group_no;
+	int socket_fd = *(int*) arg;
+
+	/* wait for requests on the socket */
+	for (;;) {
+		int conn_sock;
+		struct sockaddr_un addr;
+		socklen_t sockaddr_len = sizeof(addr);
+
+		/* this is a blocking call */
+		conn_sock = accept(socket_fd, (struct sockaddr*) &addr, &sockaddr_len);
+
+		/* just restart on error */
+		if (conn_sock == -1)
+			continue;
+
+		/* set socket to linger after close */
+		struct linger l;
+		l.l_onoff = 1;
+		l.l_linger = 60;
+		setsockopt(conn_sock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
+
+		ret = vfio_socket_receive_request(conn_sock);
+
+		switch (ret) {
+		case SOCKET_REQ_CONTAINER:
+			vfio_socket_send_fd(conn_sock, vfio_cfg.vfio_container_fd);
+			break;
+		case SOCKET_REQ_GROUP:
+			/* wait for group number */
+			vfio_group_no = vfio_socket_receive_request(conn_sock);
+			if (vfio_group_no < 0) {
+				close(conn_sock);
+				continue;
+			}
+			for (i = 0; i < vfio_cfg.vfio_group_idx; i++) {
+				if (vfio_cfg.vfio_groups[i].group_no == vfio_group_no)
+					break;
+			}
+			/* if we reached end of the list, the group doesn't exist */
+			if (i == vfio_cfg.vfio_group_idx)
+				vfio_socket_send_request(conn_sock, SOCKET_ERR);
+			/* if VFIO group exists but isn't bound to VFIO driver */
+			else if (vfio_cfg.vfio_groups[i].fd == 0)
+				vfio_socket_send_request(conn_sock, SOCKET_NO_FD);
+			/* if group exists and is bound to VFIO driver */
+			else {
+				vfio_socket_send_request(conn_sock, SOCKET_OK);
+				vfio_socket_send_fd(conn_sock, vfio_cfg.vfio_groups[i].fd);
+			}
+			break;
+		default:
+			vfio_socket_send_request(conn_sock, SOCKET_ERR);
+			break;
+		}
+		close(conn_sock);
+	}
+}
+
+/*
+ * set up a local socket and tell it to listen for incoming connections
+ */
+int
+pci_vfio_socket_setup(void)
+{
+	int ret, socket_fd;
+	struct sockaddr_un addr;
+	socklen_t sockaddr_len;
+
+	/* set up a socket */
+	socket_fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+	if (socket_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to create socket!\n");
+		return -1;
+	}
+
+	get_socket_path(addr.sun_path, sizeof(addr.sun_path));
+	addr.sun_family = AF_UNIX;
+
+	sockaddr_len = sizeof(struct sockaddr_un);
+
+	unlink(addr.sun_path);
+
+	ret = bind(socket_fd, (struct sockaddr*) &addr, sockaddr_len);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to bind socket: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	ret = listen(socket_fd, 50);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "Failed to listen: %s!\n", strerror(errno));
+		close(socket_fd);
+		return -1;
+	}
+
+	return socket_fd;
+}
+
+#endif
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
index 92e3065..5468b0a 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_internal_cfg.h
@@ -40,6 +40,7 @@
 #define _EAL_LINUXAPP_INTERNAL_CFG
 
 #include
+#include
 
 #define MAX_HUGEPAGE_SIZES 3  /**< support up to 3 page sizes */
 
@@ -76,6
+77,8 @@ struct internal_config { volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amoun= t of memory per socket */ uintptr_t base_virtaddr; /**< base address to try a= nd reserve memory from */ volatile int syslog_facility; /**< facility pa= ssed to openlog() */ + /** default interrupt mode for VFIO */ + volatile enum rte_intr_mode vfio_intr_mode; const char *hugefile_prefix; /**< the base filename of = hugetlbfs files */ const char *hugepage_dir; /**< specific hugetlbfs di= rectory to use */ diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librt= e_eal/linuxapp/eal/include/eal_pci_init.h index 699e80d..b163ab5 100644 --- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h +++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h @@ -34,6 +34,8 @@ #ifndef EAL_PCI_INIT_H_ #define EAL_PCI_INIT_H_ +#include "eal_vfio.h" + struct pci_map { void *addr; uint64_t offset; @@ -62,4 +64,57 @@ void * pci_map_resource(void *requested_addr, int fd, of= f_t offset, size_t size) /* map IGB_UIO resource prototype */ int pci_uio_map_resource(struct rte_pci_device *dev); +#ifdef VFIO_PRESENT + +#define VFIO_MAX_GROUPS 64 +#define VFIO_DIR "/dev/vfio" +#define VFIO_CONTAINER_PATH "/dev/vfio/vfio" +#define VFIO_GROUP_FMT "/dev/vfio/%u" +#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL) + +/* map VFIO resource prototype */ +int pci_vfio_map_resource(struct rte_pci_device *dev); + +/* + * Function prototypes for VFIO socket functions + */ +int vfio_socket_send_request(int socket, int req); +int vfio_socket_receive_request(int socket); +int vfio_socket_send_fd(int socket, int fd); +int vfio_socket_receive_fd(int socket); +int vfio_socket_connect_to_primary(void); +int pci_vfio_socket_setup(void); +void * pci_vfio_socket_thread(void *arg); + +/* socket comm protocol definitions */ +#define SOCKET_REQ_CONTAINER 0x100 +#define SOCKET_REQ_GROUP 0x200 +#define SOCKET_OK 0x0 +#define SOCKET_NO_FD 0x1 +#define SOCKET_ERR 0xFF + +/* + * we don't need to store 
device fd's anywhere since they can be obtained = from + * the group fd via an ioctl() call. + */ +struct vfio_group { + int group_no; + int fd; +}; + +struct vfio_config { + int vfio_enabled; + int vfio_container_fd; + int vfio_container_has_dma; + int vfio_group_idx; + struct vfio_group vfio_groups[VFIO_MAX_GROUPS]; +}; + +/* per-process VFIO config */ +struct vfio_config vfio_cfg; + +pthread_t socket_thread; + +#endif + #endif /* EAL_PCI_INIT_H_ */ diff --git a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h b/lib/librte_ea= l/linuxapp/eal/include/eal_vfio.h index ca4982b..32953c0 100644 --- a/lib/librte_eal/linuxapp/eal/include/eal_vfio.h +++ b/lib/librte_eal/linuxapp/eal/include/eal_vfio.h @@ -42,6 +42,12 @@ #if LINUX_VERSION_CODE >=3D KERNEL_VERSION(3,6,0) #include +#if LINUX_VERSION_CODE < KERNEL_VERSION(3,10,0) +#define RTE_PCI_MSIX_TABLE_BIR 0x7 +#else +#define RTE_PCI_MSIX_TABLE_BIR PCI_MSIX_TABLE_BIR +#endif + #define VFIO_PRESENT #endif /* kernel version */ #endif /* RTE_EAL_VFIO */ -- 1.7.0.7
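
For anyone following the fd-passing part of the patch, here is a minimal standalone sketch of the same technique (one int status in the message payload, one fd in an SCM_RIGHTS control message). It is not code from the patch: the names sketch_send_fd, sketch_recv_fd and sketch_roundtrip are hypothetical, and it assumes Linux with AF_UNIX/SOCK_SEQPACKET. Unlike the patch it goes through the standard CMSG_FIRSTHDR/CMSG_DATA/CMSG_SPACE macros instead of touching __cmsg_data directly, which is one possible way to write it, not necessarily what the series should do.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* control buffer sized with CMSG_SPACE; the union keeps it cmsghdr-aligned */
union sketch_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* hypothetical helper: send an int status plus one fd, -1 on error */
static int
sketch_send_fd(int sock, int status, int fd)
{
	struct msghdr hdr;
	struct iovec iov;
	struct cmsghdr *chdr;
	union sketch_ctl ctl;

	memset(&hdr, 0, sizeof(hdr));
	memset(&ctl, 0, sizeof(ctl));

	iov.iov_base = &status;
	iov.iov_len = sizeof(status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctl.buf;
	hdr.msg_controllen = sizeof(ctl.buf);

	chdr = CMSG_FIRSTHDR(&hdr);
	chdr->cmsg_len = CMSG_LEN(sizeof(int));
	chdr->cmsg_level = SOL_SOCKET;
	chdr->cmsg_type = SCM_RIGHTS;
	memcpy(CMSG_DATA(chdr), &fd, sizeof(fd));

	return sendmsg(sock, &hdr, 0) < 0 ? -1 : 0;
}

/* hypothetical helper: receive status and fd, return the fd or -1 */
static int
sketch_recv_fd(int sock, int *status)
{
	struct msghdr hdr;
	struct iovec iov;
	struct cmsghdr *chdr;
	union sketch_ctl ctl;
	int fd;

	memset(&hdr, 0, sizeof(hdr));
	memset(&ctl, 0, sizeof(ctl));

	iov.iov_base = status;
	iov.iov_len = sizeof(*status);
	hdr.msg_iov = &iov;
	hdr.msg_iovlen = 1;
	hdr.msg_control = ctl.buf;
	hdr.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &hdr, 0) <= 0)
		return -1;

	/* only trust the fd if a well-formed SCM_RIGHTS header arrived */
	chdr = CMSG_FIRSTHDR(&hdr);
	if (chdr == NULL || chdr->cmsg_level != SOL_SOCKET ||
			chdr->cmsg_type != SCM_RIGHTS ||
			chdr->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(chdr), sizeof(fd));
	return fd;
}

/* self-check: pass the read end of a pipe over a socketpair, 0 on success */
static int
sketch_roundtrip(void)
{
	int sv[2], pfd[2], status = -1, fd;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0 || pipe(pfd) < 0)
		return -1;
	if (sketch_send_fd(sv[0], 0, pfd[0]) < 0)
		return -1;
	fd = sketch_recv_fd(sv[1], &status);
	if (fd < 0 || status != 0)
		return -1;
	/* data written to the pipe must be readable through the received fd */
	if (write(pfd[1], "x", 1) != 1 || read(fd, &c, 1) != 1)
		return -1;
	close(fd); close(pfd[0]); close(pfd[1]); close(sv[0]); close(sv[1]);
	return c == 'x' ? 0 : -1;
}
```

Calling sketch_roundtrip() from a test program exercises both directions in one process; in the multiprocess case the two ends of the socketpair would instead be the primary's accepted connection and the secondary's connected socket.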