DPDK patches and discussions
* [RFC 0/4] Support VFIO sparse mmap in PCI bus
@ 2023-04-18  5:30 Chenbo Xia
  2023-04-18  5:30 ` [RFC 1/4] bus/pci: introduce an internal representation of PCI device Chenbo Xia
                   ` (6 more replies)
  0 siblings, 7 replies; 50+ messages in thread
From: Chenbo Xia @ 2023-04-18  5:30 UTC (permalink / raw)
  To: dev; +Cc: skori

This series introduces a standard VFIO capability, sparse mmap,
to the PCI bus. In the Linux kernel it is defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. With sparse mmap, instead of
mmap-ing the whole BAR region into the DPDK process, only parts
of the BAR region are mmap-ed, based on the sparse mmap
information obtained from the kernel. The parts of the BAR region
that are not mmap-ed can be accessed by the DPDK process with the
pread/pwrite system calls. Sparse mmap is useful when the kernel
does not want userspace to mmap the whole BAR region, or wants to
control access to specific parts of the BAR. Vendors can choose
whether to enable this feature for their devices in their
specific kernel modules.

In this patchset:

Patches 1-3 mainly introduce BAR access APIs so that a driver can
access a specific BAR with the pread/pwrite system calls when part
of the BAR is not mmap-able.

Patch 4 finally adds the VFIO sparse mmap support. An open question
is whether all sparse mmap regions should be mapped into a contiguous
virtual address range that follows the device-specific BAR layout.
In theory, there are three options to support this feature.

Option 1: Map sparse mmap regions independently
===============================================
In this approach, each sparse mmap region is mmap-ed independently
and can end up anywhere in the process address space. Accessing the
mmap-ed BAR is then no longer as simple as 'bar_base_address +
bar_offset': the driver has to consult the sparse mmap information
to reach a specific BAR register.

Patch 4 in this patchset adopts this option. The driver API change
is introduced in bus_pci_driver.h, and the corresponding changes are
made in all drivers. For now I assume drivers do not support this
feature, so they do not check the 'is_sparse' flag and treat it as
false. Note that this does not break any driver; each vendor can add
the related logic when it starts to support this feature. I chose
this approach only to avoid introducing complexity into drivers that
do not want to support the feature.
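
For illustration only, here is a minimal sketch of how a driver could
resolve a BAR offset under this option, using the 'pci_mem'/'is_sparse'
layout proposed in patch 4 (the helper name and the fallback comment
are my own, not part of the series):

/* Hypothetical helper: translate a BAR offset into a virtual address
 * when the BAR may be sparsely mapped.
 */
static void *
bar_offset_to_addr(const struct rte_pci_device *pdev, int bar, uint64_t off)
{
	const struct rte_pci_mem_resource *res = &pdev->pci_mem[bar];
	uint32_t i;

	if (!res->is_sparse)
		/* Whole BAR mapped: legacy 'base + offset' path. */
		return (uint8_t *)res->mem_res.addr + off;

	for (i = 0; i < res->sparse_mem.nr_maps; i++) {
		const struct rte_mem_map_area *area = &res->sparse_mem.areas[i];

		if (off >= area->offset && off < area->offset + area->size)
			return (uint8_t *)area->addr + (off - area->offset);
	}

	/* Not mmap-ed: fall back to rte_pci_mmio_read()/write(). */
	return NULL;
}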

Option 2: Map sparse mmap regions based on device-specific BAR layout
======================================================================
In this approach, the sparse mmap regions are mapped into a contiguous
virtual address range that follows the device-specific BAR layout.
For example, if the BAR size is 0x4000 and only 0x0-0x1000 (sparse
mmap region #1) and 0x3000-0x4000 (sparse mmap region #2) can be
mmap-ed, region #1 is mapped at 'base_addr' and region #2 at
'base_addr + 0x3000'. The advantage is that the driver can still
access all BAR registers as 'bar_base_address + bar_offset' and no
driver API change is needed. However, the address range from
'base_addr + 0x1000' to 'base_addr + 0x3000' may need to be reserved,
which can waste address space or memory (when the MAP_ANONYMOUS and
MAP_PRIVATE flags are used to reserve the range). The driver also
needs to know which parts of the BAR are mmap-ed (which is possible,
since the ranges are defined by the vendor's kernel module).
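
As a rough sketch (not what patch 4 implements), the hole between the
two regions could be reserved like this; PROT_NONE for the placeholder
and the 'vfio_dev_fd'/'reg_offset' names are my assumptions:

/* Reserve a BAR-sized range, then overlay the two sparse areas.
 * Error handling omitted.
 */
void *base = mmap(NULL, 0x4000, PROT_NONE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
/* Region #1: offsets 0x0-0x1000 of the BAR. */
mmap(base, 0x1000, PROT_READ | PROT_WRITE,
     MAP_SHARED | MAP_FIXED, vfio_dev_fd, reg_offset + 0x0);
/* Region #2: offsets 0x3000-0x4000 of the BAR. */
mmap((uint8_t *)base + 0x3000, 0x1000, PROT_READ | PROT_WRITE,
     MAP_SHARED | MAP_FIXED, vfio_dev_fd, reg_offset + 0x3000);
/* 0x1000-0x3000 stays an anonymous placeholder mapping. */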

Option 3: Support both options 1 & 2
====================================
We could define a driver flag to let each driver choose the way it
prefers, since both options have their own pros and cons.

Please share your comments. Thanks!


Chenbo Xia (4):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write
  bus/pci: add VFIO sparse mmap support

 drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
 drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
 .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
 drivers/bus/pci/bsd/pci.c                     |  43 +-
 drivers/bus/pci/bus_pci_driver.h              |  24 +-
 drivers/bus/pci/linux/pci.c                   |  91 +++-
 drivers/bus/pci/linux/pci_init.h              |  14 +-
 drivers/bus/pci/linux/pci_uio.c               |  34 +-
 drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
 drivers/bus/pci/pci_common.c                  |  57 ++-
 drivers/bus/pci/pci_common_uio.c              |  12 +-
 drivers/bus/pci/private.h                     |  25 +-
 drivers/bus/pci/rte_bus_pci.h                 |  48 ++
 drivers/bus/pci/version.map                   |   3 +
 drivers/common/cnxk/roc_dev.c                 |   4 +-
 drivers/common/cnxk/roc_dpi.c                 |   2 +-
 drivers/common/cnxk/roc_ml.c                  |  22 +-
 drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
 drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
 drivers/common/sfc_efx/sfc_efx.c              |   2 +-
 drivers/compress/octeontx/otx_zip.c           |   4 +-
 drivers/crypto/ccp/ccp_dev.c                  |   4 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
 drivers/crypto/nitrox/nitrox_device.c         |   4 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
 drivers/crypto/virtio/virtio_pci.c            |   6 +-
 drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
 drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
 drivers/dma/idxd/idxd_pci.c                   |   4 +-
 drivers/dma/ioat/ioat_dmadev.c                |   2 +-
 drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
 drivers/event/octeontx/ssovf_probe.c          |  38 +-
 drivers/event/octeontx/timvf_probe.c          |  18 +-
 drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   2 +-
 drivers/net/avp/avp_ethdev.c                  |  20 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
 drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
 drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
 drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
 drivers/net/cxgbe/cxgbe_main.c                |   2 +-
 drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
 drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
 drivers/net/e1000/em_ethdev.c                 |   4 +-
 drivers/net/e1000/igb_ethdev.c                |   4 +-
 drivers/net/ena/ena_ethdev.c                  |   4 +-
 drivers/net/enetc/enetc_ethdev.c              |   2 +-
 drivers/net/enic/enic_main.c                  |   4 +-
 drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
 drivers/net/gve/gve_ethdev.c                  |   4 +-
 drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
 drivers/net/hns3/hns3_ethdev.c                |   2 +-
 drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
 drivers/net/hns3/hns3_rxtx.c                  |   4 +-
 drivers/net/i40e/i40e_ethdev.c                |   2 +-
 drivers/net/iavf/iavf_ethdev.c                |   2 +-
 drivers/net/ice/ice_dcf.c                     |   2 +-
 drivers/net/ice/ice_ethdev.c                  |   2 +-
 drivers/net/idpf/idpf_ethdev.c                |   4 +-
 drivers/net/igc/igc_ethdev.c                  |   2 +-
 drivers/net/ionic/ionic_dev_pci.c             |   2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
 drivers/net/liquidio/lio_ethdev.c             |   4 +-
 drivers/net/nfp/nfp_ethdev.c                  |   2 +-
 drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
 drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
 drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
 drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
 drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
 drivers/net/qede/qede_main.c                  |   6 +-
 drivers/net/sfc/sfc.c                         |   2 +-
 drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
 drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
 drivers/net/virtio/virtio_pci.c               |   6 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
 drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
 drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
 drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
 drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
 lib/eal/include/rte_vfio.h                    |   1 -
 90 files changed, 853 insertions(+), 352 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC 1/4] bus/pci: introduce an internal representation of PCI device
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
@ 2023-04-18  5:30 ` Chenbo Xia
  2023-04-18  5:30 ` [RFC 2/4] bus/pci: avoid depending on private value in kernel source Chenbo Xia
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Chenbo Xia @ 2023-04-18  5:30 UTC (permalink / raw)
  To: dev; +Cc: skori

This patch introduces an internal representation of the PCI device,
which will be used to store internal information that does not have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper around the
rte_pci_device structure. More fields will be added by later patches.
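
As a usage sketch, bus code that only holds the driver-visible pointer
can recover the internal structure with the conversion macro added
below in private.h:

	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);

	/* Internal-only fields live in *pdev; drivers keep seeing only
	 * &pdev->device. */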

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c    | 13 ++++++++-----
 drivers/bus/pci/linux/pci.c  | 28 ++++++++++++++++------------
 drivers/bus/pci/pci_common.c | 12 ++++++------
 drivers/bus/pci/private.h    | 14 +++++++++++++-
 4 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	struct pci_bar_io bar;
 	unsigned i, max;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL) {
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
 	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 
 	dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 				memmove(dev2->mem_resource,
 					dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	return 0;
 
 skipdev:
-	pci_free(dev);
+	pci_free(pdev);
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +301,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -306,7 +310,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver, sizeof(driver));
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -320,7 +324,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		else
 			dev->kdrv = RTE_PCI_KDRV_UNKNOWN;
 	} else {
-		pci_free(dev);
+		pci_free(pdev);
 		return 0;
 	}
 	/* device is valid, add in list (sorted) */
@@ -375,7 +379,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 						pci_common_set(dev2);
 					}
 				}
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..52404ab0fe 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -121,12 +121,12 @@ pci_common_set(struct rte_pci_device *dev)
 }
 
 void
-pci_free(struct rte_pci_device *dev)
+pci_free(struct rte_pci_device_internal *pdev)
 {
-	if (dev == NULL)
+	if (pdev == NULL)
 		return;
-	free(dev->bus_info);
-	free(dev);
+	free(pdev->device.bus_info);
+	free(pdev);
 }
 
 /* map a particular resource from a file */
@@ -465,7 +465,7 @@ pci_cleanup(void)
 		rte_intr_instance_free(dev->vfio_req_intr_handle);
 		dev->vfio_req_intr_handle = NULL;
 
-		pci_free(dev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
 	}
 
 	return error;
@@ -681,7 +681,7 @@ pci_unplug(struct rte_device *dev)
 	if (ret == 0) {
 		rte_pci_remove_device(pdev);
 		rte_devargs_remove(dev->devargs);
-		pci_free(pdev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 	}
 	return ret;
 }
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index c8161a1074..b564646e03 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,14 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+/*
+ * Convert struct rte_pci_device to struct rte_pci_device_internal
+ */
+#define RTE_PCI_DEVICE_INTERNAL(ptr) \
+	container_of(ptr, struct rte_pci_device_internal, device)
+#define RTE_PCI_DEVICE_INTERNAL_CONST(ptr) \
+	container_of(ptr, const struct rte_pci_device_internal, device)
+
 /**
  * Structure describing the PCI bus
  */
@@ -34,6 +42,10 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_device_internal {
+	struct rte_pci_device device;
+};
+
 /**
  * Scan the content of the PCI bus, and the devices in the devices
  * list
@@ -53,7 +65,7 @@ pci_common_set(struct rte_pci_device *dev);
  * Free a PCI device.
  */
 void
-pci_free(struct rte_pci_device *dev);
+pci_free(struct rte_pci_device_internal *pdev);
 
 /**
  * Validate whether a device with given PCI address should be ignored or not.
-- 
2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC 2/4] bus/pci: avoid depending on private value in kernel source
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
  2023-04-18  5:30 ` [RFC 1/4] bus/pci: introduce an internal representation of PCI device Chenbo Xia
@ 2023-04-18  5:30 ` Chenbo Xia
  2023-04-18  5:30 ` [RFC 3/4] bus/pci: introduce helper for MMIO read and write Chenbo Xia
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Chenbo Xia @ 2023-04-18  5:30 UTC (permalink / raw)
  To: dev; +Cc: skori, Anatoly Burakov

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in the Linux kernel source [1]. It
is not part of the VFIO API, so we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
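
For reference, instead of computing 'index << 40' the region offset is
now taken from the kernel, conceptually like this (minimal sketch with
error handling omitted; the patch uses the existing
pci_vfio_get_region_info() helper and caches the result):

	struct vfio_region_info info = { .argsz = sizeof(info) };

	info.index = VFIO_PCI_CONFIG_REGION_INDEX;
	if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &info) == 0)
		/* Use the kernel-provided offset/size with pread64()/pwrite64(). */
		pread64(vfio_dev_fd, buf, len, info.offset + offs);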

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci.c      |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 195 +++++++++++++++++++++++--------
 drivers/bus/pci/private.h        |   9 ++
 lib/eal/include/rte_vfio.h       |   1 -
 5 files changed, 158 insertions(+), 55 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 		return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_read_config(intr_handle, buf, len, offset);
+		return pci_vfio_read_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 		return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_write_config(intr_handle, buf, len, offset);
+		return pci_vfio_write_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 			 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..1748ad2ae0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+		    uint64_t *size, uint64_t *offset)
+{
+	const struct rte_pci_device_internal *pdev =
+		RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+	if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+		return -1;
+
+	if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+		return -1;
+
+	*size   = pdev->region[index].size;
+	*offset = pdev->region[index].offset;
+
+	return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
 		    void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
 		return -1;
 
-	return pread64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
 		    const void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
 		return -1;
 
-	return pwrite64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfio_get_msix_bar(const struct rte_pci_device *dev, int fd,
+	struct pci_msix_table *msix_table)
 {
 	int ret;
 	uint32_t reg;
 	uint16_t flags;
 	uint8_t cap_id, cap_offset;
+	uint64_t size, offset;
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
 
 	/* read PCI capability pointer from config space */
-	ret = pread64(fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_CAPABILITY_LIST);
+	ret = pread64(fd, &reg, sizeof(reg), offset + PCI_CAPABILITY_LIST);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL,
 			"Cannot read capability pointer from PCI config space!\n");
@@ -94,9 +131,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 	while (cap_offset) {
 
 		/* read PCI capability ID */
-		ret = pread64(fd, &reg, sizeof(reg),
-				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-				cap_offset);
+		ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 		if (ret != sizeof(reg)) {
 			RTE_LOG(ERR, EAL,
 				"Cannot read capability ID from PCI config space!\n");
@@ -108,9 +143,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 		/* if we haven't reached MSI-X, check next capability */
 		if (cap_id != PCI_CAP_ID_MSIX) {
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read capability pointer from PCI config space!\n");
@@ -125,18 +158,14 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 		/* else, read table offset */
 		else {
 			/* table offset resides in the next 4 bytes */
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 4);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset + 4);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table offset from PCI config space!\n");
 				return -1;
 			}
 
-			ret = pread64(fd, &flags, sizeof(flags),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 2);
+			ret = pread64(fd, &flags, sizeof(flags), offset + cap_offset + 2);
 			if (ret != sizeof(flags)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table flags from PCI config space!\n");
@@ -156,14 +185,19 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 /* enable PCI bus memory space */
 static int
-pci_vfio_enable_bus_memory(int dev_fd)
+pci_vfio_enable_bus_memory(struct rte_pci_device *dev, int dev_fd)
 {
+	uint64_t size, offset;
 	uint16_t cmd;
 	int ret;
 
-	ret = pread64(dev_fd, &cmd, sizeof(cmd),
-		      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		      PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
@@ -174,9 +208,7 @@ pci_vfio_enable_bus_memory(int dev_fd)
 		return 0;
 
 	cmd |= PCI_COMMAND_MEMORY;
-	ret = pwrite64(dev_fd, &cmd, sizeof(cmd),
-		       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		       PCI_COMMAND);
+	ret = pwrite64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -188,14 +220,19 @@ pci_vfio_enable_bus_memory(int dev_fd)
 
 /* set PCI bus mastering */
 static int
-pci_vfio_set_bus_master(int dev_fd, bool op)
+pci_vfio_set_bus_master(const struct rte_pci_device *dev, int dev_fd, bool op)
 {
+	uint64_t size, offset;
 	uint16_t reg;
 	int ret;
 
-	ret = pread64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
 		return -1;
@@ -207,9 +244,7 @@ pci_vfio_set_bus_master(int dev_fd, bool op)
 	else
 		reg &= ~(PCI_COMMAND_MASTER);
 
-	ret = pwrite64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	ret = pwrite64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -458,14 +493,21 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
 #endif
 
 static int
-pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
+pci_vfio_is_ioport_bar(const struct rte_pci_device *dev, int vfio_dev_fd,
+	int bar_index)
 {
+	uint64_t size, offset;
 	uint32_t ioport_bar;
 	int ret;
 
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
 	ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
-			  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
-			  + PCI_BASE_ADDRESS_0 + bar_index*4);
+			  offset + PCI_BASE_ADDRESS_0 + bar_index * 4);
 	if (ret != sizeof(ioport_bar)) {
 		RTE_LOG(ERR, EAL, "Cannot read command (%x) from config space!\n",
 			PCI_BASE_ADDRESS_0 + bar_index*4);
@@ -483,13 +525,13 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 		return -1;
 	}
 
-	if (pci_vfio_enable_bus_memory(vfio_dev_fd)) {
+	if (pci_vfio_enable_bus_memory(dev, vfio_dev_fd)) {
 		RTE_LOG(ERR, EAL, "Cannot enable bus memory!\n");
 		return -1;
 	}
 
 	/* set bus mastering for the device */
-	if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, true)) {
 		RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
 		return -1;
 	}
@@ -719,11 +761,40 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static int
+pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
+		      struct vfio_device_info *device_info)
+{
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
+	struct vfio_region_info *reg = NULL;
+	int nb_maps, i, ret;
+
+	nb_maps = RTE_MIN((int)device_info->num_regions,
+			VFIO_PCI_CONFIG_REGION_INDEX + 1);
+
+	for (i = 0; i < nb_maps; i++) {
+		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s cannot get device region info error %i (%s)\n",
+				dev->name, errno, strerror(errno));
+			return -1;
+		}
+
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
+		free(reg);
+	}
+
+	return 0;
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	struct vfio_region_info *reg = NULL;
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -767,11 +838,22 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	/* map BARs */
 	maps = vfio_res->maps;
 
+	ret = pci_vfio_get_region_info(vfio_dev_fd, &reg,
+		VFIO_PCI_CONFIG_REGION_INDEX);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s cannot get device region info error %i (%s)\n",
+			dev->name, errno, strerror(errno));
+		goto err_vfio_res;
+	}
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].size = reg->size;
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].offset = reg->offset;
+	free(reg);
+
 	vfio_res->msix_table.bar_index = -1;
 	/* get MSI-X BAR, if any (we have to know where it is because we can't
 	 * easily mmap it when using VFIO)
 	 */
-	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &vfio_res->msix_table);
+	ret = pci_vfio_get_msix_bar(dev, vfio_dev_fd, &vfio_res->msix_table);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "%s cannot get MSI-X BAR number!\n",
 				pci_addr);
@@ -792,7 +874,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	}
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		struct vfio_region_info *reg = NULL;
 		void *bar_addr;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
@@ -803,8 +884,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 			goto err_vfio_res;
 		}
 
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
 		/* chk for io port region */
-		ret = pci_vfio_is_ioport_bar(vfio_dev_fd, i);
+		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
 			goto err_vfio_res;
@@ -916,6 +1000,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+	if (ret)
+		return ret;
+
 	/* map BARs */
 	maps = vfio_res->maps;
 
@@ -1031,7 +1119,7 @@ pci_vfio_unmap_resource_primary(struct rte_pci_device *dev)
 	if (vfio_dev_fd < 0)
 		return -1;
 
-	if (pci_vfio_set_bus_master(vfio_dev_fd, false)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, false)) {
 		RTE_LOG(ERR, EAL, "%s cannot unset bus mastering for PCI device!\n",
 				pci_addr);
 		return -1;
@@ -1111,14 +1199,21 @@ int
 pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		    struct rte_pci_ioport *p)
 {
+	uint64_t size, offset;
+
 	if (bar < VFIO_PCI_BAR0_REGION_INDEX ||
 	    bar > VFIO_PCI_BAR5_REGION_INDEX) {
 		RTE_LOG(ERR, EAL, "invalid bar (%d)!\n", bar);
 		return -1;
 	}
 
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
+		return -1;
+	}
+
 	p->dev = dev;
-	p->base = VFIO_GET_REGION_ADDR(bar);
+	p->base = offset;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index b564646e03..2d6991ccb7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,8 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+#define RTE_MAX_PCI_REGIONS    9
+
 /*
  * Convert struct rte_pci_device to struct rte_pci_device_internal
  */
@@ -42,8 +44,15 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_region {
+	uint64_t size;
+	uint64_t offset;
+};
+
 struct rte_pci_device_internal {
 	struct rte_pci_device device;
+	/* PCI regions provided by e.g. VFIO. */
+	struct rte_pci_region region[RTE_MAX_PCI_REGIONS];
 };
 
 /**
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 7bdb8932b2..3487c4f2a2 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -38,7 +38,6 @@ extern "C" {
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
 #define VFIO_NOIOMMU_MODE      \
 	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
-- 
2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC 3/4] bus/pci: introduce helper for MMIO read and write
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
  2023-04-18  5:30 ` [RFC 1/4] bus/pci: introduce an internal representation of PCI device Chenbo Xia
  2023-04-18  5:30 ` [RFC 2/4] bus/pci: avoid depending on private value in kernel source Chenbo Xia
@ 2023-04-18  5:30 ` Chenbo Xia
  2023-04-18  5:30 ` [RFC 4/4] bus/pci: add VFIO sparse mmap support Chenbo Xia
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Chenbo Xia @ 2023-04-18  5:30 UTC (permalink / raw)
  To: dev; +Cc: skori, Anatoly Burakov

The MMIO regions of VFIO-PCI devices may not be mmap-able. In that
case, the driver should access these regions with explicit read and
write calls.
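
A usage sketch of the new helpers (BAR index, register offset and the
error code are made up for the example):

	uint32_t val;

	/* Read a 32-bit register at offset 0x10 of BAR 0, whether or not
	 * that part of the BAR is mmap-ed.
	 */
	if (rte_pci_mmio_read(pci_dev, 0, &val, sizeof(val), 0x10) < 0)
		return -EIO;

	val |= 0x1;
	if (rte_pci_mmio_write(pci_dev, 0, &val, sizeof(val), 0x10) < 0)
		return -EIO;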

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
 drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h | 10 +++++++
 drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
 drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
 drivers/bus/pci/version.map      |  3 ++
 7 files changed, 187 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 	return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+		      void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+		       const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 	}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 			 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 		       struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		        struct rte_pci_ioport *p);
 void pci_vfio_ioport_read(struct rte_pci_ioport *p,
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d52125e49b..2bf16e9369 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -55,6 +55,28 @@ pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 	return pwrite(uio_cfg_fd, buf, len, offset);
 }
 
+int
+pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+		  void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+int
+pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+		   const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 static int
 pci_uio_set_bus_master(int dev_fd)
 {
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 1748ad2ae0..f6289c907f 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1258,6 +1258,42 @@ pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 	return -1;
 }
 
+int
+pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+		   void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pread64(fd, buf, len, offset + offs);
+}
+
+int
+pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+		    const void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..82da087f24 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -135,6 +135,54 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 int rte_pci_write_config(const struct rte_pci_device *device,
 		const void *buf, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Read from a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes read on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Write to a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer containing the bytes should be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes written on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset);
+
 /**
  * Initialize a rte_pci_ioport object for a pci device io resource.
  *
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 161ab86d3b..00fde139ca 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -21,6 +21,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_pci_set_bus_master;
+	# added in 23.07
+	rte_pci_mmio_read;
+	rte_pci_mmio_write;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC 4/4] bus/pci: add VFIO sparse mmap support
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
                   ` (2 preceding siblings ...)
  2023-04-18  5:30 ` [RFC 3/4] bus/pci: introduce helper for MMIO read and write Chenbo Xia
@ 2023-04-18  5:30 ` Chenbo Xia
  2023-04-18  7:46 ` [RFC 0/4] Support VFIO sparse mmap in PCI bus David Marchand
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 50+ messages in thread
From: Chenbo Xia @ 2023-04-18  5:30 UTC (permalink / raw)
  To: dev
  Cc: skori, Nicolas Chautru, Anatoly Burakov, Nithin Dabilpuram,
	Kiran Kumar K, Satha Rao, Srikanth Yalavarthi, Kai Ji,
	Andrew Rybchenko, Ashish Gupta, Fan Zhang, Sunil Uttarwar,
	Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Nagadheeraj Rottela, Srikanth Jampala, Jay Zhou,
	Radha Mohan Chintakuntla, Veerasenareddy Burru, Chengwen Feng,
	Bruce Richardson, Kevin Laatz, Conor Walsh, Timothy McDaniel,
	Jerin Jacob, Pavan Nikhilesh, Harman Kalra, Shepard Siegel,
	Ed Czeck, John Miller, Igor Russkikh, Steven Webster,
	Matt Peters, Chandubabu Namburu, Rasesh Mody, Shahed Shaikh,
	Ajit Khaparde, Somnath Kotur, Yuying Zhang, Beilei Xing,
	Rahul Lakkireddy, Simei Su, Wenjun Wu, Marcin Wojtas,
	Michal Krawczyk, Shai Brandes, Evgeny Schemeilin, Igor Chauskin,
	Gagandeep Singh, Sachin Saxena, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Junfeng Guo, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Dongdong Liu, Yisen Zhuang, Jingjing Wu,
	Qiming Yang, Andrew Boyer, Shijith Thotton,
	Srisivasubramanian Srinivasan, Chaoyong He,
	Niklas Söderlund, Jiawen Wu, Sathesh Edara,
	Devendra Singh Rawat, Alok Prasad, Maciej Czekaj, Jian Wang,
	Maxime Coquelin, Jochen Behrens, Jakub Palider, Tomasz Duszynski,
	Rosen Xu, Tianfei Zhang, Vijay Kumar Srivastava

This patch adds sparse mmap support to the PCI bus. Sparse mmap is a
capability defined in VFIO that allows multiple mmap areas within one
VFIO region.
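
For reference, the kernel advertises the capability in the region info
capability chain; a simplified lookup (structure names are from
linux/vfio.h, 'reg' is a struct vfio_region_info pointer with
VFIO_REGION_INFO_FLAG_CAPS set, checks omitted) looks like:

	struct vfio_region_info_cap_sparse_mmap *sparse;
	struct vfio_info_cap_header *hdr =
		(void *)((uint8_t *)reg + reg->cap_offset);

	while (hdr->id != VFIO_REGION_INFO_CAP_SPARSE_MMAP && hdr->next != 0)
		hdr = (void *)((uint8_t *)reg + hdr->next);

	if (hdr->id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) {
		sparse = (struct vfio_region_info_cap_sparse_mmap *)hdr;
		/* Each sparse->areas[i].offset/.size describes one sub-range
		 * of the region that may be mmap-ed; everything else must be
		 * accessed with pread64()/pwrite64().
		 */
	}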

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
 drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
 .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
 drivers/bus/pci/bsd/pci.c                     |  20 +-
 drivers/bus/pci/bus_pci_driver.h              |  24 +-
 drivers/bus/pci/linux/pci.c                   |  13 +-
 drivers/bus/pci/linux/pci_uio.c               |  24 +-
 drivers/bus/pci/linux/pci_vfio.c              | 214 +++++++++++++++---
 drivers/bus/pci/pci_common.c                  |  45 ++--
 drivers/bus/pci/pci_common_uio.c              |  12 +-
 drivers/bus/pci/private.h                     |   2 +
 drivers/common/cnxk/roc_dev.c                 |   4 +-
 drivers/common/cnxk/roc_dpi.c                 |   2 +-
 drivers/common/cnxk/roc_ml.c                  |  22 +-
 drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
 drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
 drivers/common/sfc_efx/sfc_efx.c              |   2 +-
 drivers/compress/octeontx/otx_zip.c           |   4 +-
 drivers/crypto/ccp/ccp_dev.c                  |   4 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
 drivers/crypto/nitrox/nitrox_device.c         |   4 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
 drivers/crypto/virtio/virtio_pci.c            |   6 +-
 drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
 drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
 drivers/dma/idxd/idxd_pci.c                   |   4 +-
 drivers/dma/ioat/ioat_dmadev.c                |   2 +-
 drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
 drivers/event/octeontx/ssovf_probe.c          |  38 ++--
 drivers/event/octeontx/timvf_probe.c          |  18 +-
 drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   2 +-
 drivers/net/avp/avp_ethdev.c                  |  20 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
 drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
 drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
 drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
 drivers/net/cxgbe/cxgbe_main.c                |   2 +-
 drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
 drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
 drivers/net/e1000/em_ethdev.c                 |   4 +-
 drivers/net/e1000/igb_ethdev.c                |   4 +-
 drivers/net/ena/ena_ethdev.c                  |   4 +-
 drivers/net/enetc/enetc_ethdev.c              |   2 +-
 drivers/net/enic/enic_main.c                  |   4 +-
 drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
 drivers/net/gve/gve_ethdev.c                  |   4 +-
 drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
 drivers/net/hns3/hns3_ethdev.c                |   2 +-
 drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
 drivers/net/hns3/hns3_rxtx.c                  |   4 +-
 drivers/net/i40e/i40e_ethdev.c                |   2 +-
 drivers/net/iavf/iavf_ethdev.c                |   2 +-
 drivers/net/ice/ice_dcf.c                     |   2 +-
 drivers/net/ice/ice_ethdev.c                  |   2 +-
 drivers/net/idpf/idpf_ethdev.c                |   4 +-
 drivers/net/igc/igc_ethdev.c                  |   2 +-
 drivers/net/ionic/ionic_dev_pci.c             |   2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
 drivers/net/liquidio/lio_ethdev.c             |   4 +-
 drivers/net/nfp/nfp_ethdev.c                  |   2 +-
 drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
 drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
 drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
 drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
 drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
 drivers/net/qede/qede_main.c                  |   6 +-
 drivers/net/sfc/sfc.c                         |   2 +-
 drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
 drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
 drivers/net/virtio/virtio_pci.c               |   6 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
 drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
 drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
 drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
 drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
 86 files changed, 477 insertions(+), 285 deletions(-)

diff --git a/drivers/baseband/acc/rte_acc100_pmd.c b/drivers/baseband/acc/rte_acc100_pmd.c
index 5362d39c30..7c22e3fad5 100644
--- a/drivers/baseband/acc/rte_acc100_pmd.c
+++ b/drivers/baseband/acc/rte_acc100_pmd.c
@@ -4260,12 +4260,12 @@ acc100_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 			!strcmp(drv->driver.name, RTE_STR(ACC100PF_DRIVER_NAME));
 
 	((struct acc_device *) dev->data->dev_private)->mmio_base =
-			pci_dev->mem_resource[0].addr;
+			pci_dev->pci_mem[0].mem_res.addr;
 
 	rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"",
 			drv->driver.name, dev->data->name,
-			(void *)pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[0].phys_addr);
+			(void *)pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[0].mem_res.phys_addr);
 }
 
 static int acc100_pci_probe(struct rte_pci_driver *pci_drv,
diff --git a/drivers/baseband/acc/rte_vrb_pmd.c b/drivers/baseband/acc/rte_vrb_pmd.c
index 9e5a73c9c7..5260ab75f4 100644
--- a/drivers/baseband/acc/rte_vrb_pmd.c
+++ b/drivers/baseband/acc/rte_vrb_pmd.c
@@ -3349,7 +3349,7 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 	dev->dequeue_fft_ops = vrb_dequeue_fft;
 
 	d->pf_device = !strcmp(drv->driver.name, RTE_STR(VRB_PF_DRIVER_NAME));
-	d->mmio_base = pci_dev->mem_resource[0].addr;
+	d->mmio_base = pci_dev->pci_mem[0].mem_res.addr;
 
 	/* Device variant specific handling. */
 	if ((pci_dev->id.device_id == RTE_VRB1_PF_DEVICE_ID) ||
@@ -3367,8 +3367,8 @@ vrb_bbdev_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 
 	rte_bbdev_log_debug("Init device %s [%s] @ vaddr %p paddr %#"PRIx64"",
 			drv->driver.name, dev->data->name,
-			(void *)pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[0].phys_addr);
+			(void *)pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[0].mem_res.phys_addr);
 }
 
 static int vrb_pci_probe(struct rte_pci_driver *pci_drv,
diff --git a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
index f29565af8c..9571119487 100644
--- a/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
+++ b/drivers/baseband/fpga_5gnr_fec/rte_fpga_5gnr_fec.c
@@ -2149,13 +2149,13 @@ fpga_5gnr_fec_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 			!strcmp(drv->driver.name,
 					RTE_STR(FPGA_5GNR_FEC_PF_DRIVER_NAME));
 	((struct fpga_5gnr_fec_device *) dev->data->dev_private)->mmio_base =
-			pci_dev->mem_resource[0].addr;
+			pci_dev->pci_mem[0].mem_res.addr;
 
 	rte_bbdev_log_debug(
 			"Init device %s [%s] @ virtaddr %p phyaddr %#"PRIx64,
 			drv->driver.name, dev->data->name,
-			(void *)pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[0].phys_addr);
+			(void *)pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[0].mem_res.phys_addr);
 }
 
 static int
diff --git a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
index a4a963f74d..b2a937d2c5 100644
--- a/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
+++ b/drivers/baseband/fpga_lte_fec/fpga_lte_fec.c
@@ -2326,13 +2326,13 @@ fpga_lte_fec_init(struct rte_bbdev *dev, struct rte_pci_driver *drv)
 			!strcmp(drv->driver.name,
 					RTE_STR(FPGA_LTE_FEC_PF_DRIVER_NAME));
 	((struct fpga_lte_fec_device *) dev->data->dev_private)->mmio_base =
-			pci_dev->mem_resource[0].addr;
+			pci_dev->pci_mem[0].mem_res.addr;
 
 	rte_bbdev_log_debug(
 			"Init device %s [%s] @ virtaddr %p phyaddr %#"PRIx64,
 			drv->driver.name, dev->data->name,
-			(void *)pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[0].phys_addr);
+			(void *)pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[0].mem_res.phys_addr);
 }
 
 static int
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 27f12590d4..cf99b97e01 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -186,17 +186,17 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 	/* if matching map is found, then use it */
 	offset = res_idx * pagesz;
 	mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
-			(size_t)dev->mem_resource[res_idx].len, 0);
+			(size_t)dev->pci_mem[res_idx].mem_res.len, 0);
 	close(fd);
 	if (mapaddr == NULL)
 		goto error;
 
-	maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
-	maps[map_idx].size = dev->mem_resource[res_idx].len;
+	maps[map_idx].phaddr = dev->pci_mem[res_idx].mem_res.phys_addr;
+	maps[map_idx].size = dev->pci_mem[res_idx].mem_res.len;
 	maps[map_idx].addr = mapaddr;
 	maps[map_idx].offset = offset;
 	strcpy(maps[map_idx].path, devname);
-	dev->mem_resource[res_idx].addr = mapaddr;
+	dev->pci_mem[res_idx].mem_res.addr = mapaddr;
 
 	return 0;
 
@@ -493,10 +493,10 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
 		      void *buf, size_t len, off_t offset)
 {
-	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
-			(uint64_t)offset + len > dev->mem_resource[bar].len)
+	if (bar >= PCI_MAX_RESOURCE || dev->pci_mem[bar].mem_res.addr == NULL ||
+			(uint64_t)offset + len > dev->pci_mem[bar].mem_res.len)
 		return -1;
-	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	memcpy(buf, (uint8_t *)dev->pci_mem[bar].mem_res.addr + offset, len);
 	return len;
 }
 
@@ -504,10 +504,10 @@ int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
 int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
 		       const void *buf, size_t len, off_t offset)
 {
-	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
-			(uint64_t)offset + len > dev->mem_resource[bar].len)
+	if (bar >= PCI_MAX_RESOURCE || dev->pci_mem[bar].mem_res.addr == NULL ||
+			(uint64_t)offset + len > dev->pci_mem[bar].mem_res.len)
 		return -1;
-	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	memcpy((uint8_t *)dev->pci_mem[bar].mem_res.addr + offset, buf, len);
 	return len;
 }
 
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index be32263a82..16be48348c 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -28,6 +28,27 @@ enum rte_pci_kernel_driver {
 	RTE_PCI_KDRV_NET_UIO,      /* NetUIO for Windows */
 };
 
+struct rte_mem_map_area {
+	void *addr;
+	uint64_t offset;
+	uint64_t size;
+};
+
+struct rte_sparse_mem_map {
+	uint64_t phys_addr;
+	uint64_t len;
+	uint32_t nr_maps;
+	struct rte_mem_map_area *areas;
+};
+
+struct rte_pci_mem_resource {
+	bool is_sparse;
+	union {
+		struct rte_mem_resource mem_res;
+		struct rte_sparse_mem_map sparse_mem;
+	};
+};
+
 /**
  * A structure describing a PCI device.
  */
@@ -36,8 +57,7 @@ struct rte_pci_device {
 	struct rte_device device;           /**< Inherit core device */
 	struct rte_pci_addr addr;           /**< PCI location. */
 	struct rte_pci_id id;               /**< PCI ID. */
-	struct rte_mem_resource mem_resource[PCI_MAX_RESOURCE];
-					    /**< PCI Memory Resource */
+	struct rte_pci_mem_resource pci_mem[PCI_MAX_RESOURCE]; /**< PCI Memory Resource */
 	struct rte_intr_handle *intr_handle; /**< Interrupt handle */
 	struct rte_pci_driver *driver;      /**< PCI driver used in probing */
 	uint16_t max_vfs;                   /**< sriov enable if not zero */
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 3d237398d9..98856b5df3 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -179,7 +179,7 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
 		return -1;
 	}
 
-	for (i = 0; i<PCI_MAX_RESOURCE; i++) {
+	for (i = 0; i < PCI_MAX_RESOURCE; i++) {
 
 		if (fgets(buf, sizeof(buf), f) == NULL) {
 			RTE_LOG(ERR, EAL,
@@ -191,10 +191,10 @@ pci_parse_sysfs_resource(const char *filename, struct rte_pci_device *dev)
 			goto error;
 
 		if (flags & IORESOURCE_MEM) {
-			dev->mem_resource[i].phys_addr = phys_addr;
-			dev->mem_resource[i].len = end_addr - phys_addr + 1;
+			dev->pci_mem[i].mem_res.phys_addr = phys_addr;
+			dev->pci_mem[i].mem_res.len = end_addr - phys_addr + 1;
 			/* not mapped for now */
-			dev->mem_resource[i].addr = NULL;
+			dev->pci_mem[i].mem_res.addr = NULL;
 		}
 	}
 	fclose(f);
@@ -347,9 +347,8 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 					dev2->max_vfs = dev->max_vfs;
 					dev2->id = dev->id;
 					pci_common_set(dev2);
-					memmove(dev2->mem_resource,
-						dev->mem_resource,
-						sizeof(dev->mem_resource));
+					memmove(dev2->pci_mem, dev->pci_mem,
+						sizeof(dev->pci_mem));
 				} else {
 					/**
 					 * If device is plugged and driver is
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index 2bf16e9369..45e174f305 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -59,10 +59,10 @@ int
 pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
 		  void *buf, size_t len, off_t offset)
 {
-	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
-			(uint64_t)offset + len > dev->mem_resource[bar].len)
+	if (bar >= PCI_MAX_RESOURCE || dev->pci_mem[bar].mem_res.addr == NULL ||
+			(uint64_t)offset + len > dev->pci_mem[bar].mem_res.len)
 		return -1;
-	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	memcpy(buf, (uint8_t *)dev->pci_mem[bar].mem_res.addr + offset, len);
 	return len;
 }
 
@@ -70,10 +70,10 @@ int
 pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
 		   const void *buf, size_t len, off_t offset)
 {
-	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
-			(uint64_t)offset + len > dev->mem_resource[bar].len)
+	if (bar >= PCI_MAX_RESOURCE || dev->pci_mem[bar].mem_res.addr == NULL ||
+			(uint64_t)offset + len > dev->pci_mem[bar].mem_res.len)
 		return -1;
-	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	memcpy((uint8_t *)dev->pci_mem[bar].mem_res.addr + offset, buf, len);
 	return len;
 }
 
@@ -388,22 +388,22 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 		pci_map_addr = pci_find_max_end_va();
 
 	mapaddr = pci_map_resource(pci_map_addr, fd, 0,
-			(size_t)dev->mem_resource[res_idx].len, 0);
+			(size_t)dev->pci_mem[res_idx].mem_res.len, 0);
 	close(fd);
 	if (mapaddr == NULL)
 		goto error;
 
 	pci_map_addr = RTE_PTR_ADD(mapaddr,
-			(size_t)dev->mem_resource[res_idx].len);
+			(size_t)dev->pci_mem[res_idx].mem_res.len);
 
 	pci_map_addr = RTE_PTR_ALIGN(pci_map_addr, sysconf(_SC_PAGE_SIZE));
 
-	maps[map_idx].phaddr = dev->mem_resource[res_idx].phys_addr;
-	maps[map_idx].size = dev->mem_resource[res_idx].len;
+	maps[map_idx].phaddr = dev->pci_mem[res_idx].mem_res.phys_addr;
+	maps[map_idx].size = dev->pci_mem[res_idx].mem_res.len;
 	maps[map_idx].addr = mapaddr;
 	maps[map_idx].offset = 0;
 	strcpy(maps[map_idx].path, devname);
-	dev->mem_resource[res_idx].addr = mapaddr;
+	dev->pci_mem[res_idx].mem_res.addr = mapaddr;
 
 	return 0;
 
@@ -463,7 +463,7 @@ pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 
 		RTE_LOG(DEBUG, EAL, "%s(): PIO BAR %08lx detected\n", __func__, base);
 	} else if (flags & IORESOURCE_MEM) {
-		base = (unsigned long)dev->mem_resource[bar].addr;
+		base = (unsigned long)dev->pci_mem[bar].mem_res.addr;
 		RTE_LOG(DEBUG, EAL, "%s(): MMIO BAR %08lx detected\n", __func__, base);
 	} else {
 		RTE_LOG(ERR, EAL, "%s(): unknown BAR type\n", __func__);
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index f6289c907f..3c30055495 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,82 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+		struct vfio_region_sparse_mmap_area *vfio_areas,
+		uint32_t nr_areas, int bar_index, int additional_flags,
+		int numa_node)
+{
+	struct pci_map *map = &vfio_res->maps[bar_index];
+	struct rte_mem_map_area *area;
+	struct vfio_region_sparse_mmap_area *sparse;
+	void *bar_addr;
+	uint32_t i, j;
+
+	if (map->size == 0) {
+		RTE_LOG(DEBUG, EAL, "BAR size is 0, skip BAR%d\n", bar_index);
+		return 0;
+	}
+
+	map->nr_areas = nr_areas;
+
+	if (!map->nr_areas) {
+		RTE_LOG(DEBUG, EAL, "Skip bar %d with no sparse mmap areas\n",
+			bar_index);
+		map->areas = NULL;
+		return 0;
+	}
+
+	if (map->areas == NULL) {
+		map->areas = rte_zmalloc_socket(NULL,
+				sizeof(*map->areas) * nr_areas,
+				RTE_CACHE_LINE_SIZE, numa_node);
+		if (map->areas == NULL) {
+			RTE_LOG(ERR, EAL,
+				"Cannot alloc memory for sparse map areas\n");
+			return -1;
+		}
+	}
+
+	for (i = 0; i < map->nr_areas; i++) {
+		area = &map->areas[i];
+		sparse = &vfio_areas[i];
+
+		bar_addr = mmap(map->addr, sparse->size, 0, MAP_PRIVATE |
+				MAP_ANONYMOUS | additional_flags, -1, 0);
+		if (bar_addr != MAP_FAILED) {
+			area->addr = pci_map_resource(bar_addr, vfio_dev_fd,
+				map->offset + sparse->offset, sparse->size,
+				RTE_MAP_FORCE_ADDRESS);
+			if (area->addr == NULL) {
+				munmap(bar_addr, sparse->size);
+				RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+					bar_index);
+				goto err_map;
+			}
+
+			area->offset = sparse->offset;
+			area->size = sparse->size;
+		} else {
+			RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+				bar_index);
+			goto err_map;
+		}
+	}
+
+	return 0;
+
+err_map:
+	for (j = 0; j < i; j++) {
+		pci_unmap_resource(map->areas[j].addr, map->areas[j].size);
+		map->areas[j].offset = 0;
+		map->areas[j].size = 0;
+	}
+	rte_free(map->areas);
+	map->nr_areas = 0;
+	return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -789,6 +865,31 @@ pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
 	return 0;
 }
 
+static void
+clean_up_pci_resource(struct mapped_pci_resource *vfio_res)
+{
+	struct pci_map *map;
+	uint32_t i, j;
+
+	for (i = 0; i < PCI_MAX_RESOURCE; i++) {
+		map = &vfio_res->maps[i];
+		if (map->nr_areas > 1) {
+			for (j = 0; j < map->nr_areas; j++)
+				pci_unmap_resource(map->areas[j].addr,
+					map->areas[j].size);
+			rte_free(map->areas);
+			map->areas = NULL;
+		} else {
+			/*
+			 * We do not need to be aware of MSI-X BAR mappings.
+			 * Using the current maps array is enough.
+			 */
+			if (map->addr)
+				pci_unmap_resource(map->addr, map->size);
+		}
+	}
+}
+
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
@@ -875,6 +976,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
 		void *bar_addr;
+		struct vfio_info_cap_header *hdr;
+		struct vfio_region_info_cap_sparse_mmap *sparse;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
 		if (ret < 0) {
@@ -920,15 +1023,39 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		maps[i].size = reg->size;
 		maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			free(reg);
-			goto err_vfio_res;
-		}
+		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+		if (hdr != NULL) {
+			sparse = container_of(hdr,
+				struct vfio_region_info_cap_sparse_mmap,
+				header);
+
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res,
+				sparse->areas, sparse->nr_areas, i, 0,
+				dev->device.numa_node);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
 
-		dev->mem_resource[i].addr = maps[i].addr;
+			dev->pci_mem[i].is_sparse = true;
+			dev->pci_mem[i].sparse_mem.len = maps[i].size;
+			dev->pci_mem[i].sparse_mem.nr_maps = maps[i].nr_areas;
+			dev->pci_mem[i].sparse_mem.areas = maps[i].areas;
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
+
+			dev->pci_mem[i].is_sparse = false;
+			dev->pci_mem[i].mem_res.addr = maps[i].addr;
+		}
 
 		free(reg);
 	}
@@ -949,6 +1076,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	return 0;
 err_vfio_res:
+	clean_up_pci_resource(vfio_res);
 	rte_free(vfio_res);
 err_vfio_dev_fd:
 	rte_vfio_release_device(rte_pci_get_sysfs_path(),
@@ -968,7 +1096,7 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	struct mapped_pci_res_list *vfio_res_list =
 		RTE_TAILQ_CAST(rte_vfio_tailq.head, mapped_pci_res_list);
 
-	struct pci_map *maps;
+	struct pci_map *maps, *cur;
 
 	if (rte_intr_fd_set(dev->intr_handle, -1))
 		return -1;
@@ -1008,14 +1136,50 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	maps = vfio_res->maps;
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			goto err_vfio_dev_fd;
-		}
+		cur = &maps[i];
+		if (cur->nr_areas > 1) {
+			struct vfio_region_sparse_mmap_area *areas;
+			uint32_t j;
+
+			areas = malloc(sizeof(*areas) * cur->nr_areas);
+			if (areas == NULL) {
+				RTE_LOG(ERR, EAL, "Failed to alloc vfio areas for %s\n",
+					pci_addr);
+				goto err_vfio_dev_fd;
+			}
+
+			for (j = 0; j < cur->nr_areas; j++) {
+				areas[j].offset = cur->areas[j].offset;
+				areas[j].size = cur->areas[j].size;
+			}
+
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res,
+				areas, cur->nr_areas, i, MAP_FIXED,
+				dev->device.numa_node);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(areas);
+				goto err_vfio_dev_fd;
+			}
 
-		dev->mem_resource[i].addr = maps[i].addr;
+			dev->pci_mem[i].is_sparse = true;
+			dev->pci_mem[i].sparse_mem.len = cur->size;
+			dev->pci_mem[i].sparse_mem.nr_maps = cur->nr_areas;
+			dev->pci_mem[i].sparse_mem.areas = cur->areas;
+			free(areas);
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res,
+				i, MAP_FIXED);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
+
+			dev->pci_mem[i].is_sparse = false;
+			dev->pci_mem[i].mem_res.addr = cur->addr;
+		}
 	}
 
 	/* we need save vfio_dev_fd, so it can be used during release */
@@ -1052,8 +1216,6 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 			const char *pci_addr)
 {
 	struct mapped_pci_resource *vfio_res = NULL;
-	struct pci_map *maps;
-	int i;
 
 	/* Get vfio_res */
 	TAILQ_FOREACH(vfio_res, vfio_res_list, next) {
@@ -1062,25 +1224,13 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 		break;
 	}
 
-	if  (vfio_res == NULL)
+	if (vfio_res == NULL)
 		return vfio_res;
 
 	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
 		pci_addr);
 
-	maps = vfio_res->maps;
-	for (i = 0; i < vfio_res->nb_maps; i++) {
-
-		/*
-		 * We do not need to be aware of MSI-X table BAR mappings as
-		 * when mapping. Just using current maps array is enough
-		 */
-		if (maps[i].addr) {
-			RTE_LOG(INFO, EAL, "Calling pci_unmap_resource for %s at %p\n",
-				pci_addr, maps[i].addr);
-			pci_unmap_resource(maps[i].addr, maps[i].size);
-		}
-	}
+	clean_up_pci_resource(vfio_res);
 
 	return vfio_res;
 }
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 52404ab0fe..18a6b38fd8 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -482,11 +482,10 @@ pci_dump_one_device(FILE *f, struct rte_pci_device *dev)
 	fprintf(f, " - vendor:%x device:%x\n", dev->id.vendor_id,
 	       dev->id.device_id);
 
-	for (i = 0; i != sizeof(dev->mem_resource) /
-		sizeof(dev->mem_resource[0]); i++) {
+	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
 		fprintf(f, "   %16.16"PRIx64" %16.16"PRIx64"\n",
-			dev->mem_resource[i].phys_addr,
-			dev->mem_resource[i].len);
+			dev->pci_mem[i].mem_res.phys_addr,
+			dev->pci_mem[i].mem_res.len);
 	}
 	return 0;
 }
@@ -582,20 +581,38 @@ pci_find_device_by_addr(const void *failure_addr)
 {
 	struct rte_pci_device *pdev = NULL;
 	uint64_t check_point, start, end, len;
-	int i;
+	struct rte_pci_mem_resource *pci_mem;
+	struct rte_mem_map_area *ar;
+	uint32_t i, j;
 
 	check_point = (uint64_t)(uintptr_t)failure_addr;
 
 	FOREACH_DEVICE_ON_PCIBUS(pdev) {
-		for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
-			start = (uint64_t)(uintptr_t)pdev->mem_resource[i].addr;
-			len = pdev->mem_resource[i].len;
-			end = start + len;
-			if (check_point >= start && check_point < end) {
-				RTE_LOG(DEBUG, EAL, "Failure address %16.16"
-					PRIx64" belongs to device %s!\n",
-					check_point, pdev->device.name);
-				return pdev;
+		for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+			pci_mem = &pdev->pci_mem[i];
+			if (pci_mem->is_sparse) {
+				for (j = 0; j != pci_mem->sparse_mem.nr_maps; j++) {
+					ar = &pci_mem->sparse_mem.areas[j];
+					start = (uint64_t)(uintptr_t)ar->addr;
+					len = ar->size;
+					end = start + len;
+					if (check_point >= start && check_point < end) {
+						RTE_LOG(DEBUG, EAL, "Failure address %16.16"
+							PRIx64" belongs to device %s!\n",
+							check_point, pdev->device.name);
+						return pdev;
+					}
+				}
+			} else {
+				start = (uint64_t)(uintptr_t)pci_mem->mem_res.addr;
+				len = pci_mem->mem_res.len;
+				end = start + len;
+				if (check_point >= start && check_point < end) {
+					RTE_LOG(DEBUG, EAL, "Failure address %16.16"
+						PRIx64" belongs to device %s!\n",
+						check_point, pdev->device.name);
+					return pdev;
+				}
 			}
 		}
 	}
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 76c661f054..d4ee42c0ca 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -71,7 +71,8 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
 				}
 				return -1;
 			}
-			dev->mem_resource[i].addr = mapaddr;
+			dev->pci_mem[i].is_sparse = false;
+			dev->pci_mem[i].mem_res.addr = mapaddr;
 		}
 		return 0;
 	}
@@ -108,7 +109,8 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 	/* Map all BARs */
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
 		/* skip empty BAR */
-		phaddr = dev->mem_resource[i].phys_addr;
+		dev->pci_mem[i].is_sparse = false;
+		phaddr = dev->pci_mem[i].mem_res.phys_addr;
 		if (phaddr == 0)
 			continue;
 
@@ -164,10 +166,10 @@ pci_uio_remap_resource(struct rte_pci_device *dev)
 	/* Remap all BARs */
 	for (i = 0; i != PCI_MAX_RESOURCE; i++) {
 		/* skip empty BAR */
-		if (dev->mem_resource[i].phys_addr == 0)
+		if (dev->pci_mem[i].mem_res.phys_addr == 0)
 			continue;
-		map_address = mmap(dev->mem_resource[i].addr,
-				(size_t)dev->mem_resource[i].len,
+		map_address = mmap(dev->pci_mem[i].mem_res.addr,
+				(size_t)dev->pci_mem[i].mem_res.len,
 				PROT_READ | PROT_WRITE,
 				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
 		if (map_address == MAP_FAILED) {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2d6991ccb7..835964f3e4 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -121,6 +121,8 @@ struct pci_map {
 	uint64_t offset;
 	uint64_t size;
 	uint64_t phaddr;
+	uint32_t nr_areas;
+	struct rte_mem_map_area *areas;
 };
 
 struct pci_msix_table {
diff --git a/drivers/common/cnxk/roc_dev.c b/drivers/common/cnxk/roc_dev.c
index 2388237186..20bc514cf7 100644
--- a/drivers/common/cnxk/roc_dev.c
+++ b/drivers/common/cnxk/roc_dev.c
@@ -1151,8 +1151,8 @@ dev_init(struct dev *dev, struct plt_pci_device *pci_dev)
 	if (!dev_cache_line_size_valid())
 		return -EFAULT;
 
-	bar2 = (uintptr_t)pci_dev->mem_resource[2].addr;
-	bar4 = (uintptr_t)pci_dev->mem_resource[4].addr;
+	bar2 = (uintptr_t)pci_dev->pci_mem[2].mem_res.addr;
+	bar4 = (uintptr_t)pci_dev->pci_mem[4].mem_res.addr;
 	if (bar2 == 0 || bar4 == 0) {
 		plt_err("Failed to get PCI bars");
 		rc = -ENODEV;
diff --git a/drivers/common/cnxk/roc_dpi.c b/drivers/common/cnxk/roc_dpi.c
index 93c8318a3d..bd6c87d353 100644
--- a/drivers/common/cnxk/roc_dpi.c
+++ b/drivers/common/cnxk/roc_dpi.c
@@ -152,7 +152,7 @@ roc_dpi_dev_init(struct roc_dpi *roc_dpi)
 	struct plt_pci_device *pci_dev = roc_dpi->pci_dev;
 	uint16_t vfid;
 
-	roc_dpi->rbase = pci_dev->mem_resource[0].addr;
+	roc_dpi->rbase = pci_dev->pci_mem[0].mem_res.addr;
 	vfid = ((pci_dev->addr.devid & 0x1F) << 3) |
 	       (pci_dev->addr.function & 0x7);
 	vfid -= 1;
diff --git a/drivers/common/cnxk/roc_ml.c b/drivers/common/cnxk/roc_ml.c
index 7390697b1d..e1d3f3dc38 100644
--- a/drivers/common/cnxk/roc_ml.c
+++ b/drivers/common/cnxk/roc_ml.c
@@ -100,9 +100,9 @@ roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr)
 	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
 
 	if (roc_model_is_cn10ka())
-		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr;
+		return phys_addr - ml->pci_dev->pci_mem[0].mem_res.phys_addr;
 	else
-		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr - ML_MLAB_BLK_OFFSET;
+		return phys_addr - ml->pci_dev->pci_mem[0].mem_res.phys_addr - ML_MLAB_BLK_OFFSET;
 }
 
 uint64_t
@@ -111,9 +111,9 @@ roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset)
 	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
 
 	if (roc_model_is_cn10ka())
-		return ml->pci_dev->mem_resource[0].phys_addr + offset;
+		return ml->pci_dev->pci_mem[0].mem_res.phys_addr + offset;
 	else
-		return ml->pci_dev->mem_resource[0].phys_addr + ML_MLAB_BLK_OFFSET + offset;
+		return ml->pci_dev->pci_mem[0].mem_res.phys_addr + ML_MLAB_BLK_OFFSET + offset;
 }
 
 void
@@ -543,13 +543,14 @@ roc_ml_dev_init(struct roc_ml *roc_ml)
 	ml->pci_dev = pci_dev;
 	dev->roc_ml = roc_ml;
 
-	ml->ml_reg_addr = ml->pci_dev->mem_resource[0].addr;
+	ml->ml_reg_addr = ml->pci_dev->pci_mem[0].mem_res.addr;
 	ml->ml_mlr_base = 0;
 	ml->ml_mlr_base_saved = false;
 
-	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx", ml->pci_dev->mem_resource[0].phys_addr);
+	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx",
+		   ml->pci_dev->pci_mem[0].mem_res.phys_addr);
 	plt_ml_dbg("ML: PCI Virtual Address : 0x%016lx",
-		   PLT_U64_CAST(ml->pci_dev->mem_resource[0].addr));
+		   PLT_U64_CAST(ml->pci_dev->pci_mem[0].mem_res.addr));
 
 	plt_spinlock_init(&roc_ml->sp_spinlock);
 	plt_spinlock_init(&roc_ml->fp_spinlock);
@@ -589,11 +590,12 @@ roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
 
 	plt_ml_dbg(
 		"MLAB: Physical Address : 0x%016lx",
-		PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].phys_addr, ML_MLAB_BLK_OFFSET));
+		PLT_PTR_ADD_U64_CAST(ml->pci_dev->pci_mem[0].mem_res.phys_addr,
+		ML_MLAB_BLK_OFFSET));
 	plt_ml_dbg("MLAB: Virtual Address : 0x%016lx",
-		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET));
+		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->pci_mem[0].mem_res.addr, ML_MLAB_BLK_OFFSET));
 
-	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET);
+	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->pci_mem[0].mem_res.addr, ML_MLAB_BLK_OFFSET);
 	ml->ml_mlr_base = 0;
 	ml->ml_mlr_base_saved = false;
 
diff --git a/drivers/common/qat/dev/qat_dev_gen1.c b/drivers/common/qat/dev/qat_dev_gen1.c
index cf480dcba8..8fd069275d 100644
--- a/drivers/common/qat/dev/qat_dev_gen1.c
+++ b/drivers/common/qat/dev/qat_dev_gen1.c
@@ -214,7 +214,7 @@ qat_reset_ring_pairs_gen1(struct qat_pci_device *qat_pci_dev __rte_unused)
 const struct rte_mem_resource *
 qat_dev_get_transport_bar_gen1(struct rte_pci_device *pci_dev)
 {
-	return &pci_dev->mem_resource[0];
+	return &pci_dev->pci_mem[0].mem_res;
 }
 
 int
diff --git a/drivers/common/qat/dev/qat_dev_gen4.c b/drivers/common/qat/dev/qat_dev_gen4.c
index 1b3a5deabf..2eff45db3a 100644
--- a/drivers/common/qat/dev/qat_dev_gen4.c
+++ b/drivers/common/qat/dev/qat_dev_gen4.c
@@ -271,14 +271,14 @@ qat_reset_ring_pairs_gen4(struct qat_pci_device *qat_pci_dev)
 static const struct rte_mem_resource *
 qat_dev_get_transport_bar_gen4(struct rte_pci_device *pci_dev)
 {
-	return &pci_dev->mem_resource[0];
+	return &pci_dev->pci_mem[0].mem_res;
 }
 
 static int
 qat_dev_get_misc_bar_gen4(struct rte_mem_resource **mem_resource,
 		struct rte_pci_device *pci_dev)
 {
-	*mem_resource = &pci_dev->mem_resource[2];
+	*mem_resource = &pci_dev->pci_mem[2].mem_res;
 	return 0;
 }
 
diff --git a/drivers/common/sfc_efx/sfc_efx.c b/drivers/common/sfc_efx/sfc_efx.c
index 2dc5545760..6eceede9aa 100644
--- a/drivers/common/sfc_efx/sfc_efx.c
+++ b/drivers/common/sfc_efx/sfc_efx.c
@@ -77,7 +77,7 @@ sfc_efx_find_mem_bar(efsys_pci_config_t *configp, int bar_index,
 
 	result.esb_rid = bar_index;
 	result.esb_dev = dev;
-	result.esb_base = dev->mem_resource[bar_index].addr;
+	result.esb_base = dev->pci_mem[bar_index].mem_res.addr;
 
 	*barp = result;
 
diff --git a/drivers/compress/octeontx/otx_zip.c b/drivers/compress/octeontx/otx_zip.c
index 11471dcbb4..6c00fdbdaa 100644
--- a/drivers/compress/octeontx/otx_zip.c
+++ b/drivers/compress/octeontx/otx_zip.c
@@ -149,10 +149,10 @@ zipvf_create(struct rte_compressdev *compressdev)
 	void     *vbar0;
 	uint64_t reg;
 
-	if (pdev->mem_resource[0].phys_addr == 0ULL)
+	if (pdev->pci_mem[0].mem_res.phys_addr == 0ULL)
 		return -EIO;
 
-	vbar0 = pdev->mem_resource[0].addr;
+	vbar0 = pdev->pci_mem[0].mem_res.addr;
 	if (!vbar0) {
 		ZIP_PMD_ERR("Failed to map BAR0 of %s", dev_name);
 		return -ENODEV;
diff --git a/drivers/crypto/ccp/ccp_dev.c b/drivers/crypto/ccp/ccp_dev.c
index ee30f5ac30..24874d92c0 100644
--- a/drivers/crypto/ccp/ccp_dev.c
+++ b/drivers/crypto/ccp/ccp_dev.c
@@ -67,7 +67,7 @@ ccp_read_hwrng(uint32_t *value)
 	struct ccp_device *dev;
 
 	TAILQ_FOREACH(dev, &ccp_list, next) {
-		void *vaddr = (void *)(dev->pci->mem_resource[2].addr);
+		void *vaddr = (void *)(dev->pci->pci_mem[2].mem_res.addr);
 
 		while (dev->hwrng_retries++ < CCP_MAX_TRNG_RETRIES) {
 			*value = CCP_READ_REG(vaddr, TRNG_OUT_REG);
@@ -493,7 +493,7 @@ ccp_add_device(struct ccp_device *dev)
 
 	dev->id = ccp_dev_id++;
 	dev->qidx = 0;
-	vaddr = (void *)(dev->pci->mem_resource[2].addr);
+	vaddr = (void *)(dev->pci->pci_mem[2].mem_res.addr);
 
 	if (dev->pci->id.device_id == AMD_PCI_CCP_5B) {
 		CCP_WRITE_REG(vaddr, CMD_TRNG_CTL_OFFSET, 0x00012D57);
diff --git a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
index 86efe75cc3..f5d41e3876 100644
--- a/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
+++ b/drivers/crypto/cnxk/cnxk_cryptodev_ops.c
@@ -400,7 +400,7 @@ cnxk_cpt_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 
 	pci_dev = RTE_DEV_TO_PCI(dev->device);
 
-	if (pci_dev->mem_resource[2].addr == NULL) {
+	if (pci_dev->pci_mem[2].mem_res.addr == NULL) {
 		plt_err("Invalid PCI mem address");
 		return -EIO;
 	}
diff --git a/drivers/crypto/nitrox/nitrox_device.c b/drivers/crypto/nitrox/nitrox_device.c
index 5b319dd681..fc1e9ce6a5 100644
--- a/drivers/crypto/nitrox/nitrox_device.c
+++ b/drivers/crypto/nitrox/nitrox_device.c
@@ -35,7 +35,7 @@ ndev_init(struct nitrox_device *ndev, struct rte_pci_device *pdev)
 	enum nitrox_vf_mode vf_mode;
 
 	ndev->pdev = pdev;
-	ndev->bar_addr = pdev->mem_resource[0].addr;
+	ndev->bar_addr = pdev->pci_mem[0].mem_res.addr;
 	vf_mode = vf_get_vf_config_mode(ndev->bar_addr);
 	ndev->nr_queues = vf_config_mode_to_nr_queues(vf_mode);
 }
@@ -70,7 +70,7 @@ nitrox_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	int err;
 
 	/* Nitrox CSR space */
-	if (!pdev->mem_resource[0].addr)
+	if (!pdev->pci_mem[0].mem_res.addr)
 		return -EINVAL;
 
 	ndev = ndev_allocate(pdev);
diff --git a/drivers/crypto/octeontx/otx_cryptodev_ops.c b/drivers/crypto/octeontx/otx_cryptodev_ops.c
index 947e1be385..645b2fd979 100644
--- a/drivers/crypto/octeontx/otx_cryptodev_ops.c
+++ b/drivers/crypto/octeontx/otx_cryptodev_ops.c
@@ -157,7 +157,7 @@ otx_cpt_que_pair_setup(struct rte_cryptodev *dev,
 
 	pci_dev = RTE_DEV_TO_PCI(dev->device);
 
-	if (pci_dev->mem_resource[0].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL) {
 		CPT_LOG_ERR("PCI mem address null");
 		return -EIO;
 	}
@@ -1004,7 +1004,7 @@ otx_cpt_dev_create(struct rte_cryptodev *c_dev)
 	char dev_name[32];
 	int ret;
 
-	if (pdev->mem_resource[0].phys_addr == 0ULL)
+	if (pdev->pci_mem[0].mem_res.phys_addr == 0ULL)
 		return -EIO;
 
 	/* for secondary processes, we don't initialise any further as primary
@@ -1025,7 +1025,7 @@ otx_cpt_dev_create(struct rte_cryptodev *c_dev)
 	snprintf(dev_name, 32, "%02x:%02x.%x",
 			pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
 
-	reg_base = pdev->mem_resource[0].addr;
+	reg_base = pdev->pci_mem[0].mem_res.addr;
 	if (!reg_base) {
 		CPT_LOG_ERR("Failed to map BAR0 of %s", dev_name);
 		ret = -ENODEV;
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
index 95a43c8801..c27239b8f8 100644
--- a/drivers/crypto/virtio/virtio_pci.c
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -322,14 +322,14 @@ get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
 		return NULL;
 	}
 
-	if (offset + length > dev->mem_resource[bar].len) {
+	if (offset + length > dev->pci_mem[bar].mem_res.len) {
 		VIRTIO_CRYPTO_INIT_LOG_ERR(
 			"invalid cap: overflows bar space: %u > %" PRIu64,
-			offset + length, dev->mem_resource[bar].len);
+			offset + length, dev->pci_mem[bar].mem_res.len);
 		return NULL;
 	}
 
-	base = dev->mem_resource[bar].addr;
+	base = dev->pci_mem[bar].mem_res.addr;
 	if (base == NULL) {
 		VIRTIO_CRYPTO_INIT_LOG_ERR("bar %u base addr is NULL", bar);
 		return NULL;
diff --git a/drivers/dma/cnxk/cnxk_dmadev.c b/drivers/dma/cnxk/cnxk_dmadev.c
index a6f4a31e0e..6294ed5ad5 100644
--- a/drivers/dma/cnxk/cnxk_dmadev.c
+++ b/drivers/dma/cnxk/cnxk_dmadev.c
@@ -637,7 +637,7 @@ cnxk_dmadev_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	struct roc_dpi *rdpi = NULL;
 	int rc;
 
-	if (!pci_dev->mem_resource[0].addr)
+	if (!pci_dev->pci_mem[0].mem_res.addr)
 		return -ENODEV;
 
 	rc = roc_plt_init();
diff --git a/drivers/dma/hisilicon/hisi_dmadev.c b/drivers/dma/hisilicon/hisi_dmadev.c
index 0e11ca14cc..ab2dfb63d8 100644
--- a/drivers/dma/hisilicon/hisi_dmadev.c
+++ b/drivers/dma/hisilicon/hisi_dmadev.c
@@ -894,7 +894,7 @@ hisi_dma_create(struct rte_pci_device *pci_dev, uint8_t queue_id,
 	hw->data = dev->data;
 	hw->revision = revision;
 	hw->reg_layout = hisi_dma_reg_layout(revision);
-	hw->io_base = pci_dev->mem_resource[REG_PCI_BAR_INDEX].addr;
+	hw->io_base = pci_dev->pci_mem[REG_PCI_BAR_INDEX].mem_res.addr;
 	hw->queue_id = queue_id;
 	hw->sq_tail_reg = hisi_dma_queue_regaddr(hw,
 						 HISI_DMA_QUEUE_SQ_TAIL_REG);
@@ -950,7 +950,7 @@ hisi_dma_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
 
-	if (pci_dev->mem_resource[2].addr == NULL) {
+	if (pci_dev->pci_mem[2].mem_res.addr == NULL) {
 		HISI_DMA_LOG(ERR, "%s BAR2 is NULL!\n", name);
 		return -ENODEV;
 	}
@@ -961,7 +961,7 @@ hisi_dma_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	HISI_DMA_LOG(DEBUG, "%s read PCI revision: 0x%x", name, revision);
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-		hisi_dma_init_gbl(pci_dev->mem_resource[2].addr, revision);
+		hisi_dma_init_gbl(pci_dev->pci_mem[2].mem_res.addr, revision);
 
 	for (i = 0; i < HISI_DMA_MAX_HW_QUEUES; i++) {
 		ret = hisi_dma_create(pci_dev, i, revision);
diff --git a/drivers/dma/idxd/idxd_pci.c b/drivers/dma/idxd/idxd_pci.c
index 781fa02db3..15eb3f22ef 100644
--- a/drivers/dma/idxd/idxd_pci.c
+++ b/drivers/dma/idxd/idxd_pci.c
@@ -188,12 +188,12 @@ init_pci_device(struct rte_pci_device *dev, struct idxd_dmadev *idxd,
 	rte_spinlock_init(&pci->lk);
 
 	/* assign the bar registers, and then configure device */
-	pci->regs = dev->mem_resource[0].addr;
+	pci->regs = dev->pci_mem[0].mem_res.addr;
 	grp_offset = (uint16_t)pci->regs->offsets[0];
 	pci->grp_regs = RTE_PTR_ADD(pci->regs, grp_offset * 0x100);
 	wq_offset = (uint16_t)(pci->regs->offsets[0] >> 16);
 	pci->wq_regs_base = RTE_PTR_ADD(pci->regs, wq_offset * 0x100);
-	pci->portals = dev->mem_resource[2].addr;
+	pci->portals = dev->pci_mem[2].mem_res.addr;
 	pci->wq_cfg_sz = (pci->regs->wqcap >> 24) & 0x0F;
 
 	/* sanity check device status */
diff --git a/drivers/dma/ioat/ioat_dmadev.c b/drivers/dma/ioat/ioat_dmadev.c
index 5fc14bcf22..9fc6d73bb5 100644
--- a/drivers/dma/ioat/ioat_dmadev.c
+++ b/drivers/dma/ioat/ioat_dmadev.c
@@ -644,7 +644,7 @@ ioat_dmadev_create(const char *name, struct rte_pci_device *dev)
 
 	ioat = dmadev->data->dev_private;
 	ioat->dmadev = dmadev;
-	ioat->regs = dev->mem_resource[0].addr;
+	ioat->regs = dev->pci_mem[0].mem_res.addr;
 	ioat->doorbell = &ioat->regs->dmacount;
 	ioat->qcfg.nb_desc = 0;
 	ioat->desc_ring = NULL;
diff --git a/drivers/event/dlb2/pf/dlb2_main.c b/drivers/event/dlb2/pf/dlb2_main.c
index 717aa4fc08..0bb90d6964 100644
--- a/drivers/event/dlb2/pf/dlb2_main.c
+++ b/drivers/event/dlb2/pf/dlb2_main.c
@@ -170,32 +170,32 @@ dlb2_probe(struct rte_pci_device *pdev, const void *probe_args)
 	 */
 
 	/* BAR 0 */
-	if (pdev->mem_resource[0].addr == NULL) {
+	if (pdev->pci_mem[0].mem_res.addr == NULL) {
 		DLB2_ERR(dlb2_dev, "probe: BAR 0 addr (csr_kva) is NULL\n");
 		ret = -EINVAL;
 		goto pci_mmap_bad_addr;
 	}
-	dlb2_dev->hw.func_kva = (void *)(uintptr_t)pdev->mem_resource[0].addr;
-	dlb2_dev->hw.func_phys_addr = pdev->mem_resource[0].phys_addr;
+	dlb2_dev->hw.func_kva = (void *)(uintptr_t)pdev->pci_mem[0].mem_res.addr;
+	dlb2_dev->hw.func_phys_addr = pdev->pci_mem[0].mem_res.phys_addr;
 
 	DLB2_INFO(dlb2_dev, "DLB2 FUNC VA=%p, PA=%p, len=%p\n",
 		  (void *)dlb2_dev->hw.func_kva,
 		  (void *)dlb2_dev->hw.func_phys_addr,
-		  (void *)(pdev->mem_resource[0].len));
+		  (void *)(pdev->pci_mem[0].mem_res.len));
 
 	/* BAR 2 */
-	if (pdev->mem_resource[2].addr == NULL) {
+	if (pdev->pci_mem[2].mem_res.addr == NULL) {
 		DLB2_ERR(dlb2_dev, "probe: BAR 2 addr (func_kva) is NULL\n");
 		ret = -EINVAL;
 		goto pci_mmap_bad_addr;
 	}
-	dlb2_dev->hw.csr_kva = (void *)(uintptr_t)pdev->mem_resource[2].addr;
-	dlb2_dev->hw.csr_phys_addr = pdev->mem_resource[2].phys_addr;
+	dlb2_dev->hw.csr_kva = (void *)(uintptr_t)pdev->pci_mem[2].mem_res.addr;
+	dlb2_dev->hw.csr_phys_addr = pdev->pci_mem[2].mem_res.phys_addr;
 
 	DLB2_INFO(dlb2_dev, "DLB2 CSR VA=%p, PA=%p, len=%p\n",
 		  (void *)dlb2_dev->hw.csr_kva,
 		  (void *)dlb2_dev->hw.csr_phys_addr,
-		  (void *)(pdev->mem_resource[2].len));
+		  (void *)(pdev->pci_mem[2].mem_res.len));
 
 	dlb2_dev->pdev = pdev;
 
diff --git a/drivers/event/octeontx/ssovf_probe.c b/drivers/event/octeontx/ssovf_probe.c
index 2c9601a8ff..04558471c1 100644
--- a/drivers/event/octeontx/ssovf_probe.c
+++ b/drivers/event/octeontx/ssovf_probe.c
@@ -148,23 +148,23 @@ ssowvf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL ||
-			pci_dev->mem_resource[2].addr == NULL ||
-			pci_dev->mem_resource[4].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL ||
+			pci_dev->pci_mem[2].mem_res.addr == NULL ||
+			pci_dev->pci_mem[4].mem_res.addr == NULL) {
 		mbox_log_err("Empty bars %p %p %p",
-				pci_dev->mem_resource[0].addr,
-				pci_dev->mem_resource[2].addr,
-				pci_dev->mem_resource[4].addr);
+				pci_dev->pci_mem[0].mem_res.addr,
+				pci_dev->pci_mem[2].mem_res.addr,
+				pci_dev->pci_mem[4].mem_res.addr);
 		return -ENODEV;
 	}
 
-	if (pci_dev->mem_resource[4].len != SSOW_BAR4_LEN) {
+	if (pci_dev->pci_mem[4].mem_res.len != SSOW_BAR4_LEN) {
 		mbox_log_err("Bar4 len mismatch %d != %d",
-			SSOW_BAR4_LEN, (int)pci_dev->mem_resource[4].len);
+			SSOW_BAR4_LEN, (int)pci_dev->pci_mem[4].mem_res.len);
 		return -EINVAL;
 	}
 
-	id = pci_dev->mem_resource[4].addr;
+	id = pci_dev->pci_mem[4].mem_res.addr;
 	vfid = id->vfid;
 	if (vfid >= SSO_MAX_VHWS) {
 		mbox_log_err("Invalid vfid(%d/%d)", vfid, SSO_MAX_VHWS);
@@ -173,9 +173,9 @@ ssowvf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	res = &sdev.hws[vfid];
 	res->vfid = vfid;
-	res->bar0 = pci_dev->mem_resource[0].addr;
-	res->bar2 = pci_dev->mem_resource[2].addr;
-	res->bar4 = pci_dev->mem_resource[4].addr;
+	res->bar0 = pci_dev->pci_mem[0].mem_res.addr;
+	res->bar2 = pci_dev->pci_mem[2].mem_res.addr;
+	res->bar4 = pci_dev->pci_mem[4].mem_res.addr;
 	res->domain = id->domain;
 
 	sdev.total_ssowvfs++;
@@ -229,14 +229,14 @@ ssovf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL ||
-			pci_dev->mem_resource[2].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL ||
+			pci_dev->pci_mem[2].mem_res.addr == NULL) {
 		mbox_log_err("Empty bars %p %p",
-			pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[2].addr);
+			pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[2].mem_res.addr);
 		return -ENODEV;
 	}
-	idreg = pci_dev->mem_resource[0].addr;
+	idreg = pci_dev->pci_mem[0].mem_res.addr;
 	idreg += SSO_VHGRP_AQ_THR;
 	val = rte_read64(idreg);
 
@@ -250,8 +250,8 @@ ssovf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	res = &sdev.grp[vfid];
 	res->vfid = vfid;
-	res->bar0 = pci_dev->mem_resource[0].addr;
-	res->bar2 = pci_dev->mem_resource[2].addr;
+	res->bar0 = pci_dev->pci_mem[0].mem_res.addr;
+	res->bar2 = pci_dev->pci_mem[2].mem_res.addr;
 	res->domain = val & 0xffff;
 
 	sdev.total_ssovfs++;
diff --git a/drivers/event/octeontx/timvf_probe.c b/drivers/event/octeontx/timvf_probe.c
index 7ce3eddd7e..9eeb03b9f2 100644
--- a/drivers/event/octeontx/timvf_probe.c
+++ b/drivers/event/octeontx/timvf_probe.c
@@ -112,15 +112,15 @@ timvf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL ||
-			pci_dev->mem_resource[4].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL ||
+			pci_dev->pci_mem[4].mem_res.addr == NULL) {
 		timvf_log_err("Empty bars %p %p",
-				pci_dev->mem_resource[0].addr,
-				pci_dev->mem_resource[4].addr);
+				pci_dev->pci_mem[0].mem_res.addr,
+				pci_dev->pci_mem[4].mem_res.addr);
 		return -ENODEV;
 	}
 
-	val = rte_read64((uint8_t *)pci_dev->mem_resource[0].addr +
+	val = rte_read64((uint8_t *)pci_dev->pci_mem[0].mem_res.addr +
 			0x100 /* TIM_VRINGX_BASE */);
 	vfid = (val >> 23) & 0xff;
 	if (vfid >= TIM_MAX_RINGS) {
@@ -130,16 +130,16 @@ timvf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 
 	res = &tdev.rings[tdev.total_timvfs];
 	res->vfid = vfid;
-	res->bar0 = pci_dev->mem_resource[0].addr;
-	res->bar2 = pci_dev->mem_resource[2].addr;
-	res->bar4 = pci_dev->mem_resource[4].addr;
+	res->bar0 = pci_dev->pci_mem[0].mem_res.addr;
+	res->bar2 = pci_dev->pci_mem[2].mem_res.addr;
+	res->bar4 = pci_dev->pci_mem[4].mem_res.addr;
 	res->domain = (val >> 7) & 0xffff;
 	res->in_use = false;
 	tdev.total_timvfs++;
 	rte_wmb();
 
 	timvf_log_dbg("Domain=%d VFid=%d bar0 %p total_timvfs=%d", res->domain,
-			res->vfid, pci_dev->mem_resource[0].addr,
+			res->vfid, pci_dev->pci_mem[0].mem_res.addr,
 			tdev.total_timvfs);
 	return 0;
 }
diff --git a/drivers/event/skeleton/skeleton_eventdev.c b/drivers/event/skeleton/skeleton_eventdev.c
index 8513b9a013..c92604d3cc 100644
--- a/drivers/event/skeleton/skeleton_eventdev.c
+++ b/drivers/event/skeleton/skeleton_eventdev.c
@@ -360,7 +360,7 @@ skeleton_eventdev_init(struct rte_eventdev *eventdev)
 
 	pci_dev = RTE_DEV_TO_PCI(eventdev->dev);
 
-	skel->reg_base = (uintptr_t)pci_dev->mem_resource[0].addr;
+	skel->reg_base = (uintptr_t)pci_dev->pci_mem[0].mem_res.addr;
 	if (!skel->reg_base) {
 		PMD_DRV_ERR("Failed to map BAR0");
 		ret = -ENODEV;
diff --git a/drivers/mempool/octeontx/octeontx_fpavf.c b/drivers/mempool/octeontx/octeontx_fpavf.c
index 1513c632c6..287dad8021 100644
--- a/drivers/mempool/octeontx/octeontx_fpavf.c
+++ b/drivers/mempool/octeontx/octeontx_fpavf.c
@@ -785,11 +785,11 @@ fpavf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL) {
-		fpavf_log_err("Empty bars %p ", pci_dev->mem_resource[0].addr);
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL) {
+		fpavf_log_err("Empty bars %p ", pci_dev->pci_mem[0].mem_res.addr);
 		return -ENODEV;
 	}
-	idreg = pci_dev->mem_resource[0].addr;
+	idreg = pci_dev->pci_mem[0].mem_res.addr;
 
 	octeontx_fpavf_setup();
 
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
index b2995427c8..c3f677029f 100644
--- a/drivers/net/ark/ark_ethdev.c
+++ b/drivers/net/ark/ark_ethdev.c
@@ -329,8 +329,8 @@ eth_ark_dev_init(struct rte_eth_dev *dev)
 	dev->rx_pkt_burst = rte_eth_pkt_burst_dummy;
 	dev->tx_pkt_burst = rte_eth_pkt_burst_dummy;
 
-	ark->bar0 = (uint8_t *)pci_dev->mem_resource[0].addr;
-	ark->a_bar = (uint8_t *)pci_dev->mem_resource[2].addr;
+	ark->bar0 = (uint8_t *)pci_dev->pci_mem[0].mem_res.addr;
+	ark->a_bar = (uint8_t *)pci_dev->pci_mem[2].mem_res.addr;
 
 	ark->sysctrl.v  = (void *)&ark->bar0[ARK_SYSCTRL_BASE];
 	ark->mpurx.v  = (void *)&ark->bar0[ARK_MPU_RX_BASE];
diff --git a/drivers/net/atlantic/atl_ethdev.c b/drivers/net/atlantic/atl_ethdev.c
index 3a028f4290..90ee505644 100644
--- a/drivers/net/atlantic/atl_ethdev.c
+++ b/drivers/net/atlantic/atl_ethdev.c
@@ -384,7 +384,7 @@ eth_atl_dev_init(struct rte_eth_dev *eth_dev)
 	/* Vendor and Device ID need to be set before init of shared code */
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->mmio = (void *)pci_dev->mem_resource[0].addr;
+	hw->mmio = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	/* Hardware configuration - hardcode */
 	adapter->hw_cfg.is_lro = false;
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index b2a08f5635..2956233479 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -368,8 +368,8 @@ avp_dev_translate_address(struct rte_eth_dev *eth_dev,
 	void *addr;
 	unsigned int i;
 
-	addr = pci_dev->mem_resource[RTE_AVP_PCI_MEMORY_BAR].addr;
-	resource = &pci_dev->mem_resource[RTE_AVP_PCI_MEMMAP_BAR];
+	addr = pci_dev->pci_mem[RTE_AVP_PCI_MEMORY_BAR].mem_res.addr;
+	resource = &pci_dev->pci_mem[RTE_AVP_PCI_MEMMAP_BAR].mem_res;
 	info = (struct rte_avp_memmap_info *)resource->addr;
 
 	offset = 0;
@@ -421,7 +421,7 @@ avp_dev_check_regions(struct rte_eth_dev *eth_dev)
 
 	/* Dump resource info for debug */
 	for (i = 0; i < PCI_MAX_RESOURCE; i++) {
-		resource = &pci_dev->mem_resource[i];
+		resource = &pci_dev->pci_mem[i].mem_res;
 		if ((resource->phys_addr == 0) || (resource->len == 0))
 			continue;
 
@@ -554,7 +554,7 @@ _avp_set_queue_counts(struct rte_eth_dev *eth_dev)
 	struct rte_avp_device_info *host_info;
 	void *addr;
 
-	addr = pci_dev->mem_resource[RTE_AVP_PCI_DEVICE_BAR].addr;
+	addr = pci_dev->pci_mem[RTE_AVP_PCI_DEVICE_BAR].mem_res.addr;
 	host_info = (struct rte_avp_device_info *)addr;
 
 	/*
@@ -664,7 +664,7 @@ avp_dev_interrupt_handler(void *data)
 {
 	struct rte_eth_dev *eth_dev = data;
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
-	void *registers = pci_dev->mem_resource[RTE_AVP_PCI_MMIO_BAR].addr;
+	void *registers = pci_dev->pci_mem[RTE_AVP_PCI_MMIO_BAR].mem_res.addr;
 	uint32_t status, value;
 	int ret;
 
@@ -723,7 +723,7 @@ static int
 avp_dev_enable_interrupts(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
-	void *registers = pci_dev->mem_resource[RTE_AVP_PCI_MMIO_BAR].addr;
+	void *registers = pci_dev->pci_mem[RTE_AVP_PCI_MMIO_BAR].mem_res.addr;
 	int ret;
 
 	if (registers == NULL)
@@ -748,7 +748,7 @@ static int
 avp_dev_disable_interrupts(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
-	void *registers = pci_dev->mem_resource[RTE_AVP_PCI_MMIO_BAR].addr;
+	void *registers = pci_dev->pci_mem[RTE_AVP_PCI_MMIO_BAR].mem_res.addr;
 	int ret;
 
 	if (registers == NULL)
@@ -793,7 +793,7 @@ static int
 avp_dev_migration_pending(struct rte_eth_dev *eth_dev)
 {
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
-	void *registers = pci_dev->mem_resource[RTE_AVP_PCI_MMIO_BAR].addr;
+	void *registers = pci_dev->pci_mem[RTE_AVP_PCI_MMIO_BAR].mem_res.addr;
 	uint32_t value;
 
 	if (registers == NULL)
@@ -824,7 +824,7 @@ avp_dev_create(struct rte_pci_device *pci_dev,
 	struct rte_mem_resource *resource;
 	unsigned int i;
 
-	resource = &pci_dev->mem_resource[RTE_AVP_PCI_DEVICE_BAR];
+	resource = &pci_dev->pci_mem[RTE_AVP_PCI_DEVICE_BAR].mem_res;
 	if (resource->addr == NULL) {
 		PMD_DRV_LOG(ERR, "BAR%u is not mapped\n",
 			    RTE_AVP_PCI_DEVICE_BAR);
@@ -1992,7 +1992,7 @@ avp_dev_configure(struct rte_eth_dev *eth_dev)
 		goto unlock;
 	}
 
-	addr = pci_dev->mem_resource[RTE_AVP_PCI_DEVICE_BAR].addr;
+	addr = pci_dev->pci_mem[RTE_AVP_PCI_DEVICE_BAR].mem_res.addr;
 	host_info = (struct rte_avp_device_info *)addr;
 
 	/* Setup required number of queues */
diff --git a/drivers/net/axgbe/axgbe_ethdev.c b/drivers/net/axgbe/axgbe_ethdev.c
index 48714eebe6..30a3b62b05 100644
--- a/drivers/net/axgbe/axgbe_ethdev.c
+++ b/drivers/net/axgbe/axgbe_ethdev.c
@@ -2216,12 +2216,12 @@ eth_axgbe_dev_init(struct rte_eth_dev *eth_dev)
 	pdata->pci_dev = pci_dev;
 
 	pdata->xgmac_regs =
-		(void *)pci_dev->mem_resource[AXGBE_AXGMAC_BAR].addr;
+		(void *)pci_dev->pci_mem[AXGBE_AXGMAC_BAR].mem_res.addr;
 	pdata->xprop_regs = (void *)((uint8_t *)pdata->xgmac_regs
 				     + AXGBE_MAC_PROP_OFFSET);
 	pdata->xi2c_regs = (void *)((uint8_t *)pdata->xgmac_regs
 				    + AXGBE_I2C_CTRL_OFFSET);
-	pdata->xpcs_regs = (void *)pci_dev->mem_resource[AXGBE_XPCS_BAR].addr;
+	pdata->xpcs_regs = (void *)pci_dev->pci_mem[AXGBE_XPCS_BAR].mem_res.addr;
 
 	/* version specific driver data*/
 	if (pci_dev->id.device_id == AMD_PCI_AXGBE_DEVICE_V2A)
diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c
index 4448cf2de2..68fa27841d 100644
--- a/drivers/net/bnx2x/bnx2x_ethdev.c
+++ b/drivers/net/bnx2x/bnx2x_ethdev.c
@@ -656,12 +656,12 @@ bnx2x_common_dev_init(struct rte_eth_dev *eth_dev, int is_vf)
 		sc->flags = BNX2X_IS_VF_FLAG;
 
 	sc->pcie_func = pci_dev->addr.function;
-	sc->bar[BAR0].base_addr = (void *)pci_dev->mem_resource[0].addr;
+	sc->bar[BAR0].base_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	if (is_vf)
 		sc->bar[BAR1].base_addr = (void *)
-			((uintptr_t)pci_dev->mem_resource[0].addr + PXP_VF_ADDR_DB_START);
+			((uintptr_t)pci_dev->pci_mem[0].mem_res.addr + PXP_VF_ADDR_DB_START);
 	else
-		sc->bar[BAR1].base_addr = pci_dev->mem_resource[2].addr;
+		sc->bar[BAR1].base_addr = pci_dev->pci_mem[2].mem_res.addr;
 
 	assert(sc->bar[BAR0].base_addr);
 	assert(sc->bar[BAR1].base_addr);
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ef7b8859d9..1772ced4a7 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -4682,8 +4682,8 @@ static int bnxt_map_pci_bars(struct rte_eth_dev *eth_dev)
 	struct bnxt *bp = eth_dev->data->dev_private;
 
 	/* enable device (incl. PCI PM wakeup), and bus-mastering */
-	bp->bar0 = (void *)pci_dev->mem_resource[0].addr;
-	bp->doorbell_base = (void *)pci_dev->mem_resource[2].addr;
+	bp->bar0 = (void *)pci_dev->pci_mem[0].mem_res.addr;
+	bp->doorbell_base = (void *)pci_dev->pci_mem[2].mem_res.addr;
 	if (!bp->bar0 || !bp->doorbell_base) {
 		PMD_DRV_LOG(ERR, "Unable to access Hardware\n");
 		return -ENODEV;
@@ -5932,8 +5932,8 @@ bnxt_dev_init(struct rte_eth_dev *eth_dev, void *params __rte_unused)
 	PMD_DRV_LOG(INFO,
 		    "Found %s device at mem %" PRIX64 ", node addr %pM\n",
 		    DRV_MODULE_NAME,
-		    pci_dev->mem_resource[0].phys_addr,
-		    pci_dev->mem_resource[0].addr);
+		    pci_dev->pci_mem[0].mem_res.phys_addr,
+		    pci_dev->pci_mem[0].mem_res.addr);
 
 	return 0;
 
diff --git a/drivers/net/cpfl/cpfl_ethdev.c b/drivers/net/cpfl/cpfl_ethdev.c
index ede730fd50..dbfb74123e 100644
--- a/drivers/net/cpfl/cpfl_ethdev.c
+++ b/drivers/net/cpfl/cpfl_ethdev.c
@@ -1172,8 +1172,8 @@ cpfl_adapter_ext_init(struct rte_pci_device *pci_dev, struct cpfl_adapter_ext *a
 	struct idpf_hw *hw = &base->hw;
 	int ret = 0;
 
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-	hw->hw_addr_len = pci_dev->mem_resource[0].len;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
+	hw->hw_addr_len = pci_dev->pci_mem[0].mem_res.len;
 	hw->back = base;
 	hw->vendor_id = pci_dev->id.vendor_id;
 	hw->device_id = pci_dev->id.device_id;
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 45bbeaef0c..f02f211100 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -1729,7 +1729,7 @@ static int eth_cxgbe_dev_init(struct rte_eth_dev *eth_dev)
 		return -1;
 
 	adapter->use_unpacked_mode = 1;
-	adapter->regs = (void *)pci_dev->mem_resource[0].addr;
+	adapter->regs = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	if (!adapter->regs) {
 		dev_err(adapter, "%s: cannot map device registers\n", __func__);
 		err = -ENOMEM;
diff --git a/drivers/net/cxgbe/cxgbe_main.c b/drivers/net/cxgbe/cxgbe_main.c
index f8dd833032..4a88f7a23e 100644
--- a/drivers/net/cxgbe/cxgbe_main.c
+++ b/drivers/net/cxgbe/cxgbe_main.c
@@ -2228,7 +2228,7 @@ int cxgbe_probe(struct adapter *adapter)
 		if (qpp > num_seg)
 			dev_warn(adapter, "Incorrect SGE EGRESS QUEUES_PER_PAGE configuration, continuing in debug mode\n");
 
-		adapter->bar2 = (void *)adapter->pdev->mem_resource[2].addr;
+		adapter->bar2 = (void *)adapter->pdev->pci_mem[2].mem_res.addr;
 		if (!adapter->bar2) {
 			dev_err(adapter, "cannot map device bar2 region\n");
 			err = -ENOMEM;
diff --git a/drivers/net/cxgbe/cxgbevf_ethdev.c b/drivers/net/cxgbe/cxgbevf_ethdev.c
index a62c56c2b9..f966b20933 100644
--- a/drivers/net/cxgbe/cxgbevf_ethdev.c
+++ b/drivers/net/cxgbe/cxgbevf_ethdev.c
@@ -148,7 +148,7 @@ static int eth_cxgbevf_dev_init(struct rte_eth_dev *eth_dev)
 		return -1;
 
 	adapter->use_unpacked_mode = 1;
-	adapter->regs = (void *)pci_dev->mem_resource[0].addr;
+	adapter->regs = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	if (!adapter->regs) {
 		dev_err(adapter, "%s: cannot map device registers\n", __func__);
 		err = -ENOMEM;
diff --git a/drivers/net/cxgbe/cxgbevf_main.c b/drivers/net/cxgbe/cxgbevf_main.c
index d0c93f8ac3..d5f3b3fcd8 100644
--- a/drivers/net/cxgbe/cxgbevf_main.c
+++ b/drivers/net/cxgbe/cxgbevf_main.c
@@ -184,7 +184,7 @@ int cxgbevf_probe(struct adapter *adapter)
 		return err;
 
 	if (!is_t4(adapter->params.chip)) {
-		adapter->bar2 = (void *)adapter->pdev->mem_resource[2].addr;
+		adapter->bar2 = (void *)adapter->pdev->pci_mem[2].mem_res.addr;
 		if (!adapter->bar2) {
 			dev_err(adapter, "cannot map device bar2 region\n");
 			err = -ENOMEM;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 8ee9be12ad..995c78f6b5 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -265,13 +265,13 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	hw->device_id = pci_dev->id.device_id;
 	adapter->stopped = 0;
 
 	/* For ICH8 support we'll need to map the flash memory BAR */
 	if (eth_em_dev_is_ich8(hw))
-		hw->flash_address = (void *)pci_dev->mem_resource[1].addr;
+		hw->flash_address = (void *)pci_dev->pci_mem[1].mem_res.addr;
 
 	if (e1000_setup_init_funcs(hw, TRUE) != E1000_SUCCESS ||
 			em_hw_init(hw) != 0) {
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 8858f975f8..620192a015 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -743,7 +743,7 @@ eth_igb_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 
-	hw->hw_addr= (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	igb_identify_hardware(eth_dev, pci_dev);
 	if (e1000_setup_init_funcs(hw, FALSE) != E1000_SUCCESS) {
@@ -938,7 +938,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	adapter->stopped = 0;
 
 	/* Initialize the shared code (base driver) */
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index efcb163027..6733ae77ff 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -2121,8 +2121,8 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	intr_handle = pci_dev->intr_handle;
 
-	adapter->regs = pci_dev->mem_resource[ENA_REGS_BAR].addr;
-	adapter->dev_mem_base = pci_dev->mem_resource[ENA_MEM_BAR].addr;
+	adapter->regs = pci_dev->pci_mem[ENA_REGS_BAR].mem_res.addr;
+	adapter->dev_mem_base = pci_dev->pci_mem[ENA_MEM_BAR].mem_res.addr;
 
 	if (!adapter->regs) {
 		PMD_INIT_LOG(CRIT, "Failed to access registers BAR(%d)\n",
diff --git a/drivers/net/enetc/enetc_ethdev.c b/drivers/net/enetc/enetc_ethdev.c
index 1b4337bc48..307d2cfd3b 100644
--- a/drivers/net/enetc/enetc_ethdev.c
+++ b/drivers/net/enetc/enetc_ethdev.c
@@ -883,7 +883,7 @@ enetc_dev_init(struct rte_eth_dev *eth_dev)
 	eth_dev->tx_pkt_burst = &enetc_xmit_pkts;
 
 	/* Retrieving and storing the HW base address of device */
-	hw->hw.reg = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw.reg = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	hw->device_id = pci_dev->id.device_id;
 
 	error = enetc_hardware_init(hw);
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 19a99a82c5..df5de9bf4f 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -1914,8 +1914,8 @@ int enic_probe(struct enic *enic)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	enic->bar0.vaddr = (void *)pdev->mem_resource[0].addr;
-	enic->bar0.len = pdev->mem_resource[0].len;
+	enic->bar0.vaddr = (void *)pdev->pci_mem[0].mem_res.addr;
+	enic->bar0.len = pdev->pci_mem[0].mem_res.len;
 
 	/* Register vNIC device */
 	enic->vdev = vnic_dev_register(NULL, enic, enic->pdev, &enic->bar0, 1);
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 8b83063f0a..209140ac9c 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -3090,7 +3090,7 @@ eth_fm10k_dev_init(struct rte_eth_dev *dev)
 	hw->subsystem_device_id = pdev->id.subsystem_device_id;
 	hw->subsystem_vendor_id = pdev->id.subsystem_vendor_id;
 	hw->revision_id = 0;
-	hw->hw_addr = (void *)pdev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pdev->pci_mem[0].mem_res.addr;
 	if (hw->hw_addr == NULL) {
 		PMD_INIT_LOG(ERR, "Bad mem resource."
 			" Try to refuse unused devices.");
diff --git a/drivers/net/gve/gve_ethdev.c b/drivers/net/gve/gve_ethdev.c
index cf28a4a3b7..d3e2f146b6 100644
--- a/drivers/net/gve/gve_ethdev.c
+++ b/drivers/net/gve/gve_ethdev.c
@@ -777,13 +777,13 @@ gve_dev_init(struct rte_eth_dev *eth_dev)
 
 	pci_dev = RTE_DEV_TO_PCI(eth_dev->device);
 
-	reg_bar = pci_dev->mem_resource[GVE_REG_BAR].addr;
+	reg_bar = pci_dev->pci_mem[GVE_REG_BAR].mem_res.addr;
 	if (!reg_bar) {
 		PMD_DRV_LOG(ERR, "Failed to map pci bar!");
 		return -ENOMEM;
 	}
 
-	db_bar = pci_dev->mem_resource[GVE_DB_BAR].addr;
+	db_bar = pci_dev->pci_mem[GVE_DB_BAR].mem_res.addr;
 	if (!db_bar) {
 		PMD_DRV_LOG(ERR, "Failed to map doorbell bar!");
 		return -ENOMEM;
diff --git a/drivers/net/hinic/base/hinic_pmd_hwif.c b/drivers/net/hinic/base/hinic_pmd_hwif.c
index 2d3f192b21..a10aa5780f 100644
--- a/drivers/net/hinic/base/hinic_pmd_hwif.c
+++ b/drivers/net/hinic/base/hinic_pmd_hwif.c
@@ -398,7 +398,7 @@ static int hinic_init_hwif(struct hinic_hwdev *hwdev, void *cfg_reg_base,
 	int err;
 
 	pci_dev = (struct rte_pci_device *)(hwdev->pcidev_hdl);
-	db_bar_len = pci_dev->mem_resource[HINIC_DB_MEM_BAR].len;
+	db_bar_len = pci_dev->pci_mem[HINIC_DB_MEM_BAR].mem_res.len;
 
 	hwif = hwdev->hwif;
 
@@ -470,16 +470,16 @@ static void hinic_get_mmio(struct hinic_hwdev *hwdev, void **cfg_regs_base,
 	uint64_t bar0_phy_addr;
 	uint64_t pagesize = sysconf(_SC_PAGESIZE);
 
-	*cfg_regs_base = pci_dev->mem_resource[HINIC_CFG_REGS_BAR].addr;
-	*intr_base = pci_dev->mem_resource[HINIC_INTR_MSI_BAR].addr;
-	*db_base = pci_dev->mem_resource[HINIC_DB_MEM_BAR].addr;
+	*cfg_regs_base = pci_dev->pci_mem[HINIC_CFG_REGS_BAR].mem_res.addr;
+	*intr_base = pci_dev->pci_mem[HINIC_INTR_MSI_BAR].mem_res.addr;
+	*db_base = pci_dev->pci_mem[HINIC_DB_MEM_BAR].mem_res.addr;
 
-	bar0_size = pci_dev->mem_resource[HINIC_CFG_REGS_BAR].len;
-	bar2_size = pci_dev->mem_resource[HINIC_INTR_MSI_BAR].len;
+	bar0_size = pci_dev->pci_mem[HINIC_CFG_REGS_BAR].mem_res.len;
+	bar2_size = pci_dev->pci_mem[HINIC_INTR_MSI_BAR].mem_res.len;
 
 	if (pagesize == PAGE_SIZE_64K && (bar0_size % pagesize != 0)) {
 		bar0_phy_addr =
-			pci_dev->mem_resource[HINIC_CFG_REGS_BAR].phys_addr;
+			pci_dev->pci_mem[HINIC_CFG_REGS_BAR].mem_res.phys_addr;
 		if (bar0_phy_addr % pagesize != 0 &&
 		(bar0_size + bar2_size <= pagesize) &&
 		bar2_size >= bar0_size) {
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index 36896f8989..b54945b575 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -4525,7 +4525,7 @@ hns3_init_pf(struct rte_eth_dev *eth_dev)
 	PMD_INIT_FUNC_TRACE();
 
 	/* Get hardware io base address from pcie BAR2 IO space */
-	hw->io_base = pci_dev->mem_resource[2].addr;
+	hw->io_base = pci_dev->pci_mem[2].mem_res.addr;
 
 	/* Firmware command queue initialize */
 	ret = hns3_cmd_init_queue(hw);
diff --git a/drivers/net/hns3/hns3_ethdev_vf.c b/drivers/net/hns3/hns3_ethdev_vf.c
index d051a1357b..490caf3a74 100644
--- a/drivers/net/hns3/hns3_ethdev_vf.c
+++ b/drivers/net/hns3/hns3_ethdev_vf.c
@@ -1414,7 +1414,7 @@ hns3vf_init_vf(struct rte_eth_dev *eth_dev)
 	PMD_INIT_FUNC_TRACE();
 
 	/* Get hardware io base address from pcie BAR2 IO space */
-	hw->io_base = pci_dev->mem_resource[2].addr;
+	hw->io_base = pci_dev->pci_mem[2].mem_res.addr;
 
 	/* Firmware command queue initialize */
 	ret = hns3_cmd_init_queue(hw);
diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
index 4065c519c3..d0b07265c1 100644
--- a/drivers/net/hns3/hns3_rxtx.c
+++ b/drivers/net/hns3/hns3_rxtx.c
@@ -2923,8 +2923,8 @@ hns3_tx_push_get_queue_tail_reg(struct rte_eth_dev *dev, uint16_t queue_id)
 	 *
 	 * The quick doorbell located at 64B offset in the TQP region.
 	 */
-	return (char *)pci_dev->mem_resource[bar_id].addr +
-			(pci_dev->mem_resource[bar_id].len >> 1) +
+	return (char *)pci_dev->pci_mem[bar_id].mem_res.addr +
+			(pci_dev->pci_mem[bar_id].mem_res.len >> 1) +
 			HNS3_TX_PUSH_TQP_REGION_SIZE * queue_id +
 			HNS3_TX_PUSH_QUICK_DOORBELL_OFFSET;
 }
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index cb0070f94b..a1d56570df 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1449,7 +1449,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
 	pf->dev_data = dev->data;
 
 	hw->back = I40E_PF_TO_ADAPTER(pf);
-	hw->hw_addr = (uint8_t *)(pci_dev->mem_resource[0].addr);
+	hw->hw_addr = (uint8_t *)(pci_dev->pci_mem[0].mem_res.addr);
 	if (!hw->hw_addr) {
 		PMD_INIT_LOG(ERR,
 			"Hardware is not available, as address is NULL");
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index f6d68403ce..a96d1f258b 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -2605,7 +2605,7 @@ iavf_dev_init(struct rte_eth_dev *eth_dev)
 	hw->bus.bus_id = pci_dev->addr.bus;
 	hw->bus.device = pci_dev->addr.devid;
 	hw->bus.func = pci_dev->addr.function;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	hw->back = IAVF_DEV_PRIVATE_TO_ADAPTER(eth_dev->data->dev_private);
 	adapter->dev_data = eth_dev->data;
 	adapter->stopped = 1;
diff --git a/drivers/net/ice/ice_dcf.c b/drivers/net/ice/ice_dcf.c
index 1c3d22ae0f..e2be2064be 100644
--- a/drivers/net/ice/ice_dcf.c
+++ b/drivers/net/ice/ice_dcf.c
@@ -618,7 +618,7 @@ ice_dcf_init_hw(struct rte_eth_dev *eth_dev, struct ice_dcf_hw *hw)
 
 	hw->resetting = false;
 
-	hw->avf.hw_addr = pci_dev->mem_resource[0].addr;
+	hw->avf.hw_addr = pci_dev->pci_mem[0].mem_res.addr;
 	hw->avf.back = hw;
 
 	hw->avf.bus.bus_id = pci_dev->addr.bus;
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 9a88cf9796..9150e077d2 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -2293,7 +2293,7 @@ ice_dev_init(struct rte_eth_dev *dev)
 	pf->adapter = ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
 	pf->dev_data = dev->data;
 	hw->back = pf->adapter;
-	hw->hw_addr = (uint8_t *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (uint8_t *)pci_dev->pci_mem[0].mem_res.addr;
 	hw->vendor_id = pci_dev->id.vendor_id;
 	hw->device_id = pci_dev->id.device_id;
 	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
diff --git a/drivers/net/idpf/idpf_ethdev.c b/drivers/net/idpf/idpf_ethdev.c
index e02ec2ec5a..ab66ac4950 100644
--- a/drivers/net/idpf/idpf_ethdev.c
+++ b/drivers/net/idpf/idpf_ethdev.c
@@ -1135,8 +1135,8 @@ idpf_adapter_ext_init(struct rte_pci_device *pci_dev, struct idpf_adapter_ext *a
 	struct idpf_hw *hw = &base->hw;
 	int ret = 0;
 
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-	hw->hw_addr_len = pci_dev->mem_resource[0].len;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
+	hw->hw_addr_len = pci_dev->pci_mem[0].mem_res.len;
 	hw->back = base;
 	hw->vendor_id = pci_dev->id.vendor_id;
 	hw->device_id = pci_dev->id.device_id;
diff --git a/drivers/net/igc/igc_ethdev.c b/drivers/net/igc/igc_ethdev.c
index fab2ab6d1c..091457cb23 100644
--- a/drivers/net/igc/igc_ethdev.c
+++ b/drivers/net/igc/igc_ethdev.c
@@ -1343,7 +1343,7 @@ eth_igc_dev_init(struct rte_eth_dev *dev)
 	dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 
 	hw->back = pci_dev;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	igc_identify_hardware(dev, pci_dev);
 	if (igc_setup_init_funcs(hw, false) != IGC_SUCCESS) {
diff --git a/drivers/net/ionic/ionic_dev_pci.c b/drivers/net/ionic/ionic_dev_pci.c
index 5e74a6da71..7bd4b4961c 100644
--- a/drivers/net/ionic/ionic_dev_pci.c
+++ b/drivers/net/ionic/ionic_dev_pci.c
@@ -234,7 +234,7 @@ eth_ionic_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 
 	bars.num_bars = 0;
 	for (i = 0; i < PCI_MAX_RESOURCE && i < IONIC_BARS_MAX; i++) {
-		resource = &pci_dev->mem_resource[i];
+		resource = &pci_dev->pci_mem[i].mem_res;
 		if (resource->phys_addr == 0 || resource->len == 0)
 			continue;
 
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 88118bc305..9172bf4c55 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1136,7 +1136,7 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	/* Vendor and Device ID need to be set before init of shared code */
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 	hw->allow_unsupported_sfp = 1;
 
 	/* Initialize the shared code (base driver) */
@@ -1634,7 +1634,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	/* initialize the vfta */
 	memset(shadow_vfta, 0, sizeof(*shadow_vfta));
diff --git a/drivers/net/liquidio/lio_ethdev.c b/drivers/net/liquidio/lio_ethdev.c
index ebcfbb1a5c..49885a648c 100644
--- a/drivers/net/liquidio/lio_ethdev.c
+++ b/drivers/net/liquidio/lio_ethdev.c
@@ -2071,8 +2071,8 @@ lio_eth_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_eth_copy_pci_info(eth_dev, pdev);
 
-	if (pdev->mem_resource[0].addr) {
-		lio_dev->hw_addr = pdev->mem_resource[0].addr;
+	if (pdev->pci_mem[0].mem_res.addr) {
+		lio_dev->hw_addr = pdev->pci_mem[0].mem_res.addr;
 	} else {
 		PMD_INIT_LOG(ERR, "ERROR: Failed to map BAR0\n");
 		return -ENODEV;
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index 56fb8e8c73..65a86ea35b 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -543,7 +543,7 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 		     pci_dev->addr.domain, pci_dev->addr.bus,
 		     pci_dev->addr.devid, pci_dev->addr.function);
 
-	hw->ctrl_bar = (uint8_t *)pci_dev->mem_resource[0].addr;
+	hw->ctrl_bar = (uint8_t *)pci_dev->pci_mem[0].mem_res.addr;
 	if (hw->ctrl_bar == NULL) {
 		PMD_DRV_LOG(ERR,
 			"hw->ctrl_bar is NULL. BAR0 not configured");
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index d69ac8cd37..14c9219e25 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -290,7 +290,7 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 
 	hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 
-	hw->ctrl_bar = (uint8_t *)pci_dev->mem_resource[0].addr;
+	hw->ctrl_bar = (uint8_t *)pci_dev->pci_mem[0].mem_res.addr;
 	if (hw->ctrl_bar == NULL) {
 		PMD_DRV_LOG(ERR,
 			"hw->ctrl_bar is NULL. BAR0 not configured");
@@ -351,9 +351,9 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	PMD_INIT_LOG(DEBUG, "tx_bar_off: 0x%" PRIx64 "", tx_bar_off);
 	PMD_INIT_LOG(DEBUG, "rx_bar_off: 0x%" PRIx64 "", rx_bar_off);
 
-	hw->tx_bar = (uint8_t *)pci_dev->mem_resource[2].addr +
+	hw->tx_bar = (uint8_t *)pci_dev->pci_mem[2].mem_res.addr +
 		     tx_bar_off;
-	hw->rx_bar = (uint8_t *)pci_dev->mem_resource[2].addr +
+	hw->rx_bar = (uint8_t *)pci_dev->pci_mem[2].mem_res.addr +
 		     rx_bar_off;
 
 	PMD_INIT_LOG(DEBUG, "ctrl_bar: %p, tx_bar: %p, rx_bar: %p",
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index 6029bd6c3a..9a9b357325 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -790,7 +790,7 @@ nfp6000_set_barsz(struct rte_pci_device *dev, struct nfp_pcie_user *desc)
 	unsigned long tmp;
 	int i = 0;
 
-	tmp = dev->mem_resource[0].len;
+	tmp = dev->pci_mem[0].mem_res.len;
 
 	while (tmp >>= 1)
 		i++;
@@ -836,7 +836,7 @@ nfp6000_init(struct nfp_cpp *cpp, struct rte_pci_device *dev)
 	if (nfp6000_set_barsz(dev, desc) < 0)
 		goto error;
 
-	desc->cfg = (char *)dev->mem_resource[0].addr;
+	desc->cfg = (char *)dev->pci_mem[0].mem_res.addr;
 
 	nfp_enable_bars(desc);
 
diff --git a/drivers/net/ngbe/ngbe_ethdev.c b/drivers/net/ngbe/ngbe_ethdev.c
index c32d954769..573acbf85c 100644
--- a/drivers/net/ngbe/ngbe_ethdev.c
+++ b/drivers/net/ngbe/ngbe_ethdev.c
@@ -364,7 +364,7 @@ eth_ngbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	/* Vendor and Device ID need to be set before init of shared code */
 	hw->back = pci_dev;
diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c
index f43db1e398..e9fdda3993 100644
--- a/drivers/net/octeon_ep/otx_ep_ethdev.c
+++ b/drivers/net/octeon_ep/otx_ep_ethdev.c
@@ -490,7 +490,7 @@ otx_ep_eth_dev_init(struct rte_eth_dev *eth_dev)
 	}
 	rte_eth_random_addr(vf_mac_addr.addr_bytes);
 	rte_ether_addr_copy(&vf_mac_addr, eth_dev->data->mac_addrs);
-	otx_epvf->hw_addr = pdev->mem_resource[0].addr;
+	otx_epvf->hw_addr = pdev->pci_mem[0].mem_res.addr;
 	otx_epvf->pdev = pdev;
 
 	otx_epdev_init(otx_epvf);
diff --git a/drivers/net/octeontx/base/octeontx_pkivf.c b/drivers/net/octeontx/base/octeontx_pkivf.c
index 6a48a22de6..a4708ee25d 100644
--- a/drivers/net/octeontx/base/octeontx_pkivf.c
+++ b/drivers/net/octeontx/base/octeontx_pkivf.c
@@ -195,13 +195,13 @@ pkivf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL) {
 		octeontx_log_err("PKI Empty bar[0] %p",
-				 pci_dev->mem_resource[0].addr);
+				 pci_dev->pci_mem[0].mem_res.addr);
 		return -ENODEV;
 	}
 
-	bar0 = pci_dev->mem_resource[0].addr;
+	bar0 = pci_dev->pci_mem[0].mem_res.addr;
 	val = octeontx_read64(bar0);
 	domain = val & 0xffff;
 	vfid = (val >> 16) & 0xffff;
diff --git a/drivers/net/octeontx/base/octeontx_pkovf.c b/drivers/net/octeontx/base/octeontx_pkovf.c
index 5d445dfb49..aed1f06aee 100644
--- a/drivers/net/octeontx/base/octeontx_pkovf.c
+++ b/drivers/net/octeontx/base/octeontx_pkovf.c
@@ -586,15 +586,15 @@ pkovf_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (pci_dev->mem_resource[0].addr == NULL ||
-	    pci_dev->mem_resource[2].addr == NULL) {
+	if (pci_dev->pci_mem[0].mem_res.addr == NULL ||
+	    pci_dev->pci_mem[2].mem_res.addr == NULL) {
 		octeontx_log_err("Empty bars %p %p",
-			pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[2].addr);
+			pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[2].mem_res.addr);
 		return -ENODEV;
 	}
-	bar0 = pci_dev->mem_resource[0].addr;
-	bar2 = pci_dev->mem_resource[2].addr;
+	bar0 = pci_dev->pci_mem[0].mem_res.addr;
+	bar2 = pci_dev->pci_mem[2].mem_res.addr;
 
 	octeontx_pkovf_setup();
 
diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
index 03039038ad..62f7308dd8 100644
--- a/drivers/net/qede/qede_main.c
+++ b/drivers/net/qede/qede_main.c
@@ -37,9 +37,9 @@ qed_update_pf_params(struct ecore_dev *edev, struct ecore_pf_params *params)
 
 static void qed_init_pci(struct ecore_dev *edev, struct rte_pci_device *pci_dev)
 {
-	edev->regview = pci_dev->mem_resource[0].addr;
-	edev->doorbells = pci_dev->mem_resource[2].addr;
-	edev->db_size = pci_dev->mem_resource[2].len;
+	edev->regview = pci_dev->pci_mem[0].mem_res.addr;
+	edev->doorbells = pci_dev->pci_mem[2].mem_res.addr;
+	edev->db_size = pci_dev->pci_mem[2].mem_res.len;
 	edev->pci_dev = pci_dev;
 }
 
diff --git a/drivers/net/sfc/sfc.c b/drivers/net/sfc/sfc.c
index 22753e3417..46132d81cb 100644
--- a/drivers/net/sfc/sfc.c
+++ b/drivers/net/sfc/sfc.c
@@ -773,7 +773,7 @@ sfc_mem_bar_init(struct sfc_adapter *sa, const efx_bar_region_t *mem_ebrp)
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 	efsys_bar_t *ebp = &sa->mem_bar;
 	struct rte_mem_resource *res =
-		&pci_dev->mem_resource[mem_ebrp->ebr_index];
+		&pci_dev->pci_mem[mem_ebrp->ebr_index].mem_res;
 
 	SFC_BAR_LOCK_INIT(ebp, eth_dev->data->name);
 	ebp->esb_rid = mem_ebrp->ebr_index;
diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
index ab1e714d97..e15286feb2 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -2223,7 +2223,7 @@ nicvf_eth_dev_init(struct rte_eth_dev *eth_dev)
 			pci_dev->addr.domain, pci_dev->addr.bus,
 			pci_dev->addr.devid, pci_dev->addr.function);
 
-	nic->reg_base = (uintptr_t)pci_dev->mem_resource[0].addr;
+	nic->reg_base = (uintptr_t)pci_dev->pci_mem[0].mem_res.addr;
 	if (!nic->reg_base) {
 		PMD_INIT_LOG(ERR, "Failed to map BAR0");
 		ret = -ENODEV;
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index a502618bc5..492184a5ff 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -594,7 +594,7 @@ eth_txgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	/* Vendor and Device ID need to be set before init of shared code */
 	hw->device_id = pci_dev->id.device_id;
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 3b1f7c913b..779598ea9c 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -211,7 +211,7 @@ eth_txgbevf_dev_init(struct rte_eth_dev *eth_dev)
 	hw->vendor_id = pci_dev->id.vendor_id;
 	hw->subsystem_device_id = pci_dev->id.subsystem_device_id;
 	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (void *)pci_dev->pci_mem[0].mem_res.addr;
 
 	/* initialize the vfta */
 	memset(shadow_vfta, 0, sizeof(*shadow_vfta));
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 29eb739b04..c7d05a2663 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -603,14 +603,14 @@ get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
 		return NULL;
 	}
 
-	if (offset + length > dev->mem_resource[bar].len) {
+	if (offset + length > dev->pci_mem[bar].mem_res.len) {
 		PMD_INIT_LOG(ERR,
 			"invalid cap: overflows bar space: %u > %" PRIu64,
-			offset + length, dev->mem_resource[bar].len);
+			offset + length, dev->pci_mem[bar].mem_res.len);
 		return NULL;
 	}
 
-	base = dev->mem_resource[bar].addr;
+	base = dev->pci_mem[bar].mem_res.addr;
 	if (base == NULL) {
 		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
 		return NULL;
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index fd946dec5c..7d4ebaca14 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -345,8 +345,8 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
 	/* Vendor and Device ID need to be set before init of shared code */
 	hw->device_id = pci_dev->id.device_id;
 	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr0 = (void *)pci_dev->mem_resource[0].addr;
-	hw->hw_addr1 = (void *)pci_dev->mem_resource[1].addr;
+	hw->hw_addr0 = (void *)pci_dev->pci_mem[0].mem_res.addr;
+	hw->hw_addr1 = (void *)pci_dev->pci_mem[1].mem_res.addr;
 
 	hw->num_rx_queues = 1;
 	hw->num_tx_queues = 1;
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy.c b/drivers/raw/cnxk_bphy/cnxk_bphy.c
index d42cca649c..cef8006550 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy.c
@@ -331,10 +331,10 @@ bphy_rawdev_probe(struct rte_pci_driver *pci_drv,
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (!pci_dev->mem_resource[0].addr) {
+	if (!pci_dev->pci_mem[0].mem_res.addr) {
 		plt_err("BARs have invalid values: BAR0 %p\n BAR2 %p",
-			pci_dev->mem_resource[0].addr,
-			pci_dev->mem_resource[2].addr);
+			pci_dev->pci_mem[0].mem_res.addr,
+			pci_dev->pci_mem[2].mem_res.addr);
 		return -ENODEV;
 	}
 
@@ -355,8 +355,8 @@ bphy_rawdev_probe(struct rte_pci_driver *pci_drv,
 	bphy_rawdev->driver_name = pci_dev->driver->driver.name;
 
 	bphy_dev = (struct bphy_device *)bphy_rawdev->dev_private;
-	bphy_dev->mem.res0 = pci_dev->mem_resource[0];
-	bphy_dev->mem.res2 = pci_dev->mem_resource[2];
+	bphy_dev->mem.res0 = pci_dev->pci_mem[0].mem_res;
+	bphy_dev->mem.res2 = pci_dev->pci_mem[2].mem_res;
 	bphy_dev->bphy.pci_dev = pci_dev;
 
 	ret = roc_bphy_dev_init(&bphy_dev->bphy);
diff --git a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
index 2d8466ef91..5a3b8c329d 100644
--- a/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
+++ b/drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c
@@ -302,7 +302,7 @@ cnxk_bphy_cgx_rawdev_probe(struct rte_pci_driver *pci_drv,
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
 
-	if (!pci_dev->mem_resource[0].addr)
+	if (!pci_dev->pci_mem[0].mem_res.addr)
 		return -ENODEV;
 
 	ret = roc_plt_init();
@@ -326,8 +326,8 @@ cnxk_bphy_cgx_rawdev_probe(struct rte_pci_driver *pci_drv,
 	}
 
 	rcgx = cgx->rcgx;
-	rcgx->bar0_pa = pci_dev->mem_resource[0].phys_addr;
-	rcgx->bar0_va = pci_dev->mem_resource[0].addr;
+	rcgx->bar0_pa = pci_dev->pci_mem[0].mem_res.phys_addr;
+	rcgx->bar0_va = pci_dev->pci_mem[0].mem_res.addr;
 	ret = roc_bphy_cgx_dev_init(rcgx);
 	if (ret)
 		goto out_free;
diff --git a/drivers/raw/ifpga/afu_pmd_n3000.c b/drivers/raw/ifpga/afu_pmd_n3000.c
index 67b3941265..ae60407516 100644
--- a/drivers/raw/ifpga/afu_pmd_n3000.c
+++ b/drivers/raw/ifpga/afu_pmd_n3000.c
@@ -1524,7 +1524,7 @@ static void *n3000_afu_get_port_addr(struct afu_rawdev *dev)
 	if (!pci_dev)
 		return NULL;
 
-	addr = (uint8_t *)pci_dev->mem_resource[0].addr;
+	addr = (uint8_t *)pci_dev->pci_mem[0].mem_res.addr;
 	val = rte_read64(addr + PORT_ATTR_REG(dev->port));
 	if (!PORT_IMPLEMENTED(val)) {
 		IFPGA_RAWDEV_PMD_INFO("FIU port %d is not implemented", dev->port);
@@ -1537,7 +1537,7 @@ static void *n3000_afu_get_port_addr(struct afu_rawdev *dev)
 		return NULL;
 	}
 
-	addr = (uint8_t *)pci_dev->mem_resource[bar].addr + PORT_OFFSET(val);
+	addr = (uint8_t *)pci_dev->pci_mem[bar].mem_res.addr + PORT_OFFSET(val);
 	return addr;
 }
 
diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
index 1020adcf6e..078d37d5df 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -1580,9 +1580,9 @@ ifpga_rawdev_create(struct rte_pci_device *pci_dev,
 
 	/* init opae_adapter_data_pci for device specific information */
 	for (i = 0; i < PCI_MAX_RESOURCE; i++) {
-		data->region[i].phys_addr = pci_dev->mem_resource[i].phys_addr;
-		data->region[i].len = pci_dev->mem_resource[i].len;
-		data->region[i].addr = pci_dev->mem_resource[i].addr;
+		data->region[i].phys_addr = pci_dev->pci_mem[i].mem_res.phys_addr;
+		data->region[i].len = pci_dev->pci_mem[i].mem_res.len;
+		data->region[i].addr = pci_dev->pci_mem[i].mem_res.addr;
 	}
 	data->device_id = pci_dev->id.device_id;
 	data->vendor_id = pci_dev->id.vendor_id;
diff --git a/drivers/raw/ntb/ntb_hw_intel.c b/drivers/raw/ntb/ntb_hw_intel.c
index 9b4465176a..65cbed335c 100644
--- a/drivers/raw/ntb/ntb_hw_intel.c
+++ b/drivers/raw/ntb/ntb_hw_intel.c
@@ -179,7 +179,7 @@ intel_ntb_dev_init(const struct rte_rawdev *dev)
 		return -EINVAL;
 	}
 
-	hw->hw_addr = (char *)hw->pci_dev->mem_resource[0].addr;
+	hw->hw_addr = (char *)hw->pci_dev->pci_mem[0].mem_res.addr;
 
 	if (is_gen3_ntb(hw))
 		ret = intel_ntb3_check_ppd(hw);
@@ -207,7 +207,7 @@ intel_ntb_dev_init(const struct rte_rawdev *dev)
 
 	for (i = 0; i < hw->mw_cnt; i++) {
 		bar = intel_ntb_bar[i];
-		hw->mw_size[i] = hw->pci_dev->mem_resource[bar].len;
+		hw->mw_size[i] = hw->pci_dev->pci_mem[bar].mem_res.len;
 	}
 
 	/* Reserve the last 2 spad registers for users. */
@@ -238,7 +238,7 @@ intel_ntb_get_peer_mw_addr(const struct rte_rawdev *dev, int mw_idx)
 
 	bar = intel_ntb_bar[mw_idx];
 
-	return hw->pci_dev->mem_resource[bar].addr;
+	return hw->pci_dev->pci_mem[bar].mem_res.addr;
 }
 
 static int
@@ -271,7 +271,7 @@ intel_ntb_mw_set_trans(const struct rte_rawdev *dev, int mw_idx,
 
 	/* Limit reg val should be EMBAR base address plus MW size. */
 	base = addr;
-	limit = hw->pci_dev->mem_resource[bar].phys_addr + size;
+	limit = hw->pci_dev->pci_mem[bar].mem_res.phys_addr + size;
 	rte_write64(base, xlat_addr);
 	rte_write64(limit, limit_addr);
 
diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
index e4133568c1..a14b66e8cb 100644
--- a/drivers/vdpa/ifc/ifcvf_vdpa.c
+++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
@@ -204,11 +204,11 @@ ifcvf_vfio_setup(struct ifcvf_internal *internal)
 	for (i = 0; i < RTE_MIN(PCI_MAX_RESOURCE, IFCVF_PCI_MAX_RESOURCE);
 			i++) {
 		internal->hw.mem_resource[i].addr =
-			internal->pdev->mem_resource[i].addr;
+			internal->pdev->pci_mem[i].mem_res.addr;
 		internal->hw.mem_resource[i].phys_addr =
-			internal->pdev->mem_resource[i].phys_addr;
+			internal->pdev->pci_mem[i].mem_res.phys_addr;
 		internal->hw.mem_resource[i].len =
-			internal->pdev->mem_resource[i].len;
+			internal->pdev->pci_mem[i].mem_res.len;
 	}
 
 	return 0;
diff --git a/drivers/vdpa/sfc/sfc_vdpa_hw.c b/drivers/vdpa/sfc/sfc_vdpa_hw.c
index edb7e35c2c..dcdb21d4ce 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_hw.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_hw.c
@@ -192,7 +192,7 @@ sfc_vdpa_mem_bar_init(struct sfc_vdpa_adapter *sva,
 	struct rte_pci_device *pci_dev = sva->pdev;
 	efsys_bar_t *ebp = &sva->mem_bar;
 	struct rte_mem_resource *res =
-		&pci_dev->mem_resource[mem_ebrp->ebr_index];
+		&pci_dev->pci_mem[mem_ebrp->ebr_index].mem_res;
 
 	SFC_BAR_LOCK_INIT(ebp, pci_dev->name);
 	ebp->esb_rid = mem_ebrp->ebr_index;
diff --git a/drivers/vdpa/sfc/sfc_vdpa_ops.c b/drivers/vdpa/sfc/sfc_vdpa_ops.c
index e88c7eeaa6..78b5a7c9fa 100644
--- a/drivers/vdpa/sfc/sfc_vdpa_ops.c
+++ b/drivers/vdpa/sfc/sfc_vdpa_ops.c
@@ -861,7 +861,7 @@ sfc_vdpa_get_notify_area(int vid, int qid, uint64_t *offset, uint64_t *size)
 		      *offset);
 
 	pci_dev = sfc_vdpa_adapter_by_dev_handle(dev)->pdev;
-	doorbell = (uint8_t *)pci_dev->mem_resource[reg.index].addr + *offset;
+	doorbell = (uint8_t *)pci_dev->pci_mem[reg.index].mem_res.addr + *offset;
 
 	/*
 	 * virtio-net driver in VM sends queue notifications before
-- 
2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
                   ` (3 preceding siblings ...)
  2023-04-18  5:30 ` [RFC 4/4] bus/pci: add VFIO sparse mmap support Chenbo Xia
@ 2023-04-18  7:46 ` David Marchand
  2023-04-18  9:27   ` Xia, Chenbo
  2023-04-18  9:33   ` Xia, Chenbo
  2023-05-08  2:13 ` Xia, Chenbo
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
  6 siblings, 2 replies; 50+ messages in thread
From: David Marchand @ 2023-04-18  7:46 UTC (permalink / raw)
  To: Chenbo Xia; +Cc: dev, skori

Hello Chenbo,

On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
>
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.

Sorry, I did not take the time to look into the details.
Could you summarize what would be the benefit of this series?


>
> In this patchset:
>
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
>
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
>
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
>
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature. This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
>
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
>
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> prefers since either option has its own Pros & Cons.
>
> Please share your comments, Thanks!
>
>
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device

I think this first patch's main motivation was to avoid ABI issues.
Since v22.11, the rte_pci_device object is opaque to applications.

So, do we still need this patch?


>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
>
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)


-- 
David Marchand


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  7:46 ` [RFC 0/4] Support VFIO sparse mmap in PCI bus David Marchand
@ 2023-04-18  9:27   ` Xia, Chenbo
  2023-04-18  9:33   ` Xia, Chenbo
  1 sibling, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-04-18  9:27 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, skori, Cao, Yahui, Li, Miao

Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?

The benefit can be different for different vendors. There was one discussion:
http://inbox.dpdk.org/dev/CO6PR18MB386016A2634AF375F5B4BA8CB4899@CO6PR18MB3860.namprd18.prod.outlook.com/

The problem above is that some devices have a very large BAR, and we don't want DPDK
to map the whole BAR.

For Intel devices, one benefit is that we want our kernel module to keep control over
access to a specific BAR region, so the DPDK process will not be able to mmap that region.
(Because after mmap, the kernel will not know when userspace accesses the device BAR.)
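
For reference, here is a minimal sketch of the pread-based fallback on a plain
VFIO device fd (the helper below is only illustrative, not the API added in
this series):

#include <unistd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustrative only: read a 32-bit register from a BAR that is not
 * mmap-ed, going through the VFIO device fd instead of a mapped VA. */
static int example_bar_read32(int vfio_dev_fd, unsigned int bar_index,
			      uint64_t reg_off, uint32_t *val)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = bar_index,
	};

	if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return -1;

	if (pread(vfio_dev_fd, val, sizeof(*val), info.offset + reg_off) !=
	    (ssize_t)sizeof(*val))
		return -1;

	return 0;
}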

That is why I summarized it as 'Sparse mmap is useful when kernel does not want
userspace to mmap whole BAR region, or kernel wants to control over access to
specific BAR region'. There may be more use cases from other vendors that I have not realized.

Thanks,
Chenbo

> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > prefers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch's main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?
> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  7:46 ` [RFC 0/4] Support VFIO sparse mmap in PCI bus David Marchand
  2023-04-18  9:27   ` Xia, Chenbo
@ 2023-04-18  9:33   ` Xia, Chenbo
  1 sibling, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-04-18  9:33 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, skori, Cao, Yahui, Li, Miao

David,

Sorry that I missed one comment...

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?
> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > prefers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch's main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?

I think it could be good to reduce unnecessary driver APIs.
Hiding this region information could also be friendlier to driver developers?
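
Just to make the big diffs easier to read, the shape implied by the
'pci_dev->pci_mem[i].mem_res' accesses is roughly the following (names here
are illustrative only, not the actual definitions in patch 1):

#include <rte_dev.h>   /* struct rte_mem_resource (addr/len/phys_addr) */
#include <rte_pci.h>   /* PCI_MAX_RESOURCE */

/* Illustrative only: a per-BAR wrapper keeping the existing
 * addr/len/phys_addr fields while letting the bus attach sparse mmap
 * state next to them. */
struct example_pci_bar_mem {
	struct rte_mem_resource mem_res;
	/* bus-internal sparse mmap bookkeeping would go here */
};

struct example_pci_device_internal {
	/* ... */
	struct example_pci_bar_mem pci_mem[PCI_MAX_RESOURCE];
};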

Thanks,
Chenbo

> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
                   ` (4 preceding siblings ...)
  2023-04-18  7:46 ` [RFC 0/4] Support VFIO sparse mmap in PCI bus David Marchand
@ 2023-05-08  2:13 ` Xia, Chenbo
  2023-05-08  3:04   ` Sunil Kumar Kori
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
  6 siblings, 1 reply; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-08  2:13 UTC (permalink / raw)
  To: dev
  Cc: skori, techboard, thomas, Richardson, Bruce, ferruh.yigit,
	david.marchand, Cao, Yahui, Li, Miao

Gentle ping for some comments.

After rethinking it, I would personally choose option 2, as it requires no driver API
change for PMDs and, looking at the current code, that is how we already handle the
case where the MSI-X table cannot be mmap-ed.
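
For illustration, a rough sketch of that option-2 style mapping (the sparse
offsets/sizes are placeholders here; in practice they would come from
VFIO_REGION_INFO_CAP_SPARSE_MMAP):

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

/* Illustrative only: keep the 'bar_base + offset' layout by reserving the
 * whole BAR-sized VA range and overlaying only the mmap-able sparse areas. */
static void *example_map_bar_sparse(int vfio_dev_fd, uint64_t bar_region_offset,
				    size_t bar_len, uint64_t sparse_off,
				    size_t sparse_len)
{
	void *base = mmap(NULL, bar_len, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* The hole between sparse areas stays reserved but inaccessible. */
	if (mmap((uint8_t *)base + sparse_off, sparse_len,
		 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
		 vfio_dev_fd, bar_region_offset + sparse_off) == MAP_FAILED) {
		munmap(base, bar_len);
		return NULL;
	}

	return base;
}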

Thanks,
Chenbo

> -----Original Message-----
> From: Chenbo Xia <chenbo.xia@intel.com>
> Sent: Tuesday, April 18, 2023 1:30 PM
> To: dev@dpdk.org
> Cc: skori@marvell.com
> Subject: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.
> 
> In this patchset:
> 
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
> 
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
> 
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
> 
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature. This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
> 
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
> 
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> prefers since either option has its own Pros & Cons.
> 
> Please share your comments, Thanks!
> 
> 
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device
>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
> 
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-08  2:13 ` Xia, Chenbo
@ 2023-05-08  3:04   ` Sunil Kumar Kori
  0 siblings, 0 replies; 50+ messages in thread
From: Sunil Kumar Kori @ 2023-05-08  3:04 UTC (permalink / raw)
  To: Xia, Chenbo, dev
  Cc: techboard, thomas, Richardson, Bruce, ferruh.yigit,
	david.marchand, Cao, Yahui, Li, Miao

+1 for option 2.

Thanks & Regards
Sunil Kumar Kori

> -----Original Message-----
> From: Xia, Chenbo <chenbo.xia@intel.com>
> Sent: Monday, May 8, 2023 7:43 AM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; techboard@dpdk.org;
> thomas@monjalon.net; Richardson, Bruce <bruce.richardson@intel.com>;
> ferruh.yigit@amd.com; david.marchand@redhat.com; Cao, Yahui
> <yahui.cao@intel.com>; Li, Miao <miao.li@intel.com>
> Subject: [EXT] RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> External Email
> 
> ----------------------------------------------------------------------
> Gentle ping for some comments..
> 
> After rethinking, personally I would choose option 2 as it requires no driver API
> change for PMDs, and looking at the current code, that is already how we handle
> the case where the MSI-X table can't be mmap-ed.
> 
> Thanks,
> Chenbo
> 
> > -----Original Message-----
> > From: Chenbo Xia <chenbo.xia@intel.com>
> > Sent: Tuesday, April 18, 2023 1:30 PM
> > To: dev@dpdk.org
> > Cc: skori@marvell.com
> > Subject: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> >
> > This series introduces a VFIO standard capability, called sparse mmap
> > to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of mmap
> > whole BAR region into DPDK process, only mmap part of the BAR region
> > after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process can use
> > pread/pwrite system calls to access. Sparse mmap is useful when kernel
> > does not want userspace to mmap whole BAR region, or kernel wants to
> > control over access to specific BAR region. Vendors can choose to
> > enable this feature or not for their devices in their specific kernel
> > modules.
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that driver
> > could use them to access specific BAR using pread/pwrite system calls
> > when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question is for
> > all sparse mmap regions, should they be mapped to a continuous virtual
> > address region that follows device-specific BAR layout or not. In
> > theory, there could be three options to support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one and each
> > region could be located anywhere in process address space. But
> > accessing the mmaped BAR will not be as easy as 'bar_base_address +
> > bar_offset', driver needs to check the sparse mmap information to
> > access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change is
> > introduced in bus_pci_driver.h. Corresponding changes in all drivers
> > are also done and currently I am assuming drivers do not support this
> > feature so they will not check the 'is_sparse' flag but assumes it to
> > be false. Note that it will not break any driver and each vendor can
> > add related logic when they start to support this feature. This is
> > only because I don't want to introduce complexity to drivers that do
> > not want to support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be mmaped.
> > Region #1 will be mapped at 'base_addr' and region #2 will be mapped
> > at 'base_addr + 0x3000'. The good thing is if we implement like this,
> > driver can still access all BAR registers using 'bar_base_address +
> > bar_offset' way and we don't need to introduce any driver API change.
> > But the address space range 'base_addr + 0x1000' to 'base_addr +
> > 0x3000' may need to be reserved so it could result in waste of address
> > space or memory (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to
> > reserve this range). Meanwhile, driver needs to know which part of BAR
> > is mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > prefers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> >
> > --
> > 2.17.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
                   ` (5 preceding siblings ...)
  2023-05-08  2:13 ` Xia, Chenbo
@ 2023-05-15  6:46 ` Miao Li
  2023-05-15  6:46   ` [PATCH v1 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
                     ` (3 more replies)
  6 siblings, 4 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  6:46 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

This series introduces a standard VFIO capability, called sparse
mmap, to the PCI bus. In the Linux kernel, it is defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means that instead of
mmapping the whole BAR region into the DPDK process, only parts of
the BAR region are mmapped, based on the sparse mmap information
obtained from the kernel. The rest of the BAR region, which is not
mmap-ed, can be accessed by the DPDK process through the pread/pwrite
system calls. Sparse mmap is useful when the kernel does not want
userspace to mmap the whole BAR region, or when the kernel wants to
control access to specific parts of the BAR. Vendors can choose
whether to enable this feature for their devices in their specific
kernel modules.

In this patchset:

Patches 1-3 mainly introduce BAR access APIs so that drivers can
use them to access a specific BAR through the pread/pwrite system
calls when part of that BAR is not mmap-able. Patch 4 then adds
the VFIO sparse mmap support itself.
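
For illustration only, a minimal sketch (not code from this series) of
what pread-based access to a non-mmap-ed BAR looks like at the VFIO
level; 'bar_region_offset' is assumed to come from
VFIO_DEVICE_GET_REGION_INFO and 'reg_off' is a made-up register offset:

/* build with _GNU_SOURCE for pread64() */
#include <stdint.h>
#include <unistd.h>

/* Read a 32-bit register from a BAR that is not mmap-ed, going through
 * the VFIO device fd instead of a mapped address. */
static int
read_reg_via_pread(int vfio_dev_fd, uint64_t bar_region_offset,
		   off_t reg_off, uint32_t *val)
{
	if (pread64(vfio_dev_fd, val, sizeof(*val),
		    bar_region_offset + reg_off) != sizeof(*val))
		return -1;
	return 0;
}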

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 drivers/bus/pci/bsd/pci.c        |  35 +++-
 drivers/bus/pci/linux/pci.c      |  78 +++++--
 drivers/bus/pci/linux/pci_init.h |  14 +-
 drivers/bus/pci/linux/pci_uio.c  |  22 ++
 drivers/bus/pci/linux/pci_vfio.c | 335 +++++++++++++++++++++++++------
 drivers/bus/pci/pci_common.c     |  12 +-
 drivers/bus/pci/private.h        |  25 ++-
 drivers/bus/pci/rte_bus_pci.h    |  48 +++++
 drivers/bus/pci/version.map      |   3 +
 lib/eal/include/rte_vfio.h       |   1 -
 10 files changed, 482 insertions(+), 91 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
@ 2023-05-15  6:46   ` Miao Li
  2023-05-15  6:46   ` [PATCH v1 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  6:46 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

From: Chenbo Xia <chenbo.xia@intel.com>

This patch introduces an internal representation of the PCI device
which will be used to store internal information that does not have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper around the
rte_pci_device structure. More fields will be added by later patches.
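
As a rough sketch of how bus-internal code is expected to use the
wrapper (the helper name below is made up for illustration; the macro
and structure are the ones added in drivers/bus/pci/private.h):

/* Only code inside the PCI bus driver ever converts back to the
 * internal representation; PMDs keep working on rte_pci_device. */
static void
pci_internal_example(struct rte_pci_device *dev)
{
	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);

	/* Later patches add internal-only fields (e.g. VFIO region
	 * sizes/offsets) to *pdev, next to pdev->device. */
	(void)pdev;
}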

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c    | 13 ++++++++-----
 drivers/bus/pci/linux/pci.c  | 28 ++++++++++++++++------------
 drivers/bus/pci/pci_common.c | 12 ++++++------
 drivers/bus/pci/private.h    | 14 +++++++++++++-
 4 files changed, 43 insertions(+), 24 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	struct pci_bar_io bar;
 	unsigned i, max;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL) {
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
 	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 
 	dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 				memmove(dev2->mem_resource,
 					dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	return 0;
 
 skipdev:
-	pci_free(dev);
+	pci_free(pdev);
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +301,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -306,7 +310,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver, sizeof(driver));
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -320,7 +324,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		else
 			dev->kdrv = RTE_PCI_KDRV_UNKNOWN;
 	} else {
-		pci_free(dev);
+		pci_free(pdev);
 		return 0;
 	}
 	/* device is valid, add in list (sorted) */
@@ -375,7 +379,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 						pci_common_set(dev2);
 					}
 				}
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..52404ab0fe 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -121,12 +121,12 @@ pci_common_set(struct rte_pci_device *dev)
 }
 
 void
-pci_free(struct rte_pci_device *dev)
+pci_free(struct rte_pci_device_internal *pdev)
 {
-	if (dev == NULL)
+	if (pdev == NULL)
 		return;
-	free(dev->bus_info);
-	free(dev);
+	free(pdev->device.bus_info);
+	free(pdev);
 }
 
 /* map a particular resource from a file */
@@ -465,7 +465,7 @@ pci_cleanup(void)
 		rte_intr_instance_free(dev->vfio_req_intr_handle);
 		dev->vfio_req_intr_handle = NULL;
 
-		pci_free(dev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
 	}
 
 	return error;
@@ -681,7 +681,7 @@ pci_unplug(struct rte_device *dev)
 	if (ret == 0) {
 		rte_pci_remove_device(pdev);
 		rte_devargs_remove(dev->devargs);
-		pci_free(pdev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 	}
 	return ret;
 }
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index c8161a1074..b564646e03 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,14 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+/*
+ * Convert struct rte_pci_device to struct rte_pci_device_internal
+ */
+#define RTE_PCI_DEVICE_INTERNAL(ptr) \
+	container_of(ptr, struct rte_pci_device_internal, device)
+#define RTE_PCI_DEVICE_INTERNAL_CONST(ptr) \
+	container_of(ptr, const struct rte_pci_device_internal, device)
+
 /**
  * Structure describing the PCI bus
  */
@@ -34,6 +42,10 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_device_internal {
+	struct rte_pci_device device;
+};
+
 /**
  * Scan the content of the PCI bus, and the devices in the devices
  * list
@@ -53,7 +65,7 @@ pci_common_set(struct rte_pci_device *dev);
  * Free a PCI device.
  */
 void
-pci_free(struct rte_pci_device *dev);
+pci_free(struct rte_pci_device_internal *pdev);
 
 /**
  * Validate whether a device with given PCI address should be ignored or not.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
  2023-05-15  6:46   ` [PATCH v1 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
@ 2023-05-15  6:46   ` Miao Li
  2023-05-15  6:46   ` [PATCH v1 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
  2023-05-15  6:47   ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  3 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  6:46 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in the Linux kernel source [1]. It
is not part of the VFIO API, so we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
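
For reference, a minimal sketch of how the region offset can instead
be queried from the kernel (capability chains and error reporting
omitted; the helper name is made up for illustration):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int
get_region_offset(int vfio_dev_fd, uint32_t index,
		  uint64_t *offset, uint64_t *size)
{
	struct vfio_region_info info = { .argsz = sizeof(info) };

	info.index = index;
	if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return -1;

	*offset = info.offset;
	*size = info.size;
	return 0;
}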

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci.c      |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 195 +++++++++++++++++++++++--------
 drivers/bus/pci/private.h        |   9 ++
 lib/eal/include/rte_vfio.h       |   1 -
 5 files changed, 158 insertions(+), 55 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 		return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_read_config(intr_handle, buf, len, offset);
+		return pci_vfio_read_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 		return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_write_config(intr_handle, buf, len, offset);
+		return pci_vfio_write_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 			 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..1748ad2ae0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+		    uint64_t *size, uint64_t *offset)
+{
+	const struct rte_pci_device_internal *pdev =
+		RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+	if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+		return -1;
+
+	if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+		return -1;
+
+	*size   = pdev->region[index].size;
+	*offset = pdev->region[index].offset;
+
+	return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
 		    void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
 		return -1;
 
-	return pread64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
 		    const void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
 		return -1;
 
-	return pwrite64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfio_get_msix_bar(const struct rte_pci_device *dev, int fd,
+	struct pci_msix_table *msix_table)
 {
 	int ret;
 	uint32_t reg;
 	uint16_t flags;
 	uint8_t cap_id, cap_offset;
+	uint64_t size, offset;
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
 
 	/* read PCI capability pointer from config space */
-	ret = pread64(fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_CAPABILITY_LIST);
+	ret = pread64(fd, &reg, sizeof(reg), offset + PCI_CAPABILITY_LIST);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL,
 			"Cannot read capability pointer from PCI config space!\n");
@@ -94,9 +131,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 	while (cap_offset) {
 
 		/* read PCI capability ID */
-		ret = pread64(fd, &reg, sizeof(reg),
-				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-				cap_offset);
+		ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 		if (ret != sizeof(reg)) {
 			RTE_LOG(ERR, EAL,
 				"Cannot read capability ID from PCI config space!\n");
@@ -108,9 +143,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 		/* if we haven't reached MSI-X, check next capability */
 		if (cap_id != PCI_CAP_ID_MSIX) {
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read capability pointer from PCI config space!\n");
@@ -125,18 +158,14 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 		/* else, read table offset */
 		else {
 			/* table offset resides in the next 4 bytes */
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 4);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset + 4);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table offset from PCI config space!\n");
 				return -1;
 			}
 
-			ret = pread64(fd, &flags, sizeof(flags),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 2);
+			ret = pread64(fd, &flags, sizeof(flags), offset + cap_offset + 2);
 			if (ret != sizeof(flags)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table flags from PCI config space!\n");
@@ -156,14 +185,19 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 /* enable PCI bus memory space */
 static int
-pci_vfio_enable_bus_memory(int dev_fd)
+pci_vfio_enable_bus_memory(struct rte_pci_device *dev, int dev_fd)
 {
+	uint64_t size, offset;
 	uint16_t cmd;
 	int ret;
 
-	ret = pread64(dev_fd, &cmd, sizeof(cmd),
-		      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		      PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
@@ -174,9 +208,7 @@ pci_vfio_enable_bus_memory(int dev_fd)
 		return 0;
 
 	cmd |= PCI_COMMAND_MEMORY;
-	ret = pwrite64(dev_fd, &cmd, sizeof(cmd),
-		       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		       PCI_COMMAND);
+	ret = pwrite64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -188,14 +220,19 @@ pci_vfio_enable_bus_memory(int dev_fd)
 
 /* set PCI bus mastering */
 static int
-pci_vfio_set_bus_master(int dev_fd, bool op)
+pci_vfio_set_bus_master(const struct rte_pci_device *dev, int dev_fd, bool op)
 {
+	uint64_t size, offset;
 	uint16_t reg;
 	int ret;
 
-	ret = pread64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
 		return -1;
@@ -207,9 +244,7 @@ pci_vfio_set_bus_master(int dev_fd, bool op)
 	else
 		reg &= ~(PCI_COMMAND_MASTER);
 
-	ret = pwrite64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	ret = pwrite64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -458,14 +493,21 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
 #endif
 
 static int
-pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
+pci_vfio_is_ioport_bar(const struct rte_pci_device *dev, int vfio_dev_fd,
+	int bar_index)
 {
+	uint64_t size, offset;
 	uint32_t ioport_bar;
 	int ret;
 
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
 	ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
-			  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
-			  + PCI_BASE_ADDRESS_0 + bar_index*4);
+			  offset + PCI_BASE_ADDRESS_0 + bar_index * 4);
 	if (ret != sizeof(ioport_bar)) {
 		RTE_LOG(ERR, EAL, "Cannot read command (%x) from config space!\n",
 			PCI_BASE_ADDRESS_0 + bar_index*4);
@@ -483,13 +525,13 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 		return -1;
 	}
 
-	if (pci_vfio_enable_bus_memory(vfio_dev_fd)) {
+	if (pci_vfio_enable_bus_memory(dev, vfio_dev_fd)) {
 		RTE_LOG(ERR, EAL, "Cannot enable bus memory!\n");
 		return -1;
 	}
 
 	/* set bus mastering for the device */
-	if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, true)) {
 		RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
 		return -1;
 	}
@@ -719,11 +761,40 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static int
+pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
+		      struct vfio_device_info *device_info)
+{
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
+	struct vfio_region_info *reg = NULL;
+	int nb_maps, i, ret;
+
+	nb_maps = RTE_MIN((int)device_info->num_regions,
+			VFIO_PCI_CONFIG_REGION_INDEX + 1);
+
+	for (i = 0; i < nb_maps; i++) {
+		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s cannot get device region info error %i (%s)\n",
+				dev->name, errno, strerror(errno));
+			return -1;
+		}
+
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
+		free(reg);
+	}
+
+	return 0;
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	struct vfio_region_info *reg = NULL;
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -767,11 +838,22 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	/* map BARs */
 	maps = vfio_res->maps;
 
+	ret = pci_vfio_get_region_info(vfio_dev_fd, &reg,
+		VFIO_PCI_CONFIG_REGION_INDEX);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s cannot get device region info error %i (%s)\n",
+			dev->name, errno, strerror(errno));
+		goto err_vfio_res;
+	}
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].size = reg->size;
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].offset = reg->offset;
+	free(reg);
+
 	vfio_res->msix_table.bar_index = -1;
 	/* get MSI-X BAR, if any (we have to know where it is because we can't
 	 * easily mmap it when using VFIO)
 	 */
-	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &vfio_res->msix_table);
+	ret = pci_vfio_get_msix_bar(dev, vfio_dev_fd, &vfio_res->msix_table);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "%s cannot get MSI-X BAR number!\n",
 				pci_addr);
@@ -792,7 +874,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	}
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		struct vfio_region_info *reg = NULL;
 		void *bar_addr;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
@@ -803,8 +884,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 			goto err_vfio_res;
 		}
 
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
 		/* chk for io port region */
-		ret = pci_vfio_is_ioport_bar(vfio_dev_fd, i);
+		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
 			goto err_vfio_res;
@@ -916,6 +1000,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+	if (ret)
+		return ret;
+
 	/* map BARs */
 	maps = vfio_res->maps;
 
@@ -1031,7 +1119,7 @@ pci_vfio_unmap_resource_primary(struct rte_pci_device *dev)
 	if (vfio_dev_fd < 0)
 		return -1;
 
-	if (pci_vfio_set_bus_master(vfio_dev_fd, false)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, false)) {
 		RTE_LOG(ERR, EAL, "%s cannot unset bus mastering for PCI device!\n",
 				pci_addr);
 		return -1;
@@ -1111,14 +1199,21 @@ int
 pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		    struct rte_pci_ioport *p)
 {
+	uint64_t size, offset;
+
 	if (bar < VFIO_PCI_BAR0_REGION_INDEX ||
 	    bar > VFIO_PCI_BAR5_REGION_INDEX) {
 		RTE_LOG(ERR, EAL, "invalid bar (%d)!\n", bar);
 		return -1;
 	}
 
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
+		return -1;
+	}
+
 	p->dev = dev;
-	p->base = VFIO_GET_REGION_ADDR(bar);
+	p->base = offset;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index b564646e03..2d6991ccb7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,8 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+#define RTE_MAX_PCI_REGIONS    9
+
 /*
  * Convert struct rte_pci_device to struct rte_pci_device_internal
  */
@@ -42,8 +44,15 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_region {
+	uint64_t size;
+	uint64_t offset;
+};
+
 struct rte_pci_device_internal {
 	struct rte_pci_device device;
+	/* PCI regions provided by e.g. VFIO. */
+	struct rte_pci_region region[RTE_MAX_PCI_REGIONS];
 };
 
 /**
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 7bdb8932b2..3487c4f2a2 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -38,7 +38,6 @@ extern "C" {
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
 #define VFIO_NOIOMMU_MODE      \
 	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
  2023-05-15  6:46   ` [PATCH v1 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
  2023-05-15  6:46   ` [PATCH v1 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
@ 2023-05-15  6:46   ` Miao Li
  2023-05-15  6:47   ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  3 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  6:46 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should explicitly use read and write
system calls to access these regions.
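
A minimal usage sketch of the helpers added below, assuming a
hypothetical 16-bit register at offset 0x20 in BAR 2 (the BAR index
and offset are chosen only for the example):

#include <rte_bus_pci.h>

static int
toggle_example_bit(const struct rte_pci_device *dev)
{
	uint16_t reg;

	if (rte_pci_mmio_read(dev, 2, &reg, sizeof(reg), 0x20) !=
			(int)sizeof(reg))
		return -1;
	reg |= 0x1;
	if (rte_pci_mmio_write(dev, 2, &reg, sizeof(reg), 0x20) !=
			(int)sizeof(reg))
		return -1;
	return 0;
}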

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
 drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h | 10 +++++++
 drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
 drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
 drivers/bus/pci/version.map      |  3 ++
 7 files changed, 187 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 	return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+		      void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+		       const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 	}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 			 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 		       struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		        struct rte_pci_ioport *p);
 void pci_vfio_ioport_read(struct rte_pci_ioport *p,
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d52125e49b..2bf16e9369 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -55,6 +55,28 @@ pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 	return pwrite(uio_cfg_fd, buf, len, offset);
 }
 
+int
+pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+		  void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+int
+pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+		   const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 static int
 pci_uio_set_bus_master(int dev_fd)
 {
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 1748ad2ae0..f6289c907f 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1258,6 +1258,42 @@ pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 	return -1;
 }
 
+int
+pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+		   void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pread64(fd, buf, len, offset + offs);
+}
+
+int
+pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+		    const void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..82da087f24 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -135,6 +135,54 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 int rte_pci_write_config(const struct rte_pci_device *device,
 		const void *buf, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Read from a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes read on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Write to a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer containing the bytes should be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes written on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset);
+
 /**
  * Initialize a rte_pci_ioport object for a pci device io resource.
  *
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 161ab86d3b..00fde139ca 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -21,6 +21,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_pci_set_bus_master;
+	# added in 23.07
+	rte_pci_mmio_read;
+	rte_pci_mmio_write;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-15  6:46 ` [PATCH v1 " Miao Li
                     ` (2 preceding siblings ...)
  2023-05-15  6:46   ` [PATCH v1 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
@ 2023-05-15  6:47   ` Miao Li
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-15 15:52     ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Stephen Hemminger
  3 siblings, 2 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  6:47 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

This patch adds sparse mmap support to the PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas within one
VFIO region.

In this patch, the sparse mmap areas are mapped into one contiguous
virtual address region that follows the device-specific BAR layout, so
the driver can still access all mapped areas using the usual
'bar_base_address + bar_offset' arithmetic.
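
Conceptually, the mapping works roughly like the sketch below (an
illustration, not the exact code of this patch): the whole BAR is
first reserved with an inaccessible anonymous mapping, then every
sparse area is mapped at its BAR-relative offset, so 'base + offset'
addressing keeps working. 'bar_region_offset', 'bar_size' and 'sparse'
are assumed to come from VFIO_DEVICE_GET_REGION_INFO and its sparse
mmap capability.

#include <stdint.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static void *
sparse_map_bar_sketch(int vfio_dev_fd, uint64_t bar_region_offset,
		      size_t bar_size,
		      const struct vfio_region_info_cap_sparse_mmap *sparse)
{
	void *base;
	uint32_t i;

	/* Reserve the full BAR so the layout matches the device. */
	base = mmap(NULL, bar_size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/* Map each mmap-able area at its BAR-relative offset. */
	for (i = 0; i < sparse->nr_areas; i++) {
		const struct vfio_region_sparse_mmap_area *a =
			&sparse->areas[i];

		if (a->size == 0)
			continue;
		if (mmap((uint8_t *)base + a->offset, a->size,
			 PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
			 vfio_dev_fd,
			 bar_region_offset + a->offset) == MAP_FAILED) {
			munmap(base, bar_size);
			return NULL;
		}
	}
	return base;
}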

Signed-off-by: Miao Li <miao.li@intel.com>
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
 drivers/bus/pci/private.h        |   2 +
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index f6289c907f..304c168e01 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+		int bar_index, int additional_flags)
+{
+	struct pci_map *bar = &vfio_res->maps[bar_index];
+	struct vfio_region_sparse_mmap_area *sparse;
+	void *bar_addr;
+	uint32_t i;
+
+	if (bar->size == 0) {
+		RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+		return 0;
+	}
+
+	/* reserve the address using an inaccessible mapping */
+	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+			MAP_ANONYMOUS | additional_flags, -1, 0);
+	if (bar_addr != MAP_FAILED) {
+		void *map_addr = NULL;
+		for (i = 0; i < bar->nr_areas; i++) {
+			sparse = &bar->areas[i];
+			if (sparse->size) {
+				void *addr = RTE_PTR_ADD(bar_addr, sparse->offset);
+				map_addr = pci_map_resource(addr, vfio_dev_fd,
+					bar->offset + sparse->offset, sparse->size,
+					RTE_MAP_FORCE_ADDRESS);
+				if (map_addr == NULL) {
+					munmap(bar_addr, bar->size);
+					RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+						bar_index);
+					goto err_map;
+				}
+			}
+		}
+	} else {
+		RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+			bar_index);
+		goto err_map;
+	}
+
+	bar->addr = bar_addr;
+	return 0;
+
+err_map:
+	bar->nr_areas = 0;
+	return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
 		void *bar_addr;
+		struct vfio_info_cap_header *hdr;
+		struct vfio_region_info_cap_sparse_mmap *sparse;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
 		if (ret < 0) {
@@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		maps[i].size = reg->size;
 		maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			free(reg);
-			goto err_vfio_res;
+		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+		if (hdr != NULL) {
+			sparse = container_of(hdr,
+				struct vfio_region_info_cap_sparse_mmap, header);
+			if (sparse->nr_areas > 0) {
+				maps[i].nr_areas = sparse->nr_areas;
+				maps[i].areas = sparse->areas;
+			}
+		}
+
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1008,11 +1079,20 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	maps = vfio_res->maps;
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			goto err_vfio_dev_fd;
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1062,7 +1142,7 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 		break;
 	}
 
-	if  (vfio_res == NULL)
+	if (vfio_res == NULL)
 		return vfio_res;
 
 	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2d6991ccb7..8b0ce73533 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -121,6 +121,8 @@ struct pci_map {
 	uint64_t offset;
 	uint64_t size;
 	uint64_t phaddr;
+	uint32_t nr_areas;
+	struct vfio_region_sparse_mmap_area *areas;
 };
 
 struct pci_msix_table {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-15  6:47   ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Miao Li
@ 2023-05-15  9:41     ` Miao Li
  2023-05-15  9:41       ` [PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
                         ` (4 more replies)
  2023-05-15 15:52     ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Stephen Hemminger
  1 sibling, 5 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  9:41 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

This series introduces a VFIO standard capability, called sparse
mmap to PCI bus. In linux kernel, it's defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
mmap whole BAR region into DPDK process, only mmap part of the
BAR region after getting sparse mmap information from kernel.
For the rest of BAR region that is not mmap-ed, DPDK process
can use pread/pwrite system calls to access. Sparse mmap is
useful when kernel does not want userspace to mmap whole BAR
region, or kernel wants to control over access to specific BAR
region. Vendors can choose to enable this feature or not for
their devices in their specific kernel modules.

In this patchset:

Patch 1-3 is mainly for introducing BAR access APIs so that
driver could use them to access specific BAR using pread/pwrite
system calls when part of the BAR is not mmap-able. Patch 4
adds the VFIO sparse mmap support finally.

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 drivers/bus/pci/bsd/pci.c        |  35 +++-
 drivers/bus/pci/linux/pci.c      |  78 +++++--
 drivers/bus/pci/linux/pci_init.h |  14 +-
 drivers/bus/pci/linux/pci_uio.c  |  22 ++
 drivers/bus/pci/linux/pci_vfio.c | 335 +++++++++++++++++++++++++------
 drivers/bus/pci/pci_common.c     |  12 +-
 drivers/bus/pci/private.h        |  25 ++-
 drivers/bus/pci/rte_bus_pci.h    |  48 +++++
 drivers/bus/pci/version.map      |   3 +
 drivers/bus/pci/windows/pci.c    |  14 +-
 lib/eal/include/rte_vfio.h       |   1 -
 11 files changed, 491 insertions(+), 96 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
@ 2023-05-15  9:41       ` Miao Li
  2023-05-15  9:41       ` [PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  9:41 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

From: Chenbo Xia <chenbo.xia@intel.com>

This patch introduces an internal representation of the PCI device
which will be used to store internal information that doesn't have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper around the
rte_pci_device structure. More fields will be added in later patches.
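
As a rough usage sketch of the wrapper pattern (the structure and macro
names are taken from this patch; the "private.h" include path and the
"sketch_" helper names are only for illustration), bus code allocates the
wrapper, hands the embedded rte_pci_device to drivers, and recovers the
wrapper via container_of when the internal data is needed:

#include <stdlib.h>
#include <string.h>

#include "private.h"	/* rte_pci_device_internal, RTE_PCI_DEVICE_INTERNAL, pci_free */

static struct rte_pci_device *
sketch_alloc_pci_device(void)
{
	struct rte_pci_device_internal *pdev = malloc(sizeof(*pdev));

	if (pdev == NULL)
		return NULL;
	memset(pdev, 0, sizeof(*pdev));

	/* drivers only ever see the embedded public structure */
	return &pdev->device;
}

static void
sketch_release_pci_device(struct rte_pci_device *dev)
{
	/* bus code converts the public pointer back to the internal wrapper */
	pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
}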

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c     | 13 ++++++++-----
 drivers/bus/pci/linux/pci.c   | 28 ++++++++++++++++------------
 drivers/bus/pci/pci_common.c  | 12 ++++++------
 drivers/bus/pci/private.h     | 14 +++++++++++++-
 drivers/bus/pci/windows/pci.c | 14 +++++++++-----
 5 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	struct pci_bar_io bar;
 	unsigned i, max;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL) {
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
 	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 
 	dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 				memmove(dev2->mem_resource,
 					dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	return 0;
 
 skipdev:
-	pci_free(dev);
+	pci_free(pdev);
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +301,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -306,7 +310,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver, sizeof(driver));
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -320,7 +324,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		else
 			dev->kdrv = RTE_PCI_KDRV_UNKNOWN;
 	} else {
-		pci_free(dev);
+		pci_free(pdev);
 		return 0;
 	}
 	/* device is valid, add in list (sorted) */
@@ -375,7 +379,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 						pci_common_set(dev2);
 					}
 				}
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..52404ab0fe 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -121,12 +121,12 @@ pci_common_set(struct rte_pci_device *dev)
 }
 
 void
-pci_free(struct rte_pci_device *dev)
+pci_free(struct rte_pci_device_internal *pdev)
 {
-	if (dev == NULL)
+	if (pdev == NULL)
 		return;
-	free(dev->bus_info);
-	free(dev);
+	free(pdev->device.bus_info);
+	free(pdev);
 }
 
 /* map a particular resource from a file */
@@ -465,7 +465,7 @@ pci_cleanup(void)
 		rte_intr_instance_free(dev->vfio_req_intr_handle);
 		dev->vfio_req_intr_handle = NULL;
 
-		pci_free(dev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
 	}
 
 	return error;
@@ -681,7 +681,7 @@ pci_unplug(struct rte_device *dev)
 	if (ret == 0) {
 		rte_pci_remove_device(pdev);
 		rte_devargs_remove(dev->devargs);
-		pci_free(pdev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 	}
 	return ret;
 }
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index c8161a1074..b564646e03 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,14 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+/*
+ * Convert struct rte_pci_device to struct rte_pci_device_internal
+ */
+#define RTE_PCI_DEVICE_INTERNAL(ptr) \
+	container_of(ptr, struct rte_pci_device_internal, device)
+#define RTE_PCI_DEVICE_INTERNAL_CONST(ptr) \
+	container_of(ptr, const struct rte_pci_device_internal, device)
+
 /**
  * Structure describing the PCI bus
  */
@@ -34,6 +42,10 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_device_internal {
+	struct rte_pci_device device;
+};
+
 /**
  * Scan the content of the PCI bus, and the devices in the devices
  * list
@@ -53,7 +65,7 @@ pci_common_set(struct rte_pci_device *dev);
  * Free a PCI device.
  */
 void
-pci_free(struct rte_pci_device *dev);
+pci_free(struct rte_pci_device_internal *pdev);
 
 /**
  * Validate whether a device with given PCI address should be ignored or not.
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index 5cf05ce1a0..40eaffe483 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -336,6 +336,7 @@ set_kernel_driver_type(PSP_DEVINFO_DATA device_info_data,
 static int
 pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev = NULL;
 	int ret = -1;
 	char  pci_device_info[REGSTR_VAL_MAX_HCID_LEN];
@@ -370,11 +371,14 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 		goto end;
 	}
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		goto end;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = addr;
@@ -409,7 +413,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 				dev2->max_vfs = dev->max_vfs;
 				memmove(dev2->mem_resource, dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -418,7 +422,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 
 	return 0;
 end:
-	pci_free(dev);
+	pci_free(pdev);
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-15  9:41       ` [PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
@ 2023-05-15  9:41       ` Miao Li
  2023-05-15  9:41       ` [PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  9:41 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in the Linux kernel source [1]. It
is not part of the VFIO API, and we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
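
For reference, a minimal sketch of the replacement approach: instead of
computing the device fd offset with the hard-coded 40-bit shift, the
per-region offset and size are queried from the kernel via
VFIO_DEVICE_GET_REGION_INFO. Error handling and the capability-chain
reallocation done by the real code are omitted, and "sketch_get_region"
is an illustrative name:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int
sketch_get_region(int vfio_dev_fd, unsigned int index,
		uint64_t *offset, uint64_t *size)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = index,
	};

	if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return -1;

	/* this offset is what pread64()/pwrite64() on the device fd expect */
	*offset = info.offset;
	*size = info.size;
	return 0;
}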

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci.c      |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 195 +++++++++++++++++++++++--------
 drivers/bus/pci/private.h        |   9 ++
 lib/eal/include/rte_vfio.h       |   1 -
 5 files changed, 158 insertions(+), 55 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 		return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_read_config(intr_handle, buf, len, offset);
+		return pci_vfio_read_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 		return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_write_config(intr_handle, buf, len, offset);
+		return pci_vfio_write_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 			 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..1748ad2ae0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+		    uint64_t *size, uint64_t *offset)
+{
+	const struct rte_pci_device_internal *pdev =
+		RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+	if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+		return -1;
+
+	if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+		return -1;
+
+	*size   = pdev->region[index].size;
+	*offset = pdev->region[index].offset;
+
+	return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
 		    void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
 		return -1;
 
-	return pread64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
 		    const void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
 		return -1;
 
-	return pwrite64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfio_get_msix_bar(const struct rte_pci_device *dev, int fd,
+	struct pci_msix_table *msix_table)
 {
 	int ret;
 	uint32_t reg;
 	uint16_t flags;
 	uint8_t cap_id, cap_offset;
+	uint64_t size, offset;
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
 
 	/* read PCI capability pointer from config space */
-	ret = pread64(fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_CAPABILITY_LIST);
+	ret = pread64(fd, &reg, sizeof(reg), offset + PCI_CAPABILITY_LIST);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL,
 			"Cannot read capability pointer from PCI config space!\n");
@@ -94,9 +131,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 	while (cap_offset) {
 
 		/* read PCI capability ID */
-		ret = pread64(fd, &reg, sizeof(reg),
-				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-				cap_offset);
+		ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 		if (ret != sizeof(reg)) {
 			RTE_LOG(ERR, EAL,
 				"Cannot read capability ID from PCI config space!\n");
@@ -108,9 +143,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 		/* if we haven't reached MSI-X, check next capability */
 		if (cap_id != PCI_CAP_ID_MSIX) {
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read capability pointer from PCI config space!\n");
@@ -125,18 +158,14 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 		/* else, read table offset */
 		else {
 			/* table offset resides in the next 4 bytes */
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 4);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset + 4);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table offset from PCI config space!\n");
 				return -1;
 			}
 
-			ret = pread64(fd, &flags, sizeof(flags),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 2);
+			ret = pread64(fd, &flags, sizeof(flags), offset + cap_offset + 2);
 			if (ret != sizeof(flags)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table flags from PCI config space!\n");
@@ -156,14 +185,19 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 /* enable PCI bus memory space */
 static int
-pci_vfio_enable_bus_memory(int dev_fd)
+pci_vfio_enable_bus_memory(struct rte_pci_device *dev, int dev_fd)
 {
+	uint64_t size, offset;
 	uint16_t cmd;
 	int ret;
 
-	ret = pread64(dev_fd, &cmd, sizeof(cmd),
-		      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		      PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
@@ -174,9 +208,7 @@ pci_vfio_enable_bus_memory(int dev_fd)
 		return 0;
 
 	cmd |= PCI_COMMAND_MEMORY;
-	ret = pwrite64(dev_fd, &cmd, sizeof(cmd),
-		       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		       PCI_COMMAND);
+	ret = pwrite64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -188,14 +220,19 @@ pci_vfio_enable_bus_memory(int dev_fd)
 
 /* set PCI bus mastering */
 static int
-pci_vfio_set_bus_master(int dev_fd, bool op)
+pci_vfio_set_bus_master(const struct rte_pci_device *dev, int dev_fd, bool op)
 {
+	uint64_t size, offset;
 	uint16_t reg;
 	int ret;
 
-	ret = pread64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
 		return -1;
@@ -207,9 +244,7 @@ pci_vfio_set_bus_master(int dev_fd, bool op)
 	else
 		reg &= ~(PCI_COMMAND_MASTER);
 
-	ret = pwrite64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	ret = pwrite64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -458,14 +493,21 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
 #endif
 
 static int
-pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
+pci_vfio_is_ioport_bar(const struct rte_pci_device *dev, int vfio_dev_fd,
+	int bar_index)
 {
+	uint64_t size, offset;
 	uint32_t ioport_bar;
 	int ret;
 
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
 	ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
-			  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
-			  + PCI_BASE_ADDRESS_0 + bar_index*4);
+			  offset + PCI_BASE_ADDRESS_0 + bar_index * 4);
 	if (ret != sizeof(ioport_bar)) {
 		RTE_LOG(ERR, EAL, "Cannot read command (%x) from config space!\n",
 			PCI_BASE_ADDRESS_0 + bar_index*4);
@@ -483,13 +525,13 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 		return -1;
 	}
 
-	if (pci_vfio_enable_bus_memory(vfio_dev_fd)) {
+	if (pci_vfio_enable_bus_memory(dev, vfio_dev_fd)) {
 		RTE_LOG(ERR, EAL, "Cannot enable bus memory!\n");
 		return -1;
 	}
 
 	/* set bus mastering for the device */
-	if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, true)) {
 		RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
 		return -1;
 	}
@@ -719,11 +761,40 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static int
+pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
+		      struct vfio_device_info *device_info)
+{
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
+	struct vfio_region_info *reg = NULL;
+	int nb_maps, i, ret;
+
+	nb_maps = RTE_MIN((int)device_info->num_regions,
+			VFIO_PCI_CONFIG_REGION_INDEX + 1);
+
+	for (i = 0; i < nb_maps; i++) {
+		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s cannot get device region info error %i (%s)\n",
+				dev->name, errno, strerror(errno));
+			return -1;
+		}
+
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
+		free(reg);
+	}
+
+	return 0;
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	struct vfio_region_info *reg = NULL;
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -767,11 +838,22 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	/* map BARs */
 	maps = vfio_res->maps;
 
+	ret = pci_vfio_get_region_info(vfio_dev_fd, &reg,
+		VFIO_PCI_CONFIG_REGION_INDEX);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s cannot get device region info error %i (%s)\n",
+			dev->name, errno, strerror(errno));
+		goto err_vfio_res;
+	}
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].size = reg->size;
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].offset = reg->offset;
+	free(reg);
+
 	vfio_res->msix_table.bar_index = -1;
 	/* get MSI-X BAR, if any (we have to know where it is because we can't
 	 * easily mmap it when using VFIO)
 	 */
-	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &vfio_res->msix_table);
+	ret = pci_vfio_get_msix_bar(dev, vfio_dev_fd, &vfio_res->msix_table);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "%s cannot get MSI-X BAR number!\n",
 				pci_addr);
@@ -792,7 +874,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	}
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		struct vfio_region_info *reg = NULL;
 		void *bar_addr;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
@@ -803,8 +884,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 			goto err_vfio_res;
 		}
 
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
 		/* chk for io port region */
-		ret = pci_vfio_is_ioport_bar(vfio_dev_fd, i);
+		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
 			goto err_vfio_res;
@@ -916,6 +1000,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+	if (ret)
+		return ret;
+
 	/* map BARs */
 	maps = vfio_res->maps;
 
@@ -1031,7 +1119,7 @@ pci_vfio_unmap_resource_primary(struct rte_pci_device *dev)
 	if (vfio_dev_fd < 0)
 		return -1;
 
-	if (pci_vfio_set_bus_master(vfio_dev_fd, false)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, false)) {
 		RTE_LOG(ERR, EAL, "%s cannot unset bus mastering for PCI device!\n",
 				pci_addr);
 		return -1;
@@ -1111,14 +1199,21 @@ int
 pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		    struct rte_pci_ioport *p)
 {
+	uint64_t size, offset;
+
 	if (bar < VFIO_PCI_BAR0_REGION_INDEX ||
 	    bar > VFIO_PCI_BAR5_REGION_INDEX) {
 		RTE_LOG(ERR, EAL, "invalid bar (%d)!\n", bar);
 		return -1;
 	}
 
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
+		return -1;
+	}
+
 	p->dev = dev;
-	p->base = VFIO_GET_REGION_ADDR(bar);
+	p->base = offset;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index b564646e03..2d6991ccb7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,8 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+#define RTE_MAX_PCI_REGIONS    9
+
 /*
  * Convert struct rte_pci_device to struct rte_pci_device_internal
  */
@@ -42,8 +44,15 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_region {
+	uint64_t size;
+	uint64_t offset;
+};
+
 struct rte_pci_device_internal {
 	struct rte_pci_device device;
+	/* PCI regions provided by e.g. VFIO. */
+	struct rte_pci_region region[RTE_MAX_PCI_REGIONS];
 };
 
 /**
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 7bdb8932b2..3487c4f2a2 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -38,7 +38,6 @@ extern "C" {
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
 #define VFIO_NOIOMMU_MODE      \
 	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-15  9:41       ` [PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
  2023-05-15  9:41       ` [PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
@ 2023-05-15  9:41       ` Miao Li
  2023-05-15  9:41       ` [PATCH v2 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  4 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  9:41 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should explicitly perform reads and
writes to access these regions.
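
A usage sketch of the helpers added below (the BAR index, the 0x10
register offset and the "sketch_" function names are placeholders for
illustration only):

#include <stdint.h>
#include <rte_bus_pci.h>

/* access a hypothetical 32-bit register at offset 0x10 in BAR0,
 * whether or not that part of the BAR is actually mmap-ed */
static int
sketch_read_reg(const struct rte_pci_device *dev, uint32_t *val)
{
	int ret = rte_pci_mmio_read(dev, 0, val, sizeof(*val), 0x10);

	return ret == (int)sizeof(*val) ? 0 : -1;
}

static int
sketch_write_reg(const struct rte_pci_device *dev, uint32_t val)
{
	int ret = rte_pci_mmio_write(dev, 0, &val, sizeof(val), 0x10);

	return ret == (int)sizeof(val) ? 0 : -1;
}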

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
 drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h | 10 +++++++
 drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
 drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
 drivers/bus/pci/version.map      |  3 ++
 7 files changed, 187 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 	return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+		      void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+		       const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 	}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 			 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 		       struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		        struct rte_pci_ioport *p);
 void pci_vfio_ioport_read(struct rte_pci_ioport *p,
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d52125e49b..2bf16e9369 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -55,6 +55,28 @@ pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 	return pwrite(uio_cfg_fd, buf, len, offset);
 }
 
+int
+pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+		  void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+int
+pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+		   const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 static int
 pci_uio_set_bus_master(int dev_fd)
 {
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 1748ad2ae0..f6289c907f 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1258,6 +1258,42 @@ pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 	return -1;
 }
 
+int
+pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+		   void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pread64(fd, buf, len, offset + offs);
+}
+
+int
+pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+		    const void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..82da087f24 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -135,6 +135,54 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 int rte_pci_write_config(const struct rte_pci_device *device,
 		const void *buf, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Read from a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes read on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Write to a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer containing the bytes should be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes written on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset);
+
 /**
  * Initialize a rte_pci_ioport object for a pci device io resource.
  *
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 161ab86d3b..00fde139ca 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -21,6 +21,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_pci_set_bus_master;
+	# added in 23.07
+	rte_pci_mmio_read;
+	rte_pci_mmio_write;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                         ` (2 preceding siblings ...)
  2023-05-15  9:41       ` [PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
@ 2023-05-15  9:41       ` Miao Li
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  4 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-15  9:41 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

This patch adds sparse mmap support to the PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas in one
VFIO region.

In this patch, the sparse mmap regions are mapped into one contiguous
virtual address region that follows the device-specific BAR layout. So,
a driver can still access all mapped sparse mmap regions by using
'bar_base_address + bar_offset'.

Signed-off-by: Miao Li <miao.li@intel.com>
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
 drivers/bus/pci/private.h        |   2 +
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index f6289c907f..dd57566083 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+		int bar_index, int additional_flags)
+{
+	struct pci_map *bar = &vfio_res->maps[bar_index];
+	struct vfio_region_sparse_mmap_area *sparse;
+	void *bar_addr;
+	uint32_t i;
+
+	if (bar->size == 0) {
+		RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+		return 0;
+	}
+
+	/* reserve the address using an inaccessible mapping */
+	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+			MAP_ANONYMOUS | additional_flags, -1, 0);
+	if (bar_addr != MAP_FAILED) {
+		void *map_addr = NULL;
+		for (i = 0; i < bar->nr_areas; i++) {
+			sparse = &bar->areas[i];
+			if (sparse->size) {
+				void *addr = RTE_PTR_ADD(bar_addr, (uintptr_t)sparse->offset);
+				map_addr = pci_map_resource(addr, vfio_dev_fd,
+					bar->offset + sparse->offset, sparse->size,
+					RTE_MAP_FORCE_ADDRESS);
+				if (map_addr == NULL) {
+					munmap(bar_addr, bar->size);
+					RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+						bar_index);
+					goto err_map;
+				}
+			}
+		}
+	} else {
+		RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+			bar_index);
+		goto err_map;
+	}
+
+	bar->addr = bar_addr;
+	return 0;
+
+err_map:
+	bar->nr_areas = 0;
+	return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
 		void *bar_addr;
+		struct vfio_info_cap_header *hdr;
+		struct vfio_region_info_cap_sparse_mmap *sparse;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
 		if (ret < 0) {
@@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		maps[i].size = reg->size;
 		maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			free(reg);
-			goto err_vfio_res;
+		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+		if (hdr != NULL) {
+			sparse = container_of(hdr,
+				struct vfio_region_info_cap_sparse_mmap, header);
+			if (sparse->nr_areas > 0) {
+				maps[i].nr_areas = sparse->nr_areas;
+				maps[i].areas = sparse->areas;
+			}
+		}
+
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1008,11 +1079,20 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	maps = vfio_res->maps;
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			goto err_vfio_dev_fd;
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1062,7 +1142,7 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 		break;
 	}
 
-	if  (vfio_res == NULL)
+	if (vfio_res == NULL)
 		return vfio_res;
 
 	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2d6991ccb7..8b0ce73533 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -121,6 +121,8 @@ struct pci_map {
 	uint64_t offset;
 	uint64_t size;
 	uint64_t phaddr;
+	uint32_t nr_areas;
+	struct vfio_region_sparse_mmap_area *areas;
 };
 
 struct pci_msix_table {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-15  6:47   ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
@ 2023-05-15 15:52     ` Stephen Hemminger
  2023-05-22  2:41       ` Li, Miao
  2023-05-22  3:42       ` Xia, Chenbo
  1 sibling, 2 replies; 50+ messages in thread
From: Stephen Hemminger @ 2023-05-15 15:52 UTC (permalink / raw)
  To: Miao Li
  Cc: dev, skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

On Mon, 15 May 2023 06:47:00 +0000
Miao Li <miao.li@intel.com> wrote:

> +				map_addr = pci_map_resource(addr, vfio_dev_fd,
> +					bar->offset + sparse->offset, sparse->size,
> +					RTE_MAP_FORCE_ADDRESS);
> +				if (map_addr == NULL) {
> +					munmap(bar_addr, bar->size);
> +					RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
> +						bar_index);

If mmap() fails then printing errno would help diagnose why.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-15 15:52     ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Stephen Hemminger
@ 2023-05-22  2:41       ` Li, Miao
  2023-05-22  3:42       ` Xia, Chenbo
  1 sibling, 0 replies; 50+ messages in thread
From: Li, Miao @ 2023-05-22  2:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, skori, thomas, david.marchand, ferruh.yigit, Xia, Chenbo,
	Cao, Yahui, Burakov, Anatoly

Hi,
I will add the errno print in patch v3.

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, May 15, 2023 11:53 PM
> To: Li, Miao <miao.li@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com; Xia, Chenbo
> <chenbo.xia@intel.com>; Cao, Yahui <yahui.cao@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: Re: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
> 
> On Mon, 15 May 2023 06:47:00 +0000
> Miao Li <miao.li@intel.com> wrote:
> 
> > +				map_addr = pci_map_resource(addr,
> vfio_dev_fd,
> > +					bar->offset + sparse->offset, sparse-
> >size,
> > +					RTE_MAP_FORCE_ADDRESS);
> > +				if (map_addr == NULL) {
> > +					munmap(bar_addr, bar->size);
> > +					RTE_LOG(ERR, EAL, "Failed to map pci
> BAR%d\n",
> > +						bar_index);
> 
> If mmap() fails then printing errno would help diagnose why.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-15 15:52     ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Stephen Hemminger
  2023-05-22  2:41       ` Li, Miao
@ 2023-05-22  3:42       ` Xia, Chenbo
  1 sibling, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-22  3:42 UTC (permalink / raw)
  To: Stephen Hemminger, Li, Miao
  Cc: dev, skori, thomas, david.marchand, ferruh.yigit, Cao, Yahui,
	Burakov, Anatoly

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, May 15, 2023 11:53 PM
> To: Li, Miao <miao.li@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com; Xia, Chenbo
> <chenbo.xia@intel.com>; Cao, Yahui <yahui.cao@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: Re: [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support
> 
> On Mon, 15 May 2023 06:47:00 +0000
> Miao Li <miao.li@intel.com> wrote:
> 
> > +				map_addr = pci_map_resource(addr, vfio_dev_fd,
> > +					bar->offset + sparse->offset, sparse->size,
> > +					RTE_MAP_FORCE_ADDRESS);
> > +				if (map_addr == NULL) {
> > +					munmap(bar_addr, bar->size);
> > +					RTE_LOG(ERR, EAL, "Failed to map pci
> BAR%d\n",
> > +						bar_index);
> 
> If mmap() fails then printing errno would help diagnose why.

Thanks for your review! It seems errno will be printed in function
pci_map_resource() when mmap() fails. So I guess we don't need it here?

Thanks,
Chenbo


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                         ` (3 preceding siblings ...)
  2023-05-15  9:41       ` [PATCH v2 4/4] bus/pci: add VFIO sparse mmap support Miao Li
@ 2023-05-25 16:31       ` Miao Li
  2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
                           ` (4 more replies)
  4 siblings, 5 replies; 50+ messages in thread
From: Miao Li @ 2023-05-25 16:31 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

This series introduces a VFIO standard capability, called sparse
mmap to PCI bus. In linux kernel, it's defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
mmap whole BAR region into DPDK process, only mmap part of the
BAR region after getting sparse mmap information from kernel.
For the rest of BAR region that is not mmap-ed, DPDK process
can use pread/pwrite system calls to access. Sparse mmap is
useful when kernel does not want userspace to mmap whole BAR
region, or kernel wants to control over access to specific BAR
region. Vendors can choose to enable this feature or not for
their devices in their specific kernel modules.

In this patchset:

Patch 1-3 is mainly for introducing BAR access APIs so that
driver could use them to access specific BAR using pread/pwrite
system calls when part of the BAR is not mmap-able. Patch 4
adds the VFIO sparse mmap support finally.

v3:
fix uninitialized variable errors for 'pdev' and 'info'

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 drivers/bus/pci/bsd/pci.c        |  35 +++-
 drivers/bus/pci/linux/pci.c      |  78 +++++--
 drivers/bus/pci/linux/pci_init.h |  14 +-
 drivers/bus/pci/linux/pci_uio.c  |  22 ++
 drivers/bus/pci/linux/pci_vfio.c | 337 +++++++++++++++++++++++++------
 drivers/bus/pci/pci_common.c     |  12 +-
 drivers/bus/pci/private.h        |  25 ++-
 drivers/bus/pci/rte_bus_pci.h    |  48 +++++
 drivers/bus/pci/version.map      |   3 +
 drivers/bus/pci/windows/pci.c    |  14 +-
 lib/eal/include/rte_vfio.h       |   1 -
 11 files changed, 492 insertions(+), 97 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
@ 2023-05-25 16:31         ` Miao Li
  2023-05-29  6:14           ` [EXT] " Sunil Kumar Kori
  2023-05-29  6:28           ` Cao, Yahui
  2023-05-25 16:31         ` [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
                           ` (3 subsequent siblings)
  4 siblings, 2 replies; 50+ messages in thread
From: Miao Li @ 2023-05-25 16:31 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

From: Chenbo Xia <chenbo.xia@intel.com>

This patch introduces an internal representation of the PCI device
which will be used to store internal information that doesn't have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper around the
rte_pci_device structure. More fields will be added in later patches.

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c     | 13 ++++++++-----
 drivers/bus/pci/linux/pci.c   | 28 ++++++++++++++++------------
 drivers/bus/pci/pci_common.c  | 12 ++++++------
 drivers/bus/pci/private.h     | 14 +++++++++++++-
 drivers/bus/pci/windows/pci.c | 14 +++++++++-----
 5 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	struct pci_bar_io bar;
 	unsigned i, max;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL) {
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
 	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 
 	dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 				memmove(dev2->mem_resource,
 					dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	return 0;
 
 skipdev:
-	pci_free(dev);
+	pci_free(pdev);
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +301,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -306,7 +310,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver, sizeof(driver));
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -320,7 +324,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		else
 			dev->kdrv = RTE_PCI_KDRV_UNKNOWN;
 	} else {
-		pci_free(dev);
+		pci_free(pdev);
 		return 0;
 	}
 	/* device is valid, add in list (sorted) */
@@ -375,7 +379,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 						pci_common_set(dev2);
 					}
 				}
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..52404ab0fe 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -121,12 +121,12 @@ pci_common_set(struct rte_pci_device *dev)
 }
 
 void
-pci_free(struct rte_pci_device *dev)
+pci_free(struct rte_pci_device_internal *pdev)
 {
-	if (dev == NULL)
+	if (pdev == NULL)
 		return;
-	free(dev->bus_info);
-	free(dev);
+	free(pdev->device.bus_info);
+	free(pdev);
 }
 
 /* map a particular resource from a file */
@@ -465,7 +465,7 @@ pci_cleanup(void)
 		rte_intr_instance_free(dev->vfio_req_intr_handle);
 		dev->vfio_req_intr_handle = NULL;
 
-		pci_free(dev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
 	}
 
 	return error;
@@ -681,7 +681,7 @@ pci_unplug(struct rte_device *dev)
 	if (ret == 0) {
 		rte_pci_remove_device(pdev);
 		rte_devargs_remove(dev->devargs);
-		pci_free(pdev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 	}
 	return ret;
 }
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index c8161a1074..b564646e03 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,14 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+/*
+ * Convert struct rte_pci_device to struct rte_pci_device_internal
+ */
+#define RTE_PCI_DEVICE_INTERNAL(ptr) \
+	container_of(ptr, struct rte_pci_device_internal, device)
+#define RTE_PCI_DEVICE_INTERNAL_CONST(ptr) \
+	container_of(ptr, const struct rte_pci_device_internal, device)
+
 /**
  * Structure describing the PCI bus
  */
@@ -34,6 +42,10 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_device_internal {
+	struct rte_pci_device device;
+};
+
 /**
  * Scan the content of the PCI bus, and the devices in the devices
  * list
@@ -53,7 +65,7 @@ pci_common_set(struct rte_pci_device *dev);
  * Free a PCI device.
  */
 void
-pci_free(struct rte_pci_device *dev);
+pci_free(struct rte_pci_device_internal *pdev);
 
 /**
  * Validate whether a device with given PCI address should be ignored or not.
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index 5cf05ce1a0..df5221d913 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -336,6 +336,7 @@ set_kernel_driver_type(PSP_DEVINFO_DATA device_info_data,
 static int
 pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 {
+	struct rte_pci_device_internal *pdev = NULL;
 	struct rte_pci_device *dev = NULL;
 	int ret = -1;
 	char  pci_device_info[REGSTR_VAL_MAX_HCID_LEN];
@@ -370,11 +371,14 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 		goto end;
 	}
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		goto end;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = addr;
@@ -409,7 +413,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 				dev2->max_vfs = dev->max_vfs;
 				memmove(dev2->mem_resource, dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -418,7 +422,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 
 	return 0;
 end:
-	pci_free(dev);
+	pci_free(pdev);
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
@ 2023-05-25 16:31         ` Miao Li
  2023-05-29  6:15           ` [EXT] " Sunil Kumar Kori
  2023-05-29  6:30           ` Cao, Yahui
  2023-05-25 16:31         ` [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
                           ` (2 subsequent siblings)
  4 siblings, 2 replies; 50+ messages in thread
From: Miao Li @ 2023-05-25 16:31 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It
is not part of VFIO API, and we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
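
For reference, a sketch of the replacement approach: ask the kernel for
the region offset via VFIO_DEVICE_GET_REGION_INFO instead of deriving it
from the private 40-bit shift. The helper name below is made up (the
patch implements this as pci_vfio_get_region_info()/pci_vfio_get_region())
and capability-chain handling is omitted:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int
get_region_offset(int vfio_dev_fd, uint32_t index,
		  uint64_t *offset, uint64_t *size)
{
	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = index,
	};

	if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0)
		return -1;

	/* use info.offset with pread64()/pwrite64() on the device fd */
	*offset = info.offset;
	*size = info.size;
	return 0;
}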

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci.c      |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 197 +++++++++++++++++++++++--------
 drivers/bus/pci/private.h        |   9 ++
 lib/eal/include/rte_vfio.h       |   1 -
 5 files changed, 159 insertions(+), 56 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 		return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_read_config(intr_handle, buf, len, offset);
+		return pci_vfio_read_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 		return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_write_config(intr_handle, buf, len, offset);
+		return pci_vfio_write_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 			 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..5aef84b7d0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+		    uint64_t *size, uint64_t *offset)
+{
+	const struct rte_pci_device_internal *pdev =
+		RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+	if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+		return -1;
+
+	if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+		return -1;
+
+	*size   = pdev->region[index].size;
+	*offset = pdev->region[index].offset;
+
+	return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
 		    void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
 		return -1;
 
-	return pread64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
 		    const void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
 		return -1;
 
-	return pwrite64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfio_get_msix_bar(const struct rte_pci_device *dev, int fd,
+	struct pci_msix_table *msix_table)
 {
 	int ret;
 	uint32_t reg;
 	uint16_t flags;
 	uint8_t cap_id, cap_offset;
+	uint64_t size, offset;
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
 
 	/* read PCI capability pointer from config space */
-	ret = pread64(fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_CAPABILITY_LIST);
+	ret = pread64(fd, &reg, sizeof(reg), offset + PCI_CAPABILITY_LIST);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL,
 			"Cannot read capability pointer from PCI config space!\n");
@@ -94,9 +131,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 	while (cap_offset) {
 
 		/* read PCI capability ID */
-		ret = pread64(fd, &reg, sizeof(reg),
-				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-				cap_offset);
+		ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 		if (ret != sizeof(reg)) {
 			RTE_LOG(ERR, EAL,
 				"Cannot read capability ID from PCI config space!\n");
@@ -108,9 +143,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 		/* if we haven't reached MSI-X, check next capability */
 		if (cap_id != PCI_CAP_ID_MSIX) {
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read capability pointer from PCI config space!\n");
@@ -125,18 +158,14 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 		/* else, read table offset */
 		else {
 			/* table offset resides in the next 4 bytes */
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 4);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset + 4);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table offset from PCI config space!\n");
 				return -1;
 			}
 
-			ret = pread64(fd, &flags, sizeof(flags),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 2);
+			ret = pread64(fd, &flags, sizeof(flags), offset + cap_offset + 2);
 			if (ret != sizeof(flags)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table flags from PCI config space!\n");
@@ -156,14 +185,19 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 /* enable PCI bus memory space */
 static int
-pci_vfio_enable_bus_memory(int dev_fd)
+pci_vfio_enable_bus_memory(struct rte_pci_device *dev, int dev_fd)
 {
+	uint64_t size, offset;
 	uint16_t cmd;
 	int ret;
 
-	ret = pread64(dev_fd, &cmd, sizeof(cmd),
-		      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		      PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
@@ -174,9 +208,7 @@ pci_vfio_enable_bus_memory(int dev_fd)
 		return 0;
 
 	cmd |= PCI_COMMAND_MEMORY;
-	ret = pwrite64(dev_fd, &cmd, sizeof(cmd),
-		       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		       PCI_COMMAND);
+	ret = pwrite64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -188,14 +220,19 @@ pci_vfio_enable_bus_memory(int dev_fd)
 
 /* set PCI bus mastering */
 static int
-pci_vfio_set_bus_master(int dev_fd, bool op)
+pci_vfio_set_bus_master(const struct rte_pci_device *dev, int dev_fd, bool op)
 {
+	uint64_t size, offset;
 	uint16_t reg;
 	int ret;
 
-	ret = pread64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
 		return -1;
@@ -207,9 +244,7 @@ pci_vfio_set_bus_master(int dev_fd, bool op)
 	else
 		reg &= ~(PCI_COMMAND_MASTER);
 
-	ret = pwrite64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	ret = pwrite64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -458,14 +493,21 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
 #endif
 
 static int
-pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
+pci_vfio_is_ioport_bar(const struct rte_pci_device *dev, int vfio_dev_fd,
+	int bar_index)
 {
+	uint64_t size, offset;
 	uint32_t ioport_bar;
 	int ret;
 
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
 	ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
-			  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
-			  + PCI_BASE_ADDRESS_0 + bar_index*4);
+			  offset + PCI_BASE_ADDRESS_0 + bar_index * 4);
 	if (ret != sizeof(ioport_bar)) {
 		RTE_LOG(ERR, EAL, "Cannot read command (%x) from config space!\n",
 			PCI_BASE_ADDRESS_0 + bar_index*4);
@@ -483,13 +525,13 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 		return -1;
 	}
 
-	if (pci_vfio_enable_bus_memory(vfio_dev_fd)) {
+	if (pci_vfio_enable_bus_memory(dev, vfio_dev_fd)) {
 		RTE_LOG(ERR, EAL, "Cannot enable bus memory!\n");
 		return -1;
 	}
 
 	/* set bus mastering for the device */
-	if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, true)) {
 		RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
 		return -1;
 	}
@@ -704,7 +746,7 @@ pci_vfio_info_cap(struct vfio_region_info *info, int cap)
 static int
 pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 {
-	struct vfio_region_info *info;
+	struct vfio_region_info *info = NULL;
 	int ret;
 
 	ret = pci_vfio_get_region_info(vfio_dev_fd, &info, msix_region);
@@ -719,11 +761,40 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static int
+pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
+		      struct vfio_device_info *device_info)
+{
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
+	struct vfio_region_info *reg = NULL;
+	int nb_maps, i, ret;
+
+	nb_maps = RTE_MIN((int)device_info->num_regions,
+			VFIO_PCI_CONFIG_REGION_INDEX + 1);
+
+	for (i = 0; i < nb_maps; i++) {
+		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s cannot get device region info error %i (%s)\n",
+				dev->name, errno, strerror(errno));
+			return -1;
+		}
+
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
+		free(reg);
+	}
+
+	return 0;
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	struct vfio_region_info *reg = NULL;
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -767,11 +838,22 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	/* map BARs */
 	maps = vfio_res->maps;
 
+	ret = pci_vfio_get_region_info(vfio_dev_fd, &reg,
+		VFIO_PCI_CONFIG_REGION_INDEX);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s cannot get device region info error %i (%s)\n",
+			dev->name, errno, strerror(errno));
+		goto err_vfio_res;
+	}
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].size = reg->size;
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].offset = reg->offset;
+	free(reg);
+
 	vfio_res->msix_table.bar_index = -1;
 	/* get MSI-X BAR, if any (we have to know where it is because we can't
 	 * easily mmap it when using VFIO)
 	 */
-	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &vfio_res->msix_table);
+	ret = pci_vfio_get_msix_bar(dev, vfio_dev_fd, &vfio_res->msix_table);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "%s cannot get MSI-X BAR number!\n",
 				pci_addr);
@@ -792,7 +874,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	}
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		struct vfio_region_info *reg = NULL;
 		void *bar_addr;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
@@ -803,8 +884,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 			goto err_vfio_res;
 		}
 
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
 		/* chk for io port region */
-		ret = pci_vfio_is_ioport_bar(vfio_dev_fd, i);
+		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
 			goto err_vfio_res;
@@ -916,6 +1000,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+	if (ret)
+		return ret;
+
 	/* map BARs */
 	maps = vfio_res->maps;
 
@@ -1031,7 +1119,7 @@ pci_vfio_unmap_resource_primary(struct rte_pci_device *dev)
 	if (vfio_dev_fd < 0)
 		return -1;
 
-	if (pci_vfio_set_bus_master(vfio_dev_fd, false)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, false)) {
 		RTE_LOG(ERR, EAL, "%s cannot unset bus mastering for PCI device!\n",
 				pci_addr);
 		return -1;
@@ -1111,14 +1199,21 @@ int
 pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		    struct rte_pci_ioport *p)
 {
+	uint64_t size, offset;
+
 	if (bar < VFIO_PCI_BAR0_REGION_INDEX ||
 	    bar > VFIO_PCI_BAR5_REGION_INDEX) {
 		RTE_LOG(ERR, EAL, "invalid bar (%d)!\n", bar);
 		return -1;
 	}
 
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
+		return -1;
+	}
+
 	p->dev = dev;
-	p->base = VFIO_GET_REGION_ADDR(bar);
+	p->base = offset;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index b564646e03..2d6991ccb7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,8 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+#define RTE_MAX_PCI_REGIONS    9
+
 /*
  * Convert struct rte_pci_device to struct rte_pci_device_internal
  */
@@ -42,8 +44,15 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_region {
+	uint64_t size;
+	uint64_t offset;
+};
+
 struct rte_pci_device_internal {
 	struct rte_pci_device device;
+	/* PCI regions provided by e.g. VFIO. */
+	struct rte_pci_region region[RTE_MAX_PCI_REGIONS];
 };
 
 /**
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 7bdb8932b2..3487c4f2a2 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -38,7 +38,6 @@ extern "C" {
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
 #define VFIO_NOIOMMU_MODE      \
 	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
  2023-05-25 16:31         ` [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
@ 2023-05-25 16:31         ` Miao Li
  2023-05-29  6:16           ` [EXT] " Sunil Kumar Kori
  2023-05-29  6:31           ` Cao, Yahui
  2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  4 siblings, 2 replies; 50+ messages in thread
From: Miao Li @ 2023-05-25 16:31 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should use explicit read and write
calls to access these regions.
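
As an illustration, a driver could use the new helpers like this
(pci_dev is the driver's rte_pci_device pointer; BAR 0 and register
offset 0x10 are made-up values; the calls work whether or not the BAR
is actually mmap-ed):

	uint32_t reg_val;

	if (rte_pci_mmio_read(pci_dev, 0, &reg_val, sizeof(reg_val), 0x10) < 0)
		return -1;

	reg_val |= 0x1;

	if (rte_pci_mmio_write(pci_dev, 0, &reg_val, sizeof(reg_val), 0x10) < 0)
		return -1;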

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
 drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h | 10 +++++++
 drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
 drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
 drivers/bus/pci/version.map      |  3 ++
 7 files changed, 187 insertions(+)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 	return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+		      void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+		       const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 	}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 			 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 		       struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		        struct rte_pci_ioport *p);
 void pci_vfio_ioport_read(struct rte_pci_ioport *p,
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d52125e49b..2bf16e9369 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -55,6 +55,28 @@ pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 	return pwrite(uio_cfg_fd, buf, len, offset);
 }
 
+int
+pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+		  void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+int
+pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+		   const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 static int
 pci_uio_set_bus_master(int dev_fd)
 {
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 5aef84b7d0..24b0795fbd 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1258,6 +1258,42 @@ pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 	return -1;
 }
 
+int
+pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+		   void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pread64(fd, buf, len, offset + offs);
+}
+
+int
+pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+		    const void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..82da087f24 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -135,6 +135,54 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 int rte_pci_write_config(const struct rte_pci_device *device,
 		const void *buf, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Read from a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes read on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Write to a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer containing the bytes to be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes written on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset);
+
 /**
  * Initialize a rte_pci_ioport object for a pci device io resource.
  *
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 161ab86d3b..00fde139ca 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -21,6 +21,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_pci_set_bus_master;
+	# added in 23.07
+	rte_pci_mmio_read;
+	rte_pci_mmio_write;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                           ` (2 preceding siblings ...)
  2023-05-25 16:31         ` [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
@ 2023-05-25 16:31         ` Miao Li
  2023-05-29  6:17           ` [EXT] " Sunil Kumar Kori
                             ` (2 more replies)
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  4 siblings, 3 replies; 50+ messages in thread
From: Miao Li @ 2023-05-25 16:31 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

This patch adds sparse mmap support in PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas in one
VFIO region.

In this patch, the sparse mmap regions are mapped to one continuous
virtual address region that follows the device-specific BAR layout.
So, the driver can still access all mapped sparse mmap regions by
using 'bar_base_address + bar_offset'.
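
For illustration, a simplified sketch of the mapping scheme (the patch
itself uses pci_map_resource() with RTE_MAP_FORCE_ADDRESS rather than
raw mmap(), and the function below is not part of the patch): reserve
the whole BAR with an anonymous PROT_NONE mapping, then map only the
mmap-able areas at their BAR-relative offsets:

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static void *
sparse_map_bar(int vfio_dev_fd, size_t bar_size, uint64_t region_offset,
	       uint32_t nr_areas,
	       const struct vfio_region_sparse_mmap_area *areas)
{
	void *bar_addr;
	uint32_t i;

	/* reserve the whole BAR so registers keep their relative layout */
	bar_addr = mmap(NULL, bar_size, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bar_addr == MAP_FAILED)
		return NULL;

	for (i = 0; i < nr_areas; i++) {
		void *addr = (uint8_t *)bar_addr + areas[i].offset;

		/* map only the areas the kernel allows to be mmap-ed */
		if (mmap(addr, areas[i].size, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, vfio_dev_fd,
			 region_offset + areas[i].offset) == MAP_FAILED) {
			munmap(bar_addr, bar_size);
			return NULL;
		}
	}

	/* drivers keep using 'bar_addr + register_offset' as before */
	return bar_addr;
}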

Signed-off-by: Miao Li <miao.li@intel.com>
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
 drivers/bus/pci/private.h        |   2 +
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 24b0795fbd..c411909976 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+		int bar_index, int additional_flags)
+{
+	struct pci_map *bar = &vfio_res->maps[bar_index];
+	struct vfio_region_sparse_mmap_area *sparse;
+	void *bar_addr;
+	uint32_t i;
+
+	if (bar->size == 0) {
+		RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+		return 0;
+	}
+
+	/* reserve the address using an inaccessible mapping */
+	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+			MAP_ANONYMOUS | additional_flags, -1, 0);
+	if (bar_addr != MAP_FAILED) {
+		void *map_addr = NULL;
+		for (i = 0; i < bar->nr_areas; i++) {
+			sparse = &bar->areas[i];
+			if (sparse->size) {
+				void *addr = RTE_PTR_ADD(bar_addr, (uintptr_t)sparse->offset);
+				map_addr = pci_map_resource(addr, vfio_dev_fd,
+					bar->offset + sparse->offset, sparse->size,
+					RTE_MAP_FORCE_ADDRESS);
+				if (map_addr == NULL) {
+					munmap(bar_addr, bar->size);
+					RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+						bar_index);
+					goto err_map;
+				}
+			}
+		}
+	} else {
+		RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+			bar_index);
+		goto err_map;
+	}
+
+	bar->addr = bar_addr;
+	return 0;
+
+err_map:
+	bar->nr_areas = 0;
+	return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
 		void *bar_addr;
+		struct vfio_info_cap_header *hdr;
+		struct vfio_region_info_cap_sparse_mmap *sparse;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
 		if (ret < 0) {
@@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		maps[i].size = reg->size;
 		maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			free(reg);
-			goto err_vfio_res;
+		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+		if (hdr != NULL) {
+			sparse = container_of(hdr,
+				struct vfio_region_info_cap_sparse_mmap, header);
+			if (sparse->nr_areas > 0) {
+				maps[i].nr_areas = sparse->nr_areas;
+				maps[i].areas = sparse->areas;
+			}
+		}
+
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_vfio_res;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1008,11 +1079,20 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	maps = vfio_res->maps;
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			goto err_vfio_dev_fd;
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1062,7 +1142,7 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 		break;
 	}
 
-	if  (vfio_res == NULL)
+	if (vfio_res == NULL)
 		return vfio_res;
 
 	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2d6991ccb7..8b0ce73533 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -121,6 +121,8 @@ struct pci_map {
 	uint64_t offset;
 	uint64_t size;
 	uint64_t phaddr;
+	uint32_t nr_areas;
+	struct vfio_region_sparse_mmap_area *areas;
 };
 
 struct pci_msix_table {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [EXT] [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
@ 2023-05-29  6:14           ` Sunil Kumar Kori
  2023-05-29  6:28           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Sunil Kumar Kori @ 2023-05-29  6:14 UTC (permalink / raw)
  To: Miao Li, dev; +Cc: thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

> -----Original Message-----
> From: Miao Li <miao.li@intel.com>
> Sent: Thursday, May 25, 2023 10:01 PM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com;
> chenbo.xia@intel.com; yahui.cao@intel.com
> Subject: [EXT] [PATCH v3 1/4] bus/pci: introduce an internal representation
> of PCI device
> 
> External Email
> 
> ----------------------------------------------------------------------
> From: Chenbo Xia <chenbo.xia@intel.com>
> 
> This patch introduces an internal representation of the PCI device which will
> be used to store the internal information that don't have to be exposed to
> drivers, e.g., the VFIO region sizes/offsets.
> 
> In this patch, the internal structure is simply a wrapper of the rte_pci_device
> structure. More fields will be added.
> 
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  drivers/bus/pci/bsd/pci.c     | 13 ++++++++-----
>  drivers/bus/pci/linux/pci.c   | 28 ++++++++++++++++------------
>  drivers/bus/pci/pci_common.c  | 12 ++++++------
>  drivers/bus/pci/private.h     | 14 +++++++++++++-
>  drivers/bus/pci/windows/pci.c | 14 +++++++++-----
>  5 files changed, 52 insertions(+), 29 deletions(-)
> 

Acked-by: Sunil Kumar Kori <skori@marvell.com>

...
[snipped]
...
> 2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [EXT] [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-25 16:31         ` [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
@ 2023-05-29  6:15           ` Sunil Kumar Kori
  2023-05-29  6:30           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Sunil Kumar Kori @ 2023-05-29  6:15 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao,
	Anatoly Burakov

> -----Original Message-----
> From: Miao Li <miao.li@intel.com>
> Sent: Thursday, May 25, 2023 10:01 PM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com;
> chenbo.xia@intel.com; yahui.cao@intel.com; Anatoly Burakov
> <anatoly.burakov@intel.com>
> Subject: [EXT] [PATCH v3 2/4] bus/pci: avoid depending on private value in
> kernel source
> 
> External Email
> 
> ----------------------------------------------------------------------
> From: Chenbo Xia <chenbo.xia@intel.com>
> 
> The value 40 used in VFIO_GET_REGION_ADDR() is a private value
> (VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It is not part of
> VFIO API, and we should not depend on it.
> 
> [1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
> 
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  drivers/bus/pci/linux/pci.c      |   4 +-
>  drivers/bus/pci/linux/pci_init.h |   4 +-
>  drivers/bus/pci/linux/pci_vfio.c | 197 +++++++++++++++++++++++--------
>  drivers/bus/pci/private.h        |   9 ++
>  lib/eal/include/rte_vfio.h       |   1 -
>  5 files changed, 159 insertions(+), 56 deletions(-)
> 
Acked-by: Sunil Kumar Kori <skori@marvell.com>

...
[snipped]
...

> 2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [EXT] [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-25 16:31         ` [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
@ 2023-05-29  6:16           ` Sunil Kumar Kori
  2023-05-29  6:31           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Sunil Kumar Kori @ 2023-05-29  6:16 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao,
	Anatoly Burakov

> -----Original Message-----
> From: Miao Li <miao.li@intel.com>
> Sent: Thursday, May 25, 2023 10:01 PM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com;
> chenbo.xia@intel.com; yahui.cao@intel.com; Anatoly Burakov
> <anatoly.burakov@intel.com>
> Subject: [EXT] [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and
> write
> 
> External Email
> 
> ----------------------------------------------------------------------
> From: Chenbo Xia <chenbo.xia@intel.com>
> 
> The MMIO regions may not be mmap-able for VFIO-PCI devices.
> In this case, the driver should explicitly do read and write to access these
> regions.
> 
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
>  drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
>  drivers/bus/pci/linux/pci_init.h | 10 +++++++
> drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
> drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
>  drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
>  drivers/bus/pci/version.map      |  3 ++
>  7 files changed, 187 insertions(+)
> 
Acked-by: Sunil Kumar Kori <skori@marvell.com>

...
[snipped]
...

> 2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [EXT] [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
@ 2023-05-29  6:17           ` Sunil Kumar Kori
  2023-05-29  6:32           ` Cao, Yahui
  2023-05-29  9:25           ` Xia, Chenbo
  2 siblings, 0 replies; 50+ messages in thread
From: Sunil Kumar Kori @ 2023-05-29  6:17 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao,
	Anatoly Burakov

> -----Original Message-----
> From: Miao Li <miao.li@intel.com>
> Sent: Thursday, May 25, 2023 10:01 PM
> To: dev@dpdk.org
> Cc: Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> david.marchand@redhat.com; ferruh.yigit@amd.com;
> chenbo.xia@intel.com; yahui.cao@intel.com; Anatoly Burakov
> <anatoly.burakov@intel.com>
> Subject: [EXT] [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
> 
> External Email
> 
> ----------------------------------------------------------------------
> This patch adds sparse mmap support in PCI bus. Sparse mmap is a capability
> defined in VFIO which allows multiple mmap areas in one VFIO region.
> 
> In this patch, the sparse mmap regions are mapped to one continuous virtual
> address region that follows device-specific BAR layout. So, driver can still
> access all mapped sparse mmap regions by using 'bar_base_address +
> bar_offset'.
> 
> Signed-off-by: Miao Li <miao.li@intel.com>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
>  drivers/bus/pci/private.h        |   2 +
>  2 files changed, 94 insertions(+), 12 deletions(-)
> 

Acked-by: Sunil Kumar Kori <skori@marvell.com>
...
[snipped]
...

> 2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
  2023-05-29  6:14           ` [EXT] " Sunil Kumar Kori
@ 2023-05-29  6:28           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Cao, Yahui @ 2023-05-29  6:28 UTC (permalink / raw)
  To: Miao Li, dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia


On 5/26/2023 12:31 AM, Miao Li wrote:
> From: Chenbo Xia <chenbo.xia@intel.com>
>
> This patch introduces an internal representation of the PCI device
> which will be used to store the internal information that don't have
> to be exposed to drivers, e.g., the VFIO region sizes/offsets.
>
> In this patch, the internal structure is simply a wrapper of the
> rte_pci_device structure. More fields will be added.
>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>   drivers/bus/pci/bsd/pci.c     | 13 ++++++++-----
>   drivers/bus/pci/linux/pci.c   | 28 ++++++++++++++++------------
>   drivers/bus/pci/pci_common.c  | 12 ++++++------
>   drivers/bus/pci/private.h     | 14 +++++++++++++-
>   drivers/bus/pci/windows/pci.c | 14 +++++++++-----
>   5 files changed, 52 insertions(+), 29 deletions(-)
>
Acked-by: Yahui Cao <yahui.cao@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-25 16:31         ` [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
  2023-05-29  6:15           ` [EXT] " Sunil Kumar Kori
@ 2023-05-29  6:30           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Cao, Yahui @ 2023-05-29  6:30 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, Anatoly Burakov


On 5/26/2023 12:31 AM, Miao Li wrote:
> From: Chenbo Xia <chenbo.xia@intel.com>
>
> The value 40 used in VFIO_GET_REGION_ADDR() is a private value
> (VFIO_PCI_OFFSET_SHIFT) defined in Linux kernel source [1]. It
> is not part of VFIO API, and we should not depend on it.
>
> [1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>   drivers/bus/pci/linux/pci.c      |   4 +-
>   drivers/bus/pci/linux/pci_init.h |   4 +-
>   drivers/bus/pci/linux/pci_vfio.c | 197 +++++++++++++++++++++++--------
>   drivers/bus/pci/private.h        |   9 ++
>   lib/eal/include/rte_vfio.h       |   1 -
>   5 files changed, 159 insertions(+), 56 deletions(-)
>
Acked-by: Yahui Cao <yahui.cao@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-25 16:31         ` [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
  2023-05-29  6:16           ` [EXT] " Sunil Kumar Kori
@ 2023-05-29  6:31           ` Cao, Yahui
  1 sibling, 0 replies; 50+ messages in thread
From: Cao, Yahui @ 2023-05-29  6:31 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, Anatoly Burakov


On 5/26/2023 12:31 AM, Miao Li wrote:
> From: Chenbo Xia <chenbo.xia@intel.com>
>
> The MMIO regions may not be mmap-able for VFIO-PCI devices.
> In this case, the driver should explicitly do read and write
> to access these regions.
>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>   drivers/bus/pci/bsd/pci.c        | 22 +++++++++++++++
>   drivers/bus/pci/linux/pci.c      | 46 ++++++++++++++++++++++++++++++
>   drivers/bus/pci/linux/pci_init.h | 10 +++++++
>   drivers/bus/pci/linux/pci_uio.c  | 22 +++++++++++++++
>   drivers/bus/pci/linux/pci_vfio.c | 36 ++++++++++++++++++++++++
>   drivers/bus/pci/rte_bus_pci.h    | 48 ++++++++++++++++++++++++++++++++
>   drivers/bus/pci/version.map      |  3 ++
>   7 files changed, 187 insertions(+)
>
Acked-by: Yahui Cao <yahui.cao@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  2023-05-29  6:17           ` [EXT] " Sunil Kumar Kori
@ 2023-05-29  6:32           ` Cao, Yahui
  2023-05-29  9:25           ` Xia, Chenbo
  2 siblings, 0 replies; 50+ messages in thread
From: Cao, Yahui @ 2023-05-29  6:32 UTC (permalink / raw)
  To: Miao Li, dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, Anatoly Burakov


On 5/26/2023 12:31 AM, Miao Li wrote:
> This patch adds sparse mmap support in PCI bus. Sparse mmap is a
> capability defined in VFIO which allows multiple mmap areas in one
> VFIO region.
>
> In this patch, the sparse mmap regions are mapped to one continuous
> virtual address region that follows device-specific BAR layout. So,
> driver can still access all mapped sparse mmap regions by using
> 'bar_base_address + bar_offset'.
>
> Signed-off-by: Miao Li <miao.li@intel.com>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>   drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
>   drivers/bus/pci/private.h        |   2 +
>   2 files changed, 94 insertions(+), 12 deletions(-)
>
Acked-by: Yahui Cao <yahui.cao@intel.com>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
  2023-05-29  6:17           ` [EXT] " Sunil Kumar Kori
  2023-05-29  6:32           ` Cao, Yahui
@ 2023-05-29  9:25           ` Xia, Chenbo
  2 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-05-29  9:25 UTC (permalink / raw)
  To: Li, Miao, dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, Cao, Yahui, Burakov,
	Anatoly

> -----Original Message-----
> From: Li, Miao <miao.li@intel.com>
> Sent: Friday, May 26, 2023 12:31 AM
> To: dev@dpdk.org
> Cc: skori@marvell.com; thomas@monjalon.net; david.marchand@redhat.com;
> ferruh.yigit@amd.com; Xia, Chenbo <chenbo.xia@intel.com>; Cao, Yahui
> <yahui.cao@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
> Subject: [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support
> 
> This patch adds sparse mmap support in PCI bus. Sparse mmap is a
> capability defined in VFIO which allows multiple mmap areas in one
> VFIO region.
> 
> In this patch, the sparse mmap regions are mapped to one continuous
> virtual address region that follows device-specific BAR layout. So,
> driver can still access all mapped sparse mmap regions by using
> 'bar_base_address + bar_offset'.
> 
> Signed-off-by: Miao Li <miao.li@intel.com>
> Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
> ---
>  drivers/bus/pci/linux/pci_vfio.c | 104 +++++++++++++++++++++++++++----
>  drivers/bus/pci/private.h        |   2 +
>  2 files changed, 94 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c
> b/drivers/bus/pci/linux/pci_vfio.c
> index 24b0795fbd..c411909976 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct
> mapped_pci_resource *vfio_res,
>  	return 0;
>  }
> 
> +static int
> +pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource
> *vfio_res,
> +		int bar_index, int additional_flags)
> +{
> +	struct pci_map *bar = &vfio_res->maps[bar_index];
> +	struct vfio_region_sparse_mmap_area *sparse;
> +	void *bar_addr;
> +	uint32_t i;
> +
> +	if (bar->size == 0) {
> +		RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
> +		return 0;
> +	}
> +
> +	/* reserve the address using an inaccessible mapping */
> +	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
> +			MAP_ANONYMOUS | additional_flags, -1, 0);
> +	if (bar_addr != MAP_FAILED) {
> +		void *map_addr = NULL;
> +		for (i = 0; i < bar->nr_areas; i++) {
> +			sparse = &bar->areas[i];
> +			if (sparse->size) {
> +				void *addr = RTE_PTR_ADD(bar_addr,
> (uintptr_t)sparse->offset);
> +				map_addr = pci_map_resource(addr, vfio_dev_fd,
> +					bar->offset + sparse->offset, sparse->size,
> +					RTE_MAP_FORCE_ADDRESS);
> +				if (map_addr == NULL) {
> +					munmap(bar_addr, bar->size);
> +					RTE_LOG(ERR, EAL, "Failed to map pci
> BAR%d\n",
> +						bar_index);
> +					goto err_map;
> +				}
> +			}
> +		}
> +	} else {
> +		RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for
> BAR%d\n",
> +			bar_index);
> +		goto err_map;
> +	}
> +
> +	bar->addr = bar_addr;
> +	return 0;
> +
> +err_map:
> +	bar->nr_areas = 0;
> +	return -1;
> +}
> +
>  /*
>   * region info may contain capability headers, so we need to keep
> reallocating
>   * the memory until we match allocated memory size with argsz.
> @@ -875,6 +923,8 @@ pci_vfio_map_resource_primary(struct rte_pci_device
> *dev)
> 
>  	for (i = 0; i < vfio_res->nb_maps; i++) {
>  		void *bar_addr;
> +		struct vfio_info_cap_header *hdr;
> +		struct vfio_region_info_cap_sparse_mmap *sparse;
> 
>  		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
>  		if (ret < 0) {
> @@ -920,12 +970,33 @@ pci_vfio_map_resource_primary(struct rte_pci_device
> *dev)
>  		maps[i].size = reg->size;
>  		maps[i].path = NULL; /* vfio doesn't have per-resource paths
> */
> 
> -		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
> -		if (ret < 0) {
> -			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
> -					pci_addr, i, strerror(errno));
> -			free(reg);
> -			goto err_vfio_res;
> +		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
> +
> +		if (hdr != NULL) {
> +			sparse = container_of(hdr,
> +				struct vfio_region_info_cap_sparse_mmap, header);
> +			if (sparse->nr_areas > 0) {
> +				maps[i].nr_areas = sparse->nr_areas;
> +				maps[i].areas = sparse->areas;

I just noticed that this is wrong, as the memory that the pointer 'sparse'
points to will be freed at the end. maps[i].areas needs to be allocated with
rte_zmalloc and freed correctly. Otherwise it could lead to a secondary
process segfault when it tries to access maps[i].areas.

Will fix this in v4.
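
The fix will roughly be the following (a minimal sketch, with error handling
trimmed):

    if (sparse->nr_areas > 0) {
        maps[i].nr_areas = sparse->nr_areas;
        maps[i].areas = rte_zmalloc(NULL,
            sizeof(*maps[i].areas) * maps[i].nr_areas, 0);
        if (maps[i].areas == NULL)
            goto err_map;
        /* copy the areas out before the region info is freed */
        memcpy(maps[i].areas, sparse->areas,
            sizeof(*maps[i].areas) * maps[i].nr_areas);
    }

plus a matching rte_free(maps[i].areas) when the mapped resource is released
or on the error path.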

Thanks,
Chenbo

> +			}
> +		}
> +
> +		if (maps[i].nr_areas > 0) {
> +			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
> +			if (ret < 0) {
> +				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
> +						pci_addr, i, strerror(errno));
> +				free(reg);
> +				goto err_vfio_res;
> +			}
> +		} else {
> +			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
> +			if (ret < 0) {
> +				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
> +						pci_addr, i, strerror(errno));
> +				free(reg);
> +				goto err_vfio_res;
> +			}
>  		}
> 
>  		dev->mem_resource[i].addr = maps[i].addr;
> @@ -1008,11 +1079,20 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
>  	maps = vfio_res->maps;
> 
>  	for (i = 0; i < vfio_res->nb_maps; i++) {
> -		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
> -		if (ret < 0) {
> -			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
> -					pci_addr, i, strerror(errno));
> -			goto err_vfio_dev_fd;
> +		if (maps[i].nr_areas > 0) {
> +			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
> +			if (ret < 0) {
> +				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
> +						pci_addr, i, strerror(errno));
> +				goto err_vfio_dev_fd;
> +			}
> +		} else {
> +			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
> +			if (ret < 0) {
> +				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
> +						pci_addr, i, strerror(errno));
> +				goto err_vfio_dev_fd;
> +			}
>  		}
> 
>  		dev->mem_resource[i].addr = maps[i].addr;
> @@ -1062,7 +1142,7 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
>  		break;
>  	}
> 
> -	if  (vfio_res == NULL)
> +	if (vfio_res == NULL)
>  		return vfio_res;
> 
>  	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
> diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
> index 2d6991ccb7..8b0ce73533 100644
> --- a/drivers/bus/pci/private.h
> +++ b/drivers/bus/pci/private.h
> @@ -121,6 +121,8 @@ struct pci_map {
>  	uint64_t offset;
>  	uint64_t size;
>  	uint64_t phaddr;
> +	uint32_t nr_areas;
> +	struct vfio_region_sparse_mmap_area *areas;
>  };
> 
>  struct pci_msix_table {
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                           ` (3 preceding siblings ...)
  2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
@ 2023-05-31  5:37         ` Miao Li
  2023-05-31  5:37           ` [PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
                             ` (5 more replies)
  4 siblings, 6 replies; 50+ messages in thread
From: Miao Li @ 2023-05-31  5:37 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

This series introduces a VFIO standard capability, called sparse
mmap to PCI bus. In linux kernel, it's defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
mmap whole BAR region into DPDK process, only mmap part of the
BAR region after getting sparse mmap information from kernel.
For the rest of BAR region that is not mmap-ed, DPDK process
can use pread/pwrite system calls to access. Sparse mmap is
useful when kernel does not want userspace to mmap whole BAR
region, or kernel wants to control over access to specific BAR
region. Vendors can choose to enable this feature or not for
their devices in their specific kernel modules.

In this patchset:

Patch 1-3 is mainly for introducing BAR access APIs so that
driver could use them to access specific BAR using pread/pwrite
system calls when part of the BAR is not mmap-able. Patch 4
adds the VFIO sparse mmap support finally.

v4:
1. add sparse mmap information allocation and release
2. add release note for BAR access APIs

v3:
fix variable 'pdev' and 'info' uninitialized error

v2:
1. add PCI device internal structure in bus/pci/windows/pci.c
2. fix parameter type error

Chenbo Xia (3):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write

Miao Li (1):
  bus/pci: add VFIO sparse mmap support

 doc/guides/rel_notes/release_23_07.rst |   5 +
 drivers/bus/pci/bsd/pci.c              |  35 ++-
 drivers/bus/pci/linux/pci.c            |  78 +++++-
 drivers/bus/pci/linux/pci_init.h       |  14 +-
 drivers/bus/pci/linux/pci_uio.c        |  22 ++
 drivers/bus/pci/linux/pci_vfio.c       | 371 ++++++++++++++++++++-----
 drivers/bus/pci/pci_common.c           |  12 +-
 drivers/bus/pci/private.h              |  25 +-
 drivers/bus/pci/rte_bus_pci.h          |  48 ++++
 drivers/bus/pci/version.map            |   3 +
 drivers/bus/pci/windows/pci.c          |  14 +-
 lib/eal/include/rte_vfio.h             |   1 -
 12 files changed, 525 insertions(+), 103 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
@ 2023-05-31  5:37           ` Miao Li
  2023-05-31  5:37           ` [PATCH v4 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
                             ` (4 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-31  5:37 UTC (permalink / raw)
  To: dev; +Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

From: Chenbo Xia <chenbo.xia@intel.com>

This patch introduces an internal representation of the PCI device,
which will be used to store internal information that doesn't have
to be exposed to drivers, e.g., the VFIO region sizes/offsets.

In this patch, the internal structure is simply a wrapper of the
rte_pci_device structure. More fields will be added.
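
For illustration, bus-internal code goes from the public structure back to
the wrapper with the container_of()-based helper added in this patch
(sketch only):

    /* 'dev' is the struct rte_pci_device * that drivers see */
    struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);

    /* fields added later (e.g. VFIO region info) are reached via 'pdev',
     * while drivers keep seeing only struct rte_pci_device.
     */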

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Yahui Cao <yahui.cao@intel.com>
---
 drivers/bus/pci/bsd/pci.c     | 13 ++++++++-----
 drivers/bus/pci/linux/pci.c   | 28 ++++++++++++++++------------
 drivers/bus/pci/pci_common.c  | 12 ++++++------
 drivers/bus/pci/private.h     | 14 +++++++++++++-
 drivers/bus/pci/windows/pci.c | 14 +++++++++-----
 5 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 7459d15c7e..a747eca58c 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -208,16 +208,19 @@ pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 static int
 pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 {
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	struct pci_bar_io bar;
 	unsigned i, max;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL) {
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
 	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 
 	dev->addr.domain = conf->pc_sel.pc_domain;
@@ -303,7 +306,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 				memmove(dev2->mem_resource,
 					dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -313,7 +316,7 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
 	return 0;
 
 skipdev:
-	pci_free(dev);
+	pci_free(pdev);
 	return 0;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index ebd1395502..4c2c5ba382 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -211,22 +211,26 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 {
 	char filename[PATH_MAX];
 	unsigned long tmp;
+	struct rte_pci_device_internal *pdev;
 	struct rte_pci_device *dev;
 	char driver[PATH_MAX];
 	int ret;
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		return -1;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = *addr;
 
 	/* get vendor id */
 	snprintf(filename, sizeof(filename), "%s/vendor", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.vendor_id = (uint16_t)tmp;
@@ -234,7 +238,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	/* get device id */
 	snprintf(filename, sizeof(filename), "%s/device", dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.device_id = (uint16_t)tmp;
@@ -243,7 +247,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_vendor",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_vendor_id = (uint16_t)tmp;
@@ -252,7 +256,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/subsystem_device",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	dev->id.subsystem_device_id = (uint16_t)tmp;
@@ -261,7 +265,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/class",
 		 dirname);
 	if (eal_parse_sysfs_value(filename, &tmp) < 0) {
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 	/* the least 24 bits are valid: class, subclass, program interface */
@@ -297,7 +301,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	snprintf(filename, sizeof(filename), "%s/resource", dirname);
 	if (pci_parse_sysfs_resource(filename, dev) < 0) {
 		RTE_LOG(ERR, EAL, "%s(): cannot parse resource\n", __func__);
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -306,7 +310,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 	ret = pci_get_kernel_driver_by_path(filename, driver, sizeof(driver));
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "Fail to get kernel driver\n");
-		pci_free(dev);
+		pci_free(pdev);
 		return -1;
 	}
 
@@ -320,7 +324,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 		else
 			dev->kdrv = RTE_PCI_KDRV_UNKNOWN;
 	} else {
-		pci_free(dev);
+		pci_free(pdev);
 		return 0;
 	}
 	/* device is valid, add in list (sorted) */
@@ -375,7 +379,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
 						pci_common_set(dev2);
 					}
 				}
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index e32a9d517a..52404ab0fe 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -121,12 +121,12 @@ pci_common_set(struct rte_pci_device *dev)
 }
 
 void
-pci_free(struct rte_pci_device *dev)
+pci_free(struct rte_pci_device_internal *pdev)
 {
-	if (dev == NULL)
+	if (pdev == NULL)
 		return;
-	free(dev->bus_info);
-	free(dev);
+	free(pdev->device.bus_info);
+	free(pdev);
 }
 
 /* map a particular resource from a file */
@@ -465,7 +465,7 @@ pci_cleanup(void)
 		rte_intr_instance_free(dev->vfio_req_intr_handle);
 		dev->vfio_req_intr_handle = NULL;
 
-		pci_free(dev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
 	}
 
 	return error;
@@ -681,7 +681,7 @@ pci_unplug(struct rte_device *dev)
 	if (ret == 0) {
 		rte_pci_remove_device(pdev);
 		rte_devargs_remove(dev->devargs);
-		pci_free(pdev);
+		pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 	}
 	return ret;
 }
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index c8161a1074..b564646e03 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,14 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+/*
+ * Convert struct rte_pci_device to struct rte_pci_device_internal
+ */
+#define RTE_PCI_DEVICE_INTERNAL(ptr) \
+	container_of(ptr, struct rte_pci_device_internal, device)
+#define RTE_PCI_DEVICE_INTERNAL_CONST(ptr) \
+	container_of(ptr, const struct rte_pci_device_internal, device)
+
 /**
  * Structure describing the PCI bus
  */
@@ -34,6 +42,10 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_device_internal {
+	struct rte_pci_device device;
+};
+
 /**
  * Scan the content of the PCI bus, and the devices in the devices
  * list
@@ -53,7 +65,7 @@ pci_common_set(struct rte_pci_device *dev);
  * Free a PCI device.
  */
 void
-pci_free(struct rte_pci_device *dev);
+pci_free(struct rte_pci_device_internal *pdev);
 
 /**
  * Validate whether a device with given PCI address should be ignored or not.
diff --git a/drivers/bus/pci/windows/pci.c b/drivers/bus/pci/windows/pci.c
index 5cf05ce1a0..df5221d913 100644
--- a/drivers/bus/pci/windows/pci.c
+++ b/drivers/bus/pci/windows/pci.c
@@ -336,6 +336,7 @@ set_kernel_driver_type(PSP_DEVINFO_DATA device_info_data,
 static int
 pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 {
+	struct rte_pci_device_internal *pdev = NULL;
 	struct rte_pci_device *dev = NULL;
 	int ret = -1;
 	char  pci_device_info[REGSTR_VAL_MAX_HCID_LEN];
@@ -370,11 +371,14 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 		goto end;
 	}
 
-	dev = malloc(sizeof(*dev));
-	if (dev == NULL)
+	pdev = malloc(sizeof(*pdev));
+	if (pdev == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot allocate memory for internal pci device\n");
 		goto end;
+	}
 
-	memset(dev, 0, sizeof(*dev));
+	memset(pdev, 0, sizeof(*pdev));
+	dev = &pdev->device;
 
 	dev->device.bus = &rte_pci_bus.bus;
 	dev->addr = addr;
@@ -409,7 +413,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 				dev2->max_vfs = dev->max_vfs;
 				memmove(dev2->mem_resource, dev->mem_resource,
 					sizeof(dev->mem_resource));
-				pci_free(dev);
+				pci_free(pdev);
 			}
 			return 0;
 		}
@@ -418,7 +422,7 @@ pci_scan_one(HDEVINFO dev_info, PSP_DEVINFO_DATA device_info_data)
 
 	return 0;
 end:
-	pci_free(dev);
+	pci_free(pdev);
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4 2/4] bus/pci: avoid depending on private value in kernel source
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-31  5:37           ` [PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
@ 2023-05-31  5:37           ` Miao Li
  2023-05-31  5:37           ` [PATCH v4 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
                             ` (3 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-31  5:37 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The value 40 used in VFIO_GET_REGION_ADDR() is a private value
(VFIO_PCI_OFFSET_SHIFT) defined in the Linux kernel source [1]. It
is not part of the VFIO API, and we should not depend on it.

[1] https://github.com/torvalds/linux/blob/v6.2/include/linux/vfio_pci_core.h
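
For reference, the region offset now comes from the kernel instead of a
hard-coded shift; a simplified sketch (the patch goes through
pci_vfio_get_region_info(), which also handles capability chains):

    struct vfio_region_info reg = {
        .argsz = sizeof(reg),
        .index = VFIO_PCI_CONFIG_REGION_INDEX,
    };

    if (ioctl(vfio_dev_fd, VFIO_DEVICE_GET_REGION_INFO, &reg) == 0)
        /* use the kernel-reported offset for config space access */
        pread64(vfio_dev_fd, buf, len, reg.offset + offs);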

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Yahui Cao <yahui.cao@intel.com>
---
 drivers/bus/pci/linux/pci.c      |   4 +-
 drivers/bus/pci/linux/pci_init.h |   4 +-
 drivers/bus/pci/linux/pci_vfio.c | 197 +++++++++++++++++++++++--------
 drivers/bus/pci/private.h        |   9 ++
 lib/eal/include/rte_vfio.h       |   1 -
 5 files changed, 159 insertions(+), 56 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 4c2c5ba382..04e21ae20f 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -645,7 +645,7 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 		return pci_uio_read_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_read_config(intr_handle, buf, len, offset);
+		return pci_vfio_read_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
@@ -669,7 +669,7 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 		return pci_uio_write_config(intr_handle, buf, len, offset);
 #ifdef VFIO_PRESENT
 	case RTE_PCI_KDRV_VFIO:
-		return pci_vfio_write_config(intr_handle, buf, len, offset);
+		return pci_vfio_write_config(device, buf, len, offset);
 #endif
 	default:
 		rte_pci_device_name(&device->addr, devname,
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index dcea726186..9f6659ba6e 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -66,9 +66,9 @@ int pci_uio_ioport_unmap(struct rte_pci_ioport *p);
 #endif
 
 /* access config space */
-int pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_read_config(const struct rte_pci_device *dev,
 			 void *buf, size_t len, off_t offs);
-int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f..5aef84b7d0 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -43,45 +43,82 @@ static struct rte_tailq_elem rte_vfio_tailq = {
 };
 EAL_REGISTER_TAILQ(rte_vfio_tailq)
 
+static int
+pci_vfio_get_region(const struct rte_pci_device *dev, int index,
+		    uint64_t *size, uint64_t *offset)
+{
+	const struct rte_pci_device_internal *pdev =
+		RTE_PCI_DEVICE_INTERNAL_CONST(dev);
+
+	if (index >= VFIO_PCI_NUM_REGIONS || index >= RTE_MAX_PCI_REGIONS)
+		return -1;
+
+	if (pdev->region[index].size == 0 && pdev->region[index].offset == 0)
+		return -1;
+
+	*size   = pdev->region[index].size;
+	*offset = pdev->region[index].offset;
+
+	return 0;
+}
+
 int
-pci_vfio_read_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_read_config(const struct rte_pci_device *dev,
 		    void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
 		return -1;
 
-	return pread64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	return pread64(fd, buf, len, offset + offs);
 }
 
 int
-pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
+pci_vfio_write_config(const struct rte_pci_device *dev,
 		    const void *buf, size_t len, off_t offs)
 {
-	int vfio_dev_fd = rte_intr_dev_fd_get(intr_handle);
+	uint64_t size, offset;
+	int fd;
 
-	if (vfio_dev_fd < 0)
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+				&size, &offset) != 0)
 		return -1;
 
-	return pwrite64(vfio_dev_fd, buf, len,
-	       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + offs);
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
 }
 
 /* get PCI BAR number where MSI-X interrupts are */
 static int
-pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
+pci_vfio_get_msix_bar(const struct rte_pci_device *dev, int fd,
+	struct pci_msix_table *msix_table)
 {
 	int ret;
 	uint32_t reg;
 	uint16_t flags;
 	uint8_t cap_id, cap_offset;
+	uint64_t size, offset;
+
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
 
 	/* read PCI capability pointer from config space */
-	ret = pread64(fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_CAPABILITY_LIST);
+	ret = pread64(fd, &reg, sizeof(reg), offset + PCI_CAPABILITY_LIST);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL,
 			"Cannot read capability pointer from PCI config space!\n");
@@ -94,9 +131,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 	while (cap_offset) {
 
 		/* read PCI capability ID */
-		ret = pread64(fd, &reg, sizeof(reg),
-				VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-				cap_offset);
+		ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 		if (ret != sizeof(reg)) {
 			RTE_LOG(ERR, EAL,
 				"Cannot read capability ID from PCI config space!\n");
@@ -108,9 +143,7 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 		/* if we haven't reached MSI-X, check next capability */
 		if (cap_id != PCI_CAP_ID_MSIX) {
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read capability pointer from PCI config space!\n");
@@ -125,18 +158,14 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 		/* else, read table offset */
 		else {
 			/* table offset resides in the next 4 bytes */
-			ret = pread64(fd, &reg, sizeof(reg),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 4);
+			ret = pread64(fd, &reg, sizeof(reg), offset + cap_offset + 4);
 			if (ret != sizeof(reg)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table offset from PCI config space!\n");
 				return -1;
 			}
 
-			ret = pread64(fd, &flags, sizeof(flags),
-					VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-					cap_offset + 2);
+			ret = pread64(fd, &flags, sizeof(flags), offset + cap_offset + 2);
 			if (ret != sizeof(flags)) {
 				RTE_LOG(ERR, EAL,
 					"Cannot read table flags from PCI config space!\n");
@@ -156,14 +185,19 @@ pci_vfio_get_msix_bar(int fd, struct pci_msix_table *msix_table)
 
 /* enable PCI bus memory space */
 static int
-pci_vfio_enable_bus_memory(int dev_fd)
+pci_vfio_enable_bus_memory(struct rte_pci_device *dev, int dev_fd)
 {
+	uint64_t size, offset;
 	uint16_t cmd;
 	int ret;
 
-	ret = pread64(dev_fd, &cmd, sizeof(cmd),
-		      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		      PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
@@ -174,9 +208,7 @@ pci_vfio_enable_bus_memory(int dev_fd)
 		return 0;
 
 	cmd |= PCI_COMMAND_MEMORY;
-	ret = pwrite64(dev_fd, &cmd, sizeof(cmd),
-		       VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-		       PCI_COMMAND);
+	ret = pwrite64(dev_fd, &cmd, sizeof(cmd), offset + PCI_COMMAND);
 
 	if (ret != sizeof(cmd)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -188,14 +220,19 @@ pci_vfio_enable_bus_memory(int dev_fd)
 
 /* set PCI bus mastering */
 static int
-pci_vfio_set_bus_master(int dev_fd, bool op)
+pci_vfio_set_bus_master(const struct rte_pci_device *dev, int dev_fd, bool op)
 {
+	uint64_t size, offset;
 	uint16_t reg;
 	int ret;
 
-	ret = pread64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
+	ret = pread64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot read command from PCI config space!\n");
 		return -1;
@@ -207,9 +244,7 @@ pci_vfio_set_bus_master(int dev_fd, bool op)
 	else
 		reg &= ~(PCI_COMMAND_MASTER);
 
-	ret = pwrite64(dev_fd, &reg, sizeof(reg),
-			VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) +
-			PCI_COMMAND);
+	ret = pwrite64(dev_fd, &reg, sizeof(reg), offset + PCI_COMMAND);
 
 	if (ret != sizeof(reg)) {
 		RTE_LOG(ERR, EAL, "Cannot write command to PCI config space!\n");
@@ -458,14 +493,21 @@ pci_vfio_disable_notifier(struct rte_pci_device *dev)
 #endif
 
 static int
-pci_vfio_is_ioport_bar(int vfio_dev_fd, int bar_index)
+pci_vfio_is_ioport_bar(const struct rte_pci_device *dev, int vfio_dev_fd,
+	int bar_index)
 {
+	uint64_t size, offset;
 	uint32_t ioport_bar;
 	int ret;
 
+	if (pci_vfio_get_region(dev, VFIO_PCI_CONFIG_REGION_INDEX,
+		&size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of CONFIG region.\n");
+		return -1;
+	}
+
 	ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
-			  VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
-			  + PCI_BASE_ADDRESS_0 + bar_index*4);
+			  offset + PCI_BASE_ADDRESS_0 + bar_index * 4);
 	if (ret != sizeof(ioport_bar)) {
 		RTE_LOG(ERR, EAL, "Cannot read command (%x) from config space!\n",
 			PCI_BASE_ADDRESS_0 + bar_index*4);
@@ -483,13 +525,13 @@ pci_rte_vfio_setup_device(struct rte_pci_device *dev, int vfio_dev_fd)
 		return -1;
 	}
 
-	if (pci_vfio_enable_bus_memory(vfio_dev_fd)) {
+	if (pci_vfio_enable_bus_memory(dev, vfio_dev_fd)) {
 		RTE_LOG(ERR, EAL, "Cannot enable bus memory!\n");
 		return -1;
 	}
 
 	/* set bus mastering for the device */
-	if (pci_vfio_set_bus_master(vfio_dev_fd, true)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, true)) {
 		RTE_LOG(ERR, EAL, "Cannot set up bus mastering!\n");
 		return -1;
 	}
@@ -704,7 +746,7 @@ pci_vfio_info_cap(struct vfio_region_info *info, int cap)
 static int
 pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 {
-	struct vfio_region_info *info;
+	struct vfio_region_info *info = NULL;
 	int ret;
 
 	ret = pci_vfio_get_region_info(vfio_dev_fd, &info, msix_region);
@@ -719,11 +761,40 @@ pci_vfio_msix_is_mappable(int vfio_dev_fd, int msix_region)
 	return ret;
 }
 
+static int
+pci_vfio_fill_regions(struct rte_pci_device *dev, int vfio_dev_fd,
+		      struct vfio_device_info *device_info)
+{
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
+	struct vfio_region_info *reg = NULL;
+	int nb_maps, i, ret;
+
+	nb_maps = RTE_MIN((int)device_info->num_regions,
+			VFIO_PCI_CONFIG_REGION_INDEX + 1);
+
+	for (i = 0; i < nb_maps; i++) {
+		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, EAL, "%s cannot get device region info error %i (%s)\n",
+				dev->name, errno, strerror(errno));
+			return -1;
+		}
+
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
+		free(reg);
+	}
+
+	return 0;
+}
 
 static int
 pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 {
+	struct rte_pci_device_internal *pdev = RTE_PCI_DEVICE_INTERNAL(dev);
 	struct vfio_device_info device_info = { .argsz = sizeof(device_info) };
+	struct vfio_region_info *reg = NULL;
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
@@ -767,11 +838,22 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	/* map BARs */
 	maps = vfio_res->maps;
 
+	ret = pci_vfio_get_region_info(vfio_dev_fd, &reg,
+		VFIO_PCI_CONFIG_REGION_INDEX);
+	if (ret < 0) {
+		RTE_LOG(ERR, EAL, "%s cannot get device region info error %i (%s)\n",
+			dev->name, errno, strerror(errno));
+		goto err_vfio_res;
+	}
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].size = reg->size;
+	pdev->region[VFIO_PCI_CONFIG_REGION_INDEX].offset = reg->offset;
+	free(reg);
+
 	vfio_res->msix_table.bar_index = -1;
 	/* get MSI-X BAR, if any (we have to know where it is because we can't
 	 * easily mmap it when using VFIO)
 	 */
-	ret = pci_vfio_get_msix_bar(vfio_dev_fd, &vfio_res->msix_table);
+	ret = pci_vfio_get_msix_bar(dev, vfio_dev_fd, &vfio_res->msix_table);
 	if (ret < 0) {
 		RTE_LOG(ERR, EAL, "%s cannot get MSI-X BAR number!\n",
 				pci_addr);
@@ -792,7 +874,6 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	}
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		struct vfio_region_info *reg = NULL;
 		void *bar_addr;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
@@ -803,8 +884,11 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 			goto err_vfio_res;
 		}
 
+		pdev->region[i].size = reg->size;
+		pdev->region[i].offset = reg->offset;
+
 		/* chk for io port region */
-		ret = pci_vfio_is_ioport_bar(vfio_dev_fd, i);
+		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
 			goto err_vfio_res;
@@ -916,6 +1000,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_vfio_fill_regions(dev, vfio_dev_fd, &device_info);
+	if (ret)
+		return ret;
+
 	/* map BARs */
 	maps = vfio_res->maps;
 
@@ -1031,7 +1119,7 @@ pci_vfio_unmap_resource_primary(struct rte_pci_device *dev)
 	if (vfio_dev_fd < 0)
 		return -1;
 
-	if (pci_vfio_set_bus_master(vfio_dev_fd, false)) {
+	if (pci_vfio_set_bus_master(dev, vfio_dev_fd, false)) {
 		RTE_LOG(ERR, EAL, "%s cannot unset bus mastering for PCI device!\n",
 				pci_addr);
 		return -1;
@@ -1111,14 +1199,21 @@ int
 pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		    struct rte_pci_ioport *p)
 {
+	uint64_t size, offset;
+
 	if (bar < VFIO_PCI_BAR0_REGION_INDEX ||
 	    bar > VFIO_PCI_BAR5_REGION_INDEX) {
 		RTE_LOG(ERR, EAL, "invalid bar (%d)!\n", bar);
 		return -1;
 	}
 
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0) {
+		RTE_LOG(ERR, EAL, "Cannot get offset of region %d.\n", bar);
+		return -1;
+	}
+
 	p->dev = dev;
-	p->base = VFIO_GET_REGION_ADDR(bar);
+	p->base = offset;
 	return 0;
 }
 
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index b564646e03..2d6991ccb7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -13,6 +13,8 @@
 #include <rte_os_shim.h>
 #include <rte_pci.h>
 
+#define RTE_MAX_PCI_REGIONS    9
+
 /*
  * Convert struct rte_pci_device to struct rte_pci_device_internal
  */
@@ -42,8 +44,15 @@ extern struct rte_pci_bus rte_pci_bus;
 struct rte_pci_driver;
 struct rte_pci_device;
 
+struct rte_pci_region {
+	uint64_t size;
+	uint64_t offset;
+};
+
 struct rte_pci_device_internal {
 	struct rte_pci_device device;
+	/* PCI regions provided by e.g. VFIO. */
+	struct rte_pci_region region[RTE_MAX_PCI_REGIONS];
 };
 
 /**
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 7bdb8932b2..3487c4f2a2 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -38,7 +38,6 @@ extern "C" {
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
 #define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
-#define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 #define VFIO_GET_REGION_IDX(x) (x >> 40)
 #define VFIO_NOIOMMU_MODE      \
 	"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4 3/4] bus/pci: introduce helper for MMIO read and write
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
  2023-05-31  5:37           ` [PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
  2023-05-31  5:37           ` [PATCH v4 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
@ 2023-05-31  5:37           ` Miao Li
  2023-05-31  5:37           ` [PATCH v4 4/4] bus/pci: add VFIO sparse mmap support Miao Li
                             ` (2 subsequent siblings)
  5 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-31  5:37 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

From: Chenbo Xia <chenbo.xia@intel.com>

The MMIO regions may not be mmap-able for VFIO-PCI devices.
In this case, the driver should explicitly read and write
to access these regions.
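
For example, a driver touching a register in a BAR that is not mmap-ed
could do the following (sketch; 'bar', 'reg_off' and the bit value are
placeholders, not real definitions):

    uint32_t val;

    if (rte_pci_mmio_read(dev, bar, &val, sizeof(val), reg_off) < 0)
        return -1;
    val |= 0x1;    /* illustrative bit */
    if (rte_pci_mmio_write(dev, bar, &val, sizeof(val), reg_off) < 0)
        return -1;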

Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Yahui Cao <yahui.cao@intel.com>
---
 doc/guides/rel_notes/release_23_07.rst |  5 +++
 drivers/bus/pci/bsd/pci.c              | 22 ++++++++++++
 drivers/bus/pci/linux/pci.c            | 46 ++++++++++++++++++++++++
 drivers/bus/pci/linux/pci_init.h       | 10 ++++++
 drivers/bus/pci/linux/pci_uio.c        | 22 ++++++++++++
 drivers/bus/pci/linux/pci_vfio.c       | 36 +++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h          | 48 ++++++++++++++++++++++++++
 drivers/bus/pci/version.map            |  3 ++
 8 files changed, 192 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..dba39134f1 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added MMIO read and write APIs to PCI bus.**
+
+  Introduced ``rte_pci_mmio_read()`` and ``rte_pci_mmio_write()`` APIs to PCI
+  bus so that PCI drivers can access PCI memory resources when they are not
+  mapped to process address space.
 
 Removed Items
 -------------
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index a747eca58c..27f12590d4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -489,6 +489,28 @@ int rte_pci_write_config(const struct rte_pci_device *dev,
 	return -1;
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *dev, int bar,
+		      void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *dev, int bar,
+		       const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04e21ae20f..3d237398d9 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -680,6 +680,52 @@ int rte_pci_write_config(const struct rte_pci_device *device,
 	}
 }
 
+/* Read PCI MMIO space. */
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_read(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_read(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
+/* Write PCI MMIO space. */
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset)
+{
+	char devname[RTE_DEV_NAME_MAX_LEN] = "";
+
+	switch (device->kdrv) {
+	case RTE_PCI_KDRV_IGB_UIO:
+	case RTE_PCI_KDRV_UIO_GENERIC:
+		return pci_uio_mmio_write(device, bar, buf, len, offset);
+#ifdef VFIO_PRESENT
+	case RTE_PCI_KDRV_VFIO:
+		return pci_vfio_mmio_write(device, bar, buf, len, offset);
+#endif
+	default:
+		rte_pci_device_name(&device->addr, devname,
+				    RTE_DEV_NAME_MAX_LEN);
+		RTE_LOG(ERR, EAL,
+			"Unknown driver type for %s\n", devname);
+		return -1;
+	}
+}
+
 int
 rte_pci_ioport_map(struct rte_pci_device *dev, int bar,
 		struct rte_pci_ioport *p)
diff --git a/drivers/bus/pci/linux/pci_init.h b/drivers/bus/pci/linux/pci_init.h
index 9f6659ba6e..d842809ccd 100644
--- a/drivers/bus/pci/linux/pci_init.h
+++ b/drivers/bus/pci/linux/pci_init.h
@@ -37,6 +37,11 @@ int pci_uio_read_config(const struct rte_intr_handle *intr_handle,
 int pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 			 const void *buf, size_t len, off_t offs);
 
+int pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_uio_ioport_map(struct rte_pci_device *dev, int bar,
 		       struct rte_pci_ioport *p);
 void pci_uio_ioport_read(struct rte_pci_ioport *p,
@@ -71,6 +76,11 @@ int pci_vfio_read_config(const struct rte_pci_device *dev,
 int pci_vfio_write_config(const struct rte_pci_device *dev,
 			  const void *buf, size_t len, off_t offs);
 
+int pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+			void *buf, size_t len, off_t offset);
+int pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+			const void *buf, size_t len, off_t offset);
+
 int pci_vfio_ioport_map(struct rte_pci_device *dev, int bar,
 		        struct rte_pci_ioport *p);
 void pci_vfio_ioport_read(struct rte_pci_ioport *p,
diff --git a/drivers/bus/pci/linux/pci_uio.c b/drivers/bus/pci/linux/pci_uio.c
index d52125e49b..2bf16e9369 100644
--- a/drivers/bus/pci/linux/pci_uio.c
+++ b/drivers/bus/pci/linux/pci_uio.c
@@ -55,6 +55,28 @@ pci_uio_write_config(const struct rte_intr_handle *intr_handle,
 	return pwrite(uio_cfg_fd, buf, len, offset);
 }
 
+int
+pci_uio_mmio_read(const struct rte_pci_device *dev, int bar,
+		  void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy(buf, (uint8_t *)dev->mem_resource[bar].addr + offset, len);
+	return len;
+}
+
+int
+pci_uio_mmio_write(const struct rte_pci_device *dev, int bar,
+		   const void *buf, size_t len, off_t offset)
+{
+	if (bar >= PCI_MAX_RESOURCE || dev->mem_resource[bar].addr == NULL ||
+			(uint64_t)offset + len > dev->mem_resource[bar].len)
+		return -1;
+	memcpy((uint8_t *)dev->mem_resource[bar].addr + offset, buf, len);
+	return len;
+}
+
 static int
 pci_uio_set_bus_master(int dev_fd)
 {
diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 5aef84b7d0..24b0795fbd 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -1258,6 +1258,42 @@ pci_vfio_ioport_unmap(struct rte_pci_ioport *p)
 	return -1;
 }
 
+int
+pci_vfio_mmio_read(const struct rte_pci_device *dev, int bar,
+		   void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pread64(fd, buf, len, offset + offs);
+}
+
+int
+pci_vfio_mmio_write(const struct rte_pci_device *dev, int bar,
+		    const void *buf, size_t len, off_t offs)
+{
+	uint64_t size, offset;
+	int fd;
+
+	fd = rte_intr_dev_fd_get(dev->intr_handle);
+
+	if (pci_vfio_get_region(dev, bar, &size, &offset) != 0)
+		return -1;
+
+	if ((uint64_t)len + offs > size)
+		return -1;
+
+	return pwrite64(fd, buf, len, offset + offs);
+}
+
 int
 pci_vfio_is_enabled(void)
 {
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index b193114fe5..82da087f24 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -135,6 +135,54 @@ int rte_pci_read_config(const struct rte_pci_device *device,
 int rte_pci_write_config(const struct rte_pci_device *device,
 		const void *buf, size_t len, off_t offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Read from a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes read on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_read(const struct rte_pci_device *device, int bar,
+		void *buf, size_t len, off_t offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Write to a MMIO pci resource.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param bar
+ *   Index of the io pci resource we want to access.
+ * @param buf
+ *   A data buffer containing the bytes should be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into MMIO space described by @bar
+ * @return
+ *  Number of bytes written on success, negative on error.
+ */
+__rte_experimental
+int rte_pci_mmio_write(const struct rte_pci_device *device, int bar,
+		const void *buf, size_t len, off_t offset);
+
 /**
  * Initialize a rte_pci_ioport object for a pci device io resource.
  *
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 161ab86d3b..00fde139ca 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -21,6 +21,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_pci_set_bus_master;
+	# added in 23.07
+	rte_pci_mmio_read;
+	rte_pci_mmio_write;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4 4/4] bus/pci: add VFIO sparse mmap support
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                             ` (2 preceding siblings ...)
  2023-05-31  5:37           ` [PATCH v4 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
@ 2023-05-31  5:37           ` Miao Li
  2023-06-07 16:30           ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Thomas Monjalon
  2023-06-08  6:43           ` Ali Alnubani
  5 siblings, 0 replies; 50+ messages in thread
From: Miao Li @ 2023-05-31  5:37 UTC (permalink / raw)
  To: dev
  Cc: skori, thomas, david.marchand, ferruh.yigit, chenbo.xia,
	yahui.cao, Anatoly Burakov

This patch adds sparse mmap support in PCI bus. Sparse mmap is a
capability defined in VFIO which allows multiple mmap areas in one
VFIO region.

In this patch, the sparse mmap regions are mapped to one continuous
virtual address region that follows the device-specific BAR layout. So,
the driver can still access all mapped sparse mmap regions by using
'bar_base_address + bar_offset'.
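
As an example of the resulting layout (numbers are only illustrative): for
a 0x4000-byte BAR whose sparse areas are [0x0, 0x1000) and [0x3000, 0x4000),
the whole range is first reserved with an anonymous mapping and each area
is then mapped on top of it at its BAR offset, roughly:

    bar_addr = mmap(NULL, 0x4000, 0, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    pci_map_resource(RTE_PTR_ADD(bar_addr, 0x0000), vfio_dev_fd,
            bar->offset + 0x0000, 0x1000, RTE_MAP_FORCE_ADDRESS);
    pci_map_resource(RTE_PTR_ADD(bar_addr, 0x3000), vfio_dev_fd,
            bar->offset + 0x3000, 0x1000, RTE_MAP_FORCE_ADDRESS);

A register at BAR offset 0x3010 is then still at 'bar_addr + 0x3010', while
offsets inside the hole [0x1000, 0x3000) have to go through
rte_pci_mmio_read()/rte_pci_mmio_write().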

Signed-off-by: Miao Li <miao.li@intel.com>
Signed-off-by: Chenbo Xia <chenbo.xia@intel.com>
Acked-by: Sunil Kumar Kori <skori@marvell.com>
Acked-by: Yahui Cao <yahui.cao@intel.com>
---
 drivers/bus/pci/linux/pci_vfio.c | 138 +++++++++++++++++++++++++++----
 drivers/bus/pci/private.h        |   2 +
 2 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 24b0795fbd..e6db30d36a 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -673,6 +673,54 @@ pci_vfio_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
 	return 0;
 }
 
+static int
+pci_vfio_sparse_mmap_bar(int vfio_dev_fd, struct mapped_pci_resource *vfio_res,
+		int bar_index, int additional_flags)
+{
+	struct pci_map *bar = &vfio_res->maps[bar_index];
+	struct vfio_region_sparse_mmap_area *sparse;
+	void *bar_addr;
+	uint32_t i;
+
+	if (bar->size == 0) {
+		RTE_LOG(DEBUG, EAL, "Bar size is 0, skip BAR%d\n", bar_index);
+		return 0;
+	}
+
+	/* reserve the address using an inaccessible mapping */
+	bar_addr = mmap(bar->addr, bar->size, 0, MAP_PRIVATE |
+			MAP_ANONYMOUS | additional_flags, -1, 0);
+	if (bar_addr != MAP_FAILED) {
+		void *map_addr = NULL;
+		for (i = 0; i < bar->nr_areas; i++) {
+			sparse = &bar->areas[i];
+			if (sparse->size) {
+				void *addr = RTE_PTR_ADD(bar_addr, (uintptr_t)sparse->offset);
+				map_addr = pci_map_resource(addr, vfio_dev_fd,
+					bar->offset + sparse->offset, sparse->size,
+					RTE_MAP_FORCE_ADDRESS);
+				if (map_addr == NULL) {
+					munmap(bar_addr, bar->size);
+					RTE_LOG(ERR, EAL, "Failed to map pci BAR%d\n",
+						bar_index);
+					goto err_map;
+				}
+			}
+		}
+	} else {
+		RTE_LOG(ERR, EAL, "Failed to create inaccessible mapping for BAR%d\n",
+			bar_index);
+		goto err_map;
+	}
+
+	bar->addr = bar_addr;
+	return 0;
+
+err_map:
+	bar->nr_areas = 0;
+	return -1;
+}
+
 /*
  * region info may contain capability headers, so we need to keep reallocating
  * the memory until we match allocated memory size with argsz.
@@ -798,7 +846,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
-	int i, ret;
+	int i, j, ret;
 	struct mapped_pci_resource *vfio_res = NULL;
 	struct mapped_pci_res_list *vfio_res_list =
 		RTE_TAILQ_CAST(rte_vfio_tailq.head, mapped_pci_res_list);
@@ -875,13 +923,15 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
 		void *bar_addr;
+		struct vfio_info_cap_header *hdr;
+		struct vfio_region_info_cap_sparse_mmap *sparse;
 
 		ret = pci_vfio_get_region_info(vfio_dev_fd, &reg, i);
 		if (ret < 0) {
 			RTE_LOG(ERR, EAL,
 				"%s cannot get device region info error "
 				"%i (%s)\n", pci_addr, errno, strerror(errno));
-			goto err_vfio_res;
+			goto err_map;
 		}
 
 		pdev->region[i].size = reg->size;
@@ -891,7 +941,7 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		ret = pci_vfio_is_ioport_bar(dev, vfio_dev_fd, i);
 		if (ret < 0) {
 			free(reg);
-			goto err_vfio_res;
+			goto err_map;
 		} else if (ret) {
 			RTE_LOG(INFO, EAL, "Ignore mapping IO port bar(%d)\n",
 					i);
@@ -920,12 +970,41 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 		maps[i].size = reg->size;
 		maps[i].path = NULL; /* vfio doesn't have per-resource paths */
 
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			free(reg);
-			goto err_vfio_res;
+		hdr = pci_vfio_info_cap(reg, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+
+		if (hdr != NULL) {
+			sparse = container_of(hdr,
+				struct vfio_region_info_cap_sparse_mmap, header);
+			if (sparse->nr_areas > 0) {
+				maps[i].nr_areas = sparse->nr_areas;
+				maps[i].areas = rte_zmalloc(NULL,
+					sizeof(*maps[i].areas) * maps[i].nr_areas, 0);
+				if (maps[i].areas == NULL) {
+					RTE_LOG(ERR, EAL,
+						"Cannot alloc memory for sparse map areas\n");
+					goto err_map;
+				}
+				memcpy(maps[i].areas, sparse->areas,
+					sizeof(*maps[i].areas) * maps[i].nr_areas);
+			}
+		}
+
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_map;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, 0);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				free(reg);
+				goto err_map;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -935,19 +1014,26 @@ pci_vfio_map_resource_primary(struct rte_pci_device *dev)
 
 	if (pci_rte_vfio_setup_device(dev, vfio_dev_fd) < 0) {
 		RTE_LOG(ERR, EAL, "%s setup device failed\n", pci_addr);
-		goto err_vfio_res;
+		goto err_map;
 	}
 
 #ifdef HAVE_VFIO_DEV_REQ_INTERFACE
 	if (pci_vfio_enable_notifier(dev, vfio_dev_fd) != 0) {
 		RTE_LOG(ERR, EAL, "Error setting up notifier!\n");
-		goto err_vfio_res;
+		goto err_map;
 	}
 
 #endif
 	TAILQ_INSERT_TAIL(vfio_res_list, vfio_res, next);
 
 	return 0;
+err_map:
+	for (j = 0; j < i; j++) {
+		if (maps[j].addr)
+			pci_unmap_resource(maps[j].addr, maps[j].size);
+		if (maps[j].nr_areas > 0)
+			rte_free(maps[j].areas);
+	}
 err_vfio_res:
 	rte_free(vfio_res);
 err_vfio_dev_fd:
@@ -963,7 +1049,7 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	char pci_addr[PATH_MAX] = {0};
 	int vfio_dev_fd;
 	struct rte_pci_addr *loc = &dev->addr;
-	int i, ret;
+	int i, j, ret;
 	struct mapped_pci_resource *vfio_res = NULL;
 	struct mapped_pci_res_list *vfio_res_list =
 		RTE_TAILQ_CAST(rte_vfio_tailq.head, mapped_pci_res_list);
@@ -1008,11 +1094,20 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 	maps = vfio_res->maps;
 
 	for (i = 0; i < vfio_res->nb_maps; i++) {
-		ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
-		if (ret < 0) {
-			RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
-					pci_addr, i, strerror(errno));
-			goto err_vfio_dev_fd;
+		if (maps[i].nr_areas > 0) {
+			ret = pci_vfio_sparse_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s sparse mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
+		} else {
+			ret = pci_vfio_mmap_bar(vfio_dev_fd, vfio_res, i, MAP_FIXED);
+			if (ret < 0) {
+				RTE_LOG(ERR, EAL, "%s mapping BAR%i failed: %s\n",
+						pci_addr, i, strerror(errno));
+				goto err_vfio_dev_fd;
+			}
 		}
 
 		dev->mem_resource[i].addr = maps[i].addr;
@@ -1028,6 +1123,10 @@ pci_vfio_map_resource_secondary(struct rte_pci_device *dev)
 
 	return 0;
 err_vfio_dev_fd:
+	for (j = 0; j < i; j++) {
+		if (maps[j].addr)
+			pci_unmap_resource(maps[j].addr, maps[j].size);
+	}
 	rte_vfio_release_device(rte_pci_get_sysfs_path(),
 			pci_addr, vfio_dev_fd);
 	return -1;
@@ -1062,7 +1161,7 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 		break;
 	}
 
-	if  (vfio_res == NULL)
+	if (vfio_res == NULL)
 		return vfio_res;
 
 	RTE_LOG(INFO, EAL, "Releasing PCI mapped resource for %s\n",
@@ -1080,6 +1179,9 @@ find_and_unmap_vfio_resource(struct mapped_pci_res_list *vfio_res_list,
 				pci_addr, maps[i].addr);
 			pci_unmap_resource(maps[i].addr, maps[i].size);
 		}
+
+		if (maps[i].nr_areas > 0)
+			rte_free(maps[i].areas);
 	}
 
 	return vfio_res;
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 2d6991ccb7..8b0ce73533 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -121,6 +121,8 @@ struct pci_map {
 	uint64_t offset;
 	uint64_t size;
 	uint64_t phaddr;
+	uint32_t nr_areas;
+	struct vfio_region_sparse_mmap_area *areas;
 };
 
 struct pci_msix_table {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                             ` (3 preceding siblings ...)
  2023-05-31  5:37           ` [PATCH v4 4/4] bus/pci: add VFIO sparse mmap support Miao Li
@ 2023-06-07 16:30           ` Thomas Monjalon
  2023-06-08  0:28             ` Patrick Robb
  2023-06-08  1:33             ` Xia, Chenbo
  2023-06-08  6:43           ` Ali Alnubani
  5 siblings, 2 replies; 50+ messages in thread
From: Thomas Monjalon @ 2023-06-07 16:30 UTC (permalink / raw)
  To: Miao Li; +Cc: dev, skori, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

31/05/2023 07:37, Miao Li:
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.
> 
> In this patchset:
> 
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able. Patch 4
> adds the VFIO sparse mmap support finally.
> 
> v4:
> 1. add sparse mmap information allocation and release
> 2. add release note for BAR access APIs
> 
> v3:
> fix variable 'pdev' and 'info' uninitialized error
> 
> v2:
> 1. add PCI device internal structure in bus/pci/windows/pci.c
> 2. fix parameter type error
> 
> Chenbo Xia (3):
>   bus/pci: introduce an internal representation of PCI device
>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
> 
> Miao Li (1):
>   bus/pci: add VFIO sparse mmap support

Applied, thanks.

Are there some drivers which may reuse the new MMIO helpers?



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-07 16:30           ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Thomas Monjalon
@ 2023-06-08  0:28             ` Patrick Robb
  2023-06-08  1:36               ` Xia, Chenbo
  2023-06-08  1:33             ` Xia, Chenbo
  1 sibling, 1 reply; 50+ messages in thread
From: Patrick Robb @ 2023-06-08  0:28 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Miao Li, dev, skori, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

Hello,

This patch series might have introduced a bug for building DPDK on Windows.
It failed the Windows build when it went through our CI last week, and I am
seeing other patch series fail on the Windows build now that it is merged
into main.

Tomorrow morning, I will check the Windows system we use for CI to verify it
is still valid for building DPDK in terms of clang version, linker, etc. But
it seems fine from a quick glance right now. I will update if/when I learn
more.

Thanks,
Patrick

-- 

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-07 16:30           ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Thomas Monjalon
  2023-06-08  0:28             ` Patrick Robb
@ 2023-06-08  1:33             ` Xia, Chenbo
  1 sibling, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-06-08  1:33 UTC (permalink / raw)
  To: Thomas Monjalon, Li, Miao
  Cc: dev, skori, david.marchand, ferruh.yigit, Cao, Yahui

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, June 8, 2023 12:31 AM
> To: Li, Miao <miao.li@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com; david.marchand@redhat.com;
> ferruh.yigit@amd.com; Xia, Chenbo <chenbo.xia@intel.com>; Cao, Yahui
> <yahui.cao@intel.com>
> Subject: Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
> 
> 31/05/2023 07:37, Miao Li:
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able. Patch 4
> > adds the VFIO sparse mmap support finally.
> >
> > v4:
> > 1. add sparse mmap information allocation and release
> > 2. add release note for BAR access APIs
> >
> > v3:
> > fix variable 'pdev' and 'info' uninitialized error
> >
> > v2:
> > 1. add PCI device internal structure in bus/pci/windows/pci.c
> > 2. fix parameter type error
> >
> > Chenbo Xia (3):
> >   bus/pci: introduce an internal representation of PCI device
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >
> > Miao Li (1):
> >   bus/pci: add VFIO sparse mmap support
> 
> Applied, thanks.
> 
> Are there some drivers which may reuse the new MMIO helpers?

Yes, we will send patches to let Intel drivers use them soon. Other
drivers could start to use them when they want to support sparse mmap.

Thanks,
Chenbo 
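
(For readers following the sparse mmap details quoted above: a minimal sketch,
using the standard linux/vfio.h UAPI structures, of how the sparse mmap
capability returned by VFIO_DEVICE_GET_REGION_INFO can be walked and each
mmap-able area mapped independently, i.e. the layout option the series adopts.
This is an illustrative sketch, not the code from the patches.)

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Walk the capability chain of a region info reply. VFIO_REGION_INFO_FLAG_CAPS
 * must be set and 'info' must point to the full, variable-sized reply. */
static struct vfio_region_info_cap_sparse_mmap *
find_sparse_cap(struct vfio_region_info *info)
{
	uint32_t off = info->cap_offset;

	while (off != 0) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)((char *)info + off);

		if (hdr->id == VFIO_REGION_INFO_CAP_SPARSE_MMAP)
			return (struct vfio_region_info_cap_sparse_mmap *)hdr;
		off = hdr->next;
	}
	return NULL;
}

/* Map each mmap-able area of the BAR region independently. */
static int
map_sparse_areas(int vfio_dev_fd, struct vfio_region_info *info)
{
	struct vfio_region_info_cap_sparse_mmap *sparse = find_sparse_cap(info);
	uint32_t i;

	if (sparse == NULL)
		return -1;

	for (i = 0; i < sparse->nr_areas; i++) {
		void *addr = mmap(NULL, sparse->areas[i].size,
				PROT_READ | PROT_WRITE, MAP_SHARED,
				vfio_dev_fd,
				(off_t)(info->offset + sparse->areas[i].offset));

		if (addr == MAP_FAILED)
			return -1;
		printf("area %u: offset=0x%llx size=0x%llx mapped at %p\n", i,
			(unsigned long long)sparse->areas[i].offset,
			(unsigned long long)sparse->areas[i].size, addr);
	}
	return 0;
}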

> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-08  0:28             ` Patrick Robb
@ 2023-06-08  1:36               ` Xia, Chenbo
  0 siblings, 0 replies; 50+ messages in thread
From: Xia, Chenbo @ 2023-06-08  1:36 UTC (permalink / raw)
  To: Patrick Robb, Thomas Monjalon
  Cc: Li, Miao, dev, skori, david.marchand, ferruh.yigit, Cao, Yahui


[-- Attachment #1.1: Type: text/plain, Size: 1280 bytes --]

Hi Patrick,

Oops... that seems weird, as patchwork did not report anything. Please reach out to me when you have some information about why it’s failing, and I will fix it ASAP.

Thanks,
Chenbo

From: Patrick Robb <probb@iol.unh.edu>
Sent: Thursday, June 8, 2023 8:29 AM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Li, Miao <miao.li@intel.com>; dev@dpdk.org; skori@marvell.com; david.marchand@redhat.com; ferruh.yigit@amd.com; Xia, Chenbo <chenbo.xia@intel.com>; Cao, Yahui <yahui.cao@intel.com>
Subject: Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus

Hello,

This patch series might have introduced a bug for building DPDK on Windows. It failed the Windows build when it went through our CI last week, and I am seeing other patch series fail the Windows build now that it is merged into main.

Tomorrow morning, I will check the Windows system used for our CI to verify it is still valid for building DPDK in terms of clang version, linker, etc. But it seems fine from a quick glance right now. I will update if/when I learn more.

Thanks,
Patrick

--

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu




[-- Attachment #1.2: Type: text/html, Size: 5685 bytes --]


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
                             ` (4 preceding siblings ...)
  2023-06-07 16:30           ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Thomas Monjalon
@ 2023-06-08  6:43           ` Ali Alnubani
  2023-06-08  6:50             ` Xia, Chenbo
  5 siblings, 1 reply; 50+ messages in thread
From: Ali Alnubani @ 2023-06-08  6:43 UTC (permalink / raw)
  To: Miao Li, dev, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: skori, david.marchand, ferruh.yigit, chenbo.xia, yahui.cao, Patrick Robb

> -----Original Message-----
> From: Miao Li <miao.li@intel.com>
> Sent: Wednesday, May 31, 2023 8:38 AM
> To: dev@dpdk.org
> Cc: skori@marvell.com; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; david.marchand@redhat.com;
> ferruh.yigit@amd.com; chenbo.xia@intel.com; yahui.cao@intel.com
> Subject: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
> 
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.
> 

Hello,

I see the build failure Patrick reported as well and can confirm it's caused by 095cf6e68b28 ("bus/pci: introduce MMIO read/write").
Bugzilla ticket: https://bugs.dpdk.org/show_bug.cgi?id=1245

Regards,
Ali
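
(To illustrate the pread/pwrite access path mentioned in the quoted cover
letter: with VFIO, each BAR is exposed as a region of the device fd at a fixed
offset, so registers in a part of the BAR that is not mmap-ed can still be
reached with plain read/write system calls. A rough sketch, not the code from
the series; 'region_offset' stands for the offset reported by
VFIO_DEVICE_GET_REGION_INFO.)

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read 'len' bytes at register offset 'reg' of a BAR exposed as a VFIO
 * region starting at 'region_offset' on the device fd. */
static int
vfio_bar_pread(int vfio_dev_fd, uint64_t region_offset, off_t reg,
		void *buf, size_t len)
{
	if (pread(vfio_dev_fd, buf, len, (off_t)(region_offset + reg)) !=
			(ssize_t)len)
		return -1;
	return 0;
}

/* Write 'len' bytes at register offset 'reg' of the same region. */
static int
vfio_bar_pwrite(int vfio_dev_fd, uint64_t region_offset, off_t reg,
		const void *buf, size_t len)
{
	if (pwrite(vfio_dev_fd, buf, len, (off_t)(region_offset + reg)) !=
			(ssize_t)len)
		return -1;
	return 0;
}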

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-08  6:43           ` Ali Alnubani
@ 2023-06-08  6:50             ` Xia, Chenbo
  2023-06-08  7:03               ` David Marchand
  0 siblings, 1 reply; 50+ messages in thread
From: Xia, Chenbo @ 2023-06-08  6:50 UTC (permalink / raw)
  To: Ali Alnubani, Li, Miao, dev, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: skori, david.marchand, ferruh.yigit, Cao, Yahui, Patrick Robb

> -----Original Message-----
> From: Ali Alnubani <alialnu@nvidia.com>
> Sent: Thursday, June 8, 2023 2:43 PM
> To: Li, Miao <miao.li@intel.com>; dev@dpdk.org; NBU-Contact-Thomas
> Monjalon (EXTERNAL) <thomas@monjalon.net>
> Cc: skori@marvell.com; david.marchand@redhat.com; ferruh.yigit@amd.com;
> Xia, Chenbo <chenbo.xia@intel.com>; Cao, Yahui <yahui.cao@intel.com>;
> Patrick Robb <probb@iol.unh.edu>
> Subject: RE: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
> 
> > -----Original Message-----
> > From: Miao Li <miao.li@intel.com>
> > Sent: Wednesday, May 31, 2023 8:38 AM
> > To: dev@dpdk.org
> > Cc: skori@marvell.com; NBU-Contact-Thomas Monjalon (EXTERNAL)
> > <thomas@monjalon.net>; david.marchand@redhat.com;
> > ferruh.yigit@amd.com; chenbo.xia@intel.com; yahui.cao@intel.com
> > Subject: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> >
> 
> Hello,
> 
> I see the build failure Patrick reported as well and can confirm it's
> caused by 095cf6e68b28 ("bus/pci: introduce MMIO read/write").
> Bugzilla ticket: https://bugs.dpdk.org/show_bug.cgi?id=1245

Thanks Ali. I just read the bz and understand what's missing. I will send
a patch today.

But since the CI did not report the error last time, how can I make sure the
fix actually works this time?

Regards,
Chenbo

> 
> Regards,
> Ali

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-08  6:50             ` Xia, Chenbo
@ 2023-06-08  7:03               ` David Marchand
  2023-06-08 12:47                 ` Patrick Robb
  0 siblings, 1 reply; 50+ messages in thread
From: David Marchand @ 2023-06-08  7:03 UTC (permalink / raw)
  To: Xia, Chenbo
  Cc: Ali Alnubani, Li, Miao, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	skori, ferruh.yigit, Cao, Yahui, Patrick Robb

Hello Chenbo, Patrick,


On Thu, Jun 8, 2023 at 8:50 AM Xia, Chenbo <chenbo.xia@intel.com> wrote:
> > > This series introduces a VFIO standard capability, called sparse
> > > mmap to PCI bus. In linux kernel, it's defined as
> > > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > > mmap whole BAR region into DPDK process, only mmap part of the
> > > BAR region after getting sparse mmap information from kernel.
> > > For the rest of BAR region that is not mmap-ed, DPDK process
> > > can use pread/pwrite system calls to access. Sparse mmap is
> > > useful when kernel does not want userspace to mmap whole BAR
> > > region, or kernel wants to control over access to specific BAR
> > > region. Vendors can choose to enable this feature or not for
> > > their devices in their specific kernel modules.
> > >
> >
> > Hello,
> >
> > I see the build failure Patrick reported as well and can confirm it's
> > caused by 095cf6e68b28 ("bus/pci: introduce MMIO read/write").
> > Bugzilla ticket: https://bugs.dpdk.org/show_bug.cgi?id=1245
>
> Thanks Ali. I just read the bz and understand what's missing. I will send
> a patch today.
>
> But since the CI did not report the error last time, how can I make sure the
> fix actually works this time?

Chenbo,

In theory, this error should have been reported, so go ahead and post your fix.

Patrick,

The missing report could be a mail delivery issue (I can see the test
ran at UNH).
I see no trace on the test-report mailing list.
Can you look into the reason?

Thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus
  2023-06-08  7:03               ` David Marchand
@ 2023-06-08 12:47                 ` Patrick Robb
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Robb @ 2023-06-08 12:47 UTC (permalink / raw)
  To: David Marchand
  Cc: Xia, Chenbo, Ali Alnubani, Li, Miao, dev,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	skori, ferruh.yigit, Cao, Yahui

[-- Attachment #1: Type: text/plain, Size: 2084 bytes --]

On Thu, Jun 8, 2023 at 3:03 AM David Marchand <david.marchand@redhat.com>
wrote:

> Hello Chenbo, Patrick,
>
>
> On Thu, Jun 8, 2023 at 8:50 AM Xia, Chenbo <chenbo.xia@intel.com> wrote:
> > > > This series introduces a VFIO standard capability, called sparse
> > > > mmap to PCI bus. In linux kernel, it's defined as
> > > > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > > > mmap whole BAR region into DPDK process, only mmap part of the
> > > > BAR region after getting sparse mmap information from kernel.
> > > > For the rest of BAR region that is not mmap-ed, DPDK process
> > > > can use pread/pwrite system calls to access. Sparse mmap is
> > > > useful when kernel does not want userspace to mmap whole BAR
> > > > region, or kernel wants to control over access to specific BAR
> > > > region. Vendors can choose to enable this feature or not for
> > > > their devices in their specific kernel modules.
> > > >
> > >
> > > Hello,
> > >
> > > I see the build failure Patrick reported as well and can confirm it's
> > > caused by 095cf6e68b28 ("bus/pci: introduce MMIO read/write").
> > > Bugzilla ticket: https://bugs.dpdk.org/show_bug.cgi?id=1245
> >
> > Thanks Ali. I just read the bz and understand what's missing. I will send
> > a patch today.
> >
> > But since the CI did not report the error last time, how can I make sure
> > the fix actually works this time?
>
> Chenbo,
>
> In theory, this error should have been reported so go ahead and post your
> fix.
>
> Patrick,
>
> This missing report could be a mail delivery issue (I can see the test
> ran at UNH).
> I see no trace in test-report ml.
> Can you look at the reason?
>
> Thanks.
>
>
> --
> David Marchand
>
>
I see where the issue is in our reporting process for the Windows results,
and I will resolve it after the CI meeting this morning. Sorry, everyone, for
the oversight.
-- 

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu

[-- Attachment #2: Type: text/html, Size: 4721 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2023-06-08 12:47 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-18  5:30 [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
2023-04-18  5:30 ` [RFC 1/4] bus/pci: introduce an internal representation of PCI device Chenbo Xia
2023-04-18  5:30 ` [RFC 2/4] bus/pci: avoid depending on private value in kernel source Chenbo Xia
2023-04-18  5:30 ` [RFC 3/4] bus/pci: introduce helper for MMIO read and write Chenbo Xia
2023-04-18  5:30 ` [RFC 4/4] bus/pci: add VFIO sparse mmap support Chenbo Xia
2023-04-18  7:46 ` [RFC 0/4] Support VFIO sparse mmap in PCI bus David Marchand
2023-04-18  9:27   ` Xia, Chenbo
2023-04-18  9:33   ` Xia, Chenbo
2023-05-08  2:13 ` Xia, Chenbo
2023-05-08  3:04   ` Sunil Kumar Kori
2023-05-15  6:46 ` [PATCH v1 " Miao Li
2023-05-15  6:46   ` [PATCH v1 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
2023-05-15  6:46   ` [PATCH v1 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
2023-05-15  6:46   ` [PATCH v1 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
2023-05-15  6:47   ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Miao Li
2023-05-15  9:41     ` [PATCH v2 0/4] Support VFIO sparse mmap in PCI bus Miao Li
2023-05-15  9:41       ` [PATCH v2 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
2023-05-15  9:41       ` [PATCH v2 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
2023-05-15  9:41       ` [PATCH v2 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
2023-05-15  9:41       ` [PATCH v2 4/4] bus/pci: add VFIO sparse mmap support Miao Li
2023-05-25 16:31       ` [PATCH v3 0/4] Support VFIO sparse mmap in PCI bus Miao Li
2023-05-25 16:31         ` [PATCH v3 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
2023-05-29  6:14           ` [EXT] " Sunil Kumar Kori
2023-05-29  6:28           ` Cao, Yahui
2023-05-25 16:31         ` [PATCH v3 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
2023-05-29  6:15           ` [EXT] " Sunil Kumar Kori
2023-05-29  6:30           ` Cao, Yahui
2023-05-25 16:31         ` [PATCH v3 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
2023-05-29  6:16           ` [EXT] " Sunil Kumar Kori
2023-05-29  6:31           ` Cao, Yahui
2023-05-25 16:31         ` [PATCH v3 4/4] bus/pci: add VFIO sparse mmap support Miao Li
2023-05-29  6:17           ` [EXT] " Sunil Kumar Kori
2023-05-29  6:32           ` Cao, Yahui
2023-05-29  9:25           ` Xia, Chenbo
2023-05-31  5:37         ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Miao Li
2023-05-31  5:37           ` [PATCH v4 1/4] bus/pci: introduce an internal representation of PCI device Miao Li
2023-05-31  5:37           ` [PATCH v4 2/4] bus/pci: avoid depending on private value in kernel source Miao Li
2023-05-31  5:37           ` [PATCH v4 3/4] bus/pci: introduce helper for MMIO read and write Miao Li
2023-05-31  5:37           ` [PATCH v4 4/4] bus/pci: add VFIO sparse mmap support Miao Li
2023-06-07 16:30           ` [PATCH v4 0/4] Support VFIO sparse mmap in PCI bus Thomas Monjalon
2023-06-08  0:28             ` Patrick Robb
2023-06-08  1:36               ` Xia, Chenbo
2023-06-08  1:33             ` Xia, Chenbo
2023-06-08  6:43           ` Ali Alnubani
2023-06-08  6:50             ` Xia, Chenbo
2023-06-08  7:03               ` David Marchand
2023-06-08 12:47                 ` Patrick Robb
2023-05-15 15:52     ` [PATCH v1 4/4] bus/pci: add VFIO sparse mmap support Stephen Hemminger
2023-05-22  2:41       ` Li, Miao
2023-05-22  3:42       ` Xia, Chenbo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).