From: Chenbo Xia
To: dev@dpdk.org
Cc: skori@marvell.com
Subject: [RFC 0/4] Support VFIO sparse mmap in PCI bus
Date: Tue, 18 Apr 2023 13:30:08 +0800
Message-Id: <20230418053012.10667-1-chenbo.xia@intel.com>
X-Mailer: git-send-email 2.17.1
List-Id: DPDK patches and discussions

This series adds support for a standard VFIO capability, sparse mmap,
to the PCI bus. In the Linux kernel it is defined as
VFIO_REGION_INFO_CAP_SPARSE_MMAP. With sparse mmap, the DPDK process
does not mmap the whole BAR region. It mmaps only the parts of the BAR
that the kernel reports in the sparse mmap information. The parts of
the BAR that are not mmap-ed can be accessed with the pread/pwrite
system calls.

Sparse mmap is useful when the kernel does not want userspace to mmap
the whole BAR region, or when the kernel wants to control access to
specific parts of a BAR. Vendors can choose whether to enable this
feature for their devices in their own kernel modules.

In this patchset:

Patches 1-3 mainly introduce BAR access APIs, so that a driver can use
them to access a specific BAR with the pread/pwrite system calls when
part of the BAR is not mmap-able. Patch 4 adds the VFIO sparse mmap
support itself.

An open question is whether all sparse mmap regions should be mapped
into one contiguous virtual address range that follows the
device-specific BAR layout. In theory, there are three options for
supporting this feature.

Option 1: Map sparse mmap regions independently
===============================================
In this approach, each sparse mmap region is mmap-ed separately and can
be located anywhere in the process address space. Accessing the
mmap-ed BAR is then no longer as easy as 'bar_base_address +
bar_offset': the driver needs to check the sparse mmap information to
access a specific BAR register.

Patch 4 in this patchset adopts this option. The driver API change is
introduced in bus_pci_driver.h. The corresponding changes in all
drivers are also done. For now I am assuming that drivers do not
support this feature, so they do not check the 'is_sparse' flag and
assume it to be false.
Note that the driver API change does not break any driver, and each
vendor can add the related logic when it starts to support this
feature. I did it this way only because I don't want to add complexity
to drivers that do not want to support this feature.

Option 2: Map sparse mmap regions based on device-specific BAR layout
=====================================================================
In this approach, the sparse mmap regions are mapped into a contiguous
virtual address range that follows the device-specific BAR layout. For
example, suppose the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
region #1) and 0x3000-0x4000 (sparse mmap region #2) can be mmap-ed.
Region #1 is mapped at 'base_addr' and region #2 is mapped at
'base_addr + 0x3000'. The advantage is that the driver can still access
BAR registers in the 'bar_base_address + bar_offset' way, and no driver
API change is needed. The drawback is that the address range from
'base_addr + 0x1000' to 'base_addr + 0x3000' may need to be reserved,
which could waste address space or memory (when the MAP_ANONYMOUS and
MAP_PRIVATE flags are used to reserve this range). Meanwhile, the
driver needs to know which parts of the BAR are mmap-ed (this is
possible since the ranges are defined by the vendor's kernel module).

Option 3: Support both option 1 & 2
===================================
We could define a driver flag to let each driver choose which way it
prefers, since each option has its own pros and cons.

Please share your comments, thanks!
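For illustration only, the Option 2 layout could be sketched roughly as
below. This is not code from the series (patch 4 implements Option 1).
The names 'struct sparse_area', 'map_bar_sparse' and
'sparse_layout_selftest' are hypothetical, and a temporary file stands
in for the VFIO device fd. In a real VFIO path the area offsets and
sizes would come from the sparse mmap capability. The sketch assumes
that area offsets and sizes are page-aligned, so the self-check uses
multiples of the page size instead of the literal 0x1000/0x3000.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, mkstemp, pwrite on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one mmap-able window inside a BAR. */
struct sparse_area {
	uint64_t offset; /* offset of the window inside the BAR */
	uint64_t size;   /* length of the window in bytes */
};

/*
 * Reserve bar_size bytes of address space with PROT_NONE, then map each
 * sparse area over the reservation at 'base + area offset'.
 * 'fd' stands in for the device fd, 'bar_file_off' for the offset of
 * the BAR within that fd. Returns the BAR base or MAP_FAILED.
 */
void *
map_bar_sparse(int fd, off_t bar_file_off, size_t bar_size,
	       const struct sparse_area *areas, unsigned int nr_areas)
{
	unsigned int i;
	uint8_t *base;

	assert(areas != NULL || nr_areas == 0);

	/* Reserve the whole range; the holes stay inaccessible. */
	base = mmap(NULL, bar_size, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	for (i = 0; i < nr_areas; i++) {
		void *p = MAP_FAILED;

		/* Only map areas that fit inside the reserved range. */
		if (areas[i].offset + areas[i].size <= bar_size)
			p = mmap(base + areas[i].offset, areas[i].size,
				 PROT_READ | PROT_WRITE,
				 MAP_SHARED | MAP_FIXED, fd,
				 bar_file_off + (off_t)areas[i].offset);
		if (p == MAP_FAILED) {
			munmap(base, bar_size);
			return MAP_FAILED;
		}
	}
	return base;
}

/* Self-check: a 4-page "BAR" where only the first and last page map. */
int
sparse_layout_selftest(void)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	struct sparse_area areas[2] = {
		{ 0, pg },      /* area #1: first page */
		{ 3 * pg, pg }, /* area #2: last page */
	};
	char path[] = "/tmp/sparse_bar_XXXXXX";
	uint8_t in = 0xab, got = 0;
	uint8_t *base;
	int fd, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)(4 * pg)) != 0 ||
	    pwrite(fd, &in, 1, (off_t)(3 * pg + 4)) != 1)
		goto done;

	base = map_bar_sparse(fd, 0, 4 * pg, areas, 2);
	if (base == MAP_FAILED)
		goto done;
	/* Plain 'base + offset' arithmetic works inside a mapped area. */
	got = base[3 * pg + 4];
	munmap(base, 4 * pg);
	ret = (got == in) ? 0 : -1;
done:
	close(fd);
	return ret;
}
```

With this layout a driver keeps using 'base + offset' for registers
inside a mapped area. Touching the reserved hole would fault, so
accesses there would still have to go through pread/pwrite. Because the
hole is reserved with PROT_NONE, it mainly costs address space.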
Chenbo Xia (4):
  bus/pci: introduce an internal representation of PCI device
  bus/pci: avoid depending on private value in kernel source
  bus/pci: introduce helper for MMIO read and write
  bus/pci: add VFIO sparse mmap support

 drivers/baseband/acc/rte_acc100_pmd.c | 6 +-
 drivers/baseband/acc/rte_vrb_pmd.c | 6 +-
 .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c | 6 +-
 drivers/baseband/fpga_lte_fec/fpga_lte_fec.c | 6 +-
 drivers/bus/pci/bsd/pci.c | 43 +-
 drivers/bus/pci/bus_pci_driver.h | 24 +-
 drivers/bus/pci/linux/pci.c | 91 +++-
 drivers/bus/pci/linux/pci_init.h | 14 +-
 drivers/bus/pci/linux/pci_uio.c | 34 +-
 drivers/bus/pci/linux/pci_vfio.c | 445 ++++++++++++++----
 drivers/bus/pci/pci_common.c | 57 ++-
 drivers/bus/pci/pci_common_uio.c | 12 +-
 drivers/bus/pci/private.h | 25 +-
 drivers/bus/pci/rte_bus_pci.h | 48 ++
 drivers/bus/pci/version.map | 3 +
 drivers/common/cnxk/roc_dev.c | 4 +-
 drivers/common/cnxk/roc_dpi.c | 2 +-
 drivers/common/cnxk/roc_ml.c | 22 +-
 drivers/common/qat/dev/qat_dev_gen1.c | 2 +-
 drivers/common/qat/dev/qat_dev_gen4.c | 4 +-
 drivers/common/sfc_efx/sfc_efx.c | 2 +-
 drivers/compress/octeontx/otx_zip.c | 4 +-
 drivers/crypto/ccp/ccp_dev.c | 4 +-
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c | 2 +-
 drivers/crypto/nitrox/nitrox_device.c | 4 +-
 drivers/crypto/octeontx/otx_cryptodev_ops.c | 6 +-
 drivers/crypto/virtio/virtio_pci.c | 6 +-
 drivers/dma/cnxk/cnxk_dmadev.c | 2 +-
 drivers/dma/hisilicon/hisi_dmadev.c | 6 +-
 drivers/dma/idxd/idxd_pci.c | 4 +-
 drivers/dma/ioat/ioat_dmadev.c | 2 +-
 drivers/event/dlb2/pf/dlb2_main.c | 16 +-
 drivers/event/octeontx/ssovf_probe.c | 38 +-
 drivers/event/octeontx/timvf_probe.c | 18 +-
 drivers/event/skeleton/skeleton_eventdev.c | 2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c | 6 +-
 drivers/net/ark/ark_ethdev.c | 4 +-
 drivers/net/atlantic/atl_ethdev.c | 2 +-
 drivers/net/avp/avp_ethdev.c | 20 +-
 drivers/net/axgbe/axgbe_ethdev.c | 4 +-
 drivers/net/bnx2x/bnx2x_ethdev.c | 6 +-
 drivers/net/bnxt/bnxt_ethdev.c | 8 +-
 drivers/net/cpfl/cpfl_ethdev.c | 4 +-
 drivers/net/cxgbe/cxgbe_ethdev.c | 2 +-
 drivers/net/cxgbe/cxgbe_main.c | 2 +-
 drivers/net/cxgbe/cxgbevf_ethdev.c | 2 +-
 drivers/net/cxgbe/cxgbevf_main.c | 2 +-
 drivers/net/e1000/em_ethdev.c | 4 +-
 drivers/net/e1000/igb_ethdev.c | 4 +-
 drivers/net/ena/ena_ethdev.c | 4 +-
 drivers/net/enetc/enetc_ethdev.c | 2 +-
 drivers/net/enic/enic_main.c | 4 +-
 drivers/net/fm10k/fm10k_ethdev.c | 2 +-
 drivers/net/gve/gve_ethdev.c | 4 +-
 drivers/net/hinic/base/hinic_pmd_hwif.c | 14 +-
 drivers/net/hns3/hns3_ethdev.c | 2 +-
 drivers/net/hns3/hns3_ethdev_vf.c | 2 +-
 drivers/net/hns3/hns3_rxtx.c | 4 +-
 drivers/net/i40e/i40e_ethdev.c | 2 +-
 drivers/net/iavf/iavf_ethdev.c | 2 +-
 drivers/net/ice/ice_dcf.c | 2 +-
 drivers/net/ice/ice_ethdev.c | 2 +-
 drivers/net/idpf/idpf_ethdev.c | 4 +-
 drivers/net/igc/igc_ethdev.c | 2 +-
 drivers/net/ionic/ionic_dev_pci.c | 2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c | 4 +-
 drivers/net/liquidio/lio_ethdev.c | 4 +-
 drivers/net/nfp/nfp_ethdev.c | 2 +-
 drivers/net/nfp/nfp_ethdev_vf.c | 6 +-
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 4 +-
 drivers/net/ngbe/ngbe_ethdev.c | 2 +-
 drivers/net/octeon_ep/otx_ep_ethdev.c | 2 +-
 drivers/net/octeontx/base/octeontx_pkivf.c | 6 +-
 drivers/net/octeontx/base/octeontx_pkovf.c | 12 +-
 drivers/net/qede/qede_main.c | 6 +-
 drivers/net/sfc/sfc.c | 2 +-
 drivers/net/thunderx/nicvf_ethdev.c | 2 +-
 drivers/net/txgbe/txgbe_ethdev.c | 2 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c | 2 +-
 drivers/net/virtio/virtio_pci.c | 6 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 4 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c | 10 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 6 +-
 drivers/raw/ifpga/afu_pmd_n3000.c | 4 +-
 drivers/raw/ifpga/ifpga_rawdev.c | 6 +-
 drivers/raw/ntb/ntb_hw_intel.c | 8 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c | 6 +-
 drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 +-
 drivers/vdpa/sfc/sfc_vdpa_ops.c | 2 +-
 lib/eal/include/rte_vfio.h | 1 -
 90 files changed, 853 insertions(+), 352 deletions(-)

--
2.17.1