From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: David Marchand <david.marchand@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"skori@marvell.com" <skori@marvell.com>,
"Cao, Yahui" <yahui.cao@intel.com>,
"Li, Miao" <miao.li@intel.com>
Subject: RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
Date: Tue, 18 Apr 2023 09:33:47 +0000
Message-ID: <SN6PR11MB35040755150590EF21A230B29C9D9@SN6PR11MB3504.namprd11.prod.outlook.com>
In-Reply-To: <CAJFAV8xej6m-p37Pc3ScUsuGFd+Fh=2VDW5s=K4UXBbn44qEAg@mail.gmail.com>
David,
Sorry that I missed one comment...
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
>
> Hello Chenbo,
>
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a standard VFIO capability, called sparse
> > mmap, to the PCI bus. In the Linux kernel, it is defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means that instead of
> > mmap-ing the whole BAR region into the DPDK process, only part of
> > the BAR region is mmap-ed, based on the sparse mmap information
> > obtained from the kernel. For the rest of the BAR region that is
> > not mmap-ed, the DPDK process can use the pread/pwrite system calls
> > for access. Sparse mmap is useful when the kernel does not want
> > userspace to mmap the whole BAR region, or when the kernel wants to
> > control access to a specific part of the BAR region. Vendors can
> > choose whether to enable this feature for their devices in their
> > specific kernel modules.
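To make the capability above concrete, here is a minimal sketch (not code
from this series) of how the sparse mmap capability could be located in the
buffer filled by VFIO_DEVICE_GET_REGION_INFO. The structures and constants
come from <linux/vfio.h>; the helper name find_sparse_mmap_cap is
hypothetical and only for illustration.

```c
#include <linux/vfio.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: walk the capability chain of a region info buffer
 * and return the sparse mmap capability, or NULL if it is not present.
 * Chain offsets are relative to the start of 'info'; 'info->argsz' is
 * treated as the size of the buffer that holds the chain.
 */
static const struct vfio_region_info_cap_sparse_mmap *
find_sparse_mmap_cap(const struct vfio_region_info *info)
{
	const uint8_t *base = (const uint8_t *)info;
	uint32_t off;

	if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
		return NULL;

	for (off = info->cap_offset; off != 0; ) {
		const struct vfio_info_cap_header *hdr;

		/* Reject offsets that point outside the buffer. */
		if (off < sizeof(*info) || off + sizeof(*hdr) > info->argsz)
			return NULL;

		hdr = (const struct vfio_info_cap_header *)(base + off);
		if (hdr->id == VFIO_REGION_INFO_CAP_SPARSE_MMAP)
			return (const struct vfio_region_info_cap_sparse_mmap *)hdr;

		off = hdr->next;
	}

	return NULL;
}
```

Each entry of the returned capability's areas[] array carries the offset and
size of one mmap-able window inside the region. A NULL result with the CAPS
flag set usually means the buffer was too small to hold the chain, so the
caller would need to retry with a larger one.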
>
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?
>
>
> >
> > In this patchset:
> >
> > Patches 1-3 mainly introduce BAR access APIs so that drivers
> > can use them to access a specific BAR with the pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 finally adds the VFIO sparse mmap support. One open
> > question is whether all sparse mmap regions should be mapped to
> > a contiguous virtual address region that follows the
> > device-specific BAR layout. In theory, there are three options
> > to support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one,
> > and each region can be located anywhere in the process address
> > space. But accessing the mmap-ed BAR is no longer as easy as
> > 'bar_base_address + bar_offset'; the driver needs to check the
> > sparse mmap information to access a specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. A driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes are
> > also made in all drivers. For now I assume drivers do not
> > support this feature, so they do not check the 'is_sparse' flag
> > but assume it to be false. Note that this does not break any
> > driver, and each vendor can add the related logic when they
> > start to support this feature. I did it this way only because I
> > do not want to introduce complexity to drivers that do not want
> > to support this feature.
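As a rough illustration of the lookup a driver would need under option 1,
here is a minimal sketch. It is not the API from patch 4: the struct
sparse_area descriptor and the sparse_bar_addr helper are hypothetical names
used only to show the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one independently mmap-ed sparse area. */
struct sparse_area {
	uint64_t offset; /* offset of the area within the BAR */
	uint64_t size;   /* length of the area in bytes */
	void *addr;      /* process address where the area was mmap-ed */
};

/*
 * Translate [bar_off, bar_off + len) into a process address.
 * Returns NULL when the range is not fully inside one mmap-ed area,
 * in which case the caller has to fall back to pread/pwrite.
 */
static void *
sparse_bar_addr(const struct sparse_area *areas, size_t nr_areas,
		uint64_t bar_off, uint64_t len)
{
	size_t i;

	for (i = 0; i < nr_areas; i++) {
		const struct sparse_area *a = &areas[i];

		/* Written to avoid overflow in bar_off + len. */
		if (bar_off >= a->offset && len <= a->size &&
		    bar_off - a->offset <= a->size - len)
			return (uint8_t *)a->addr + (bar_off - a->offset);
	}

	return NULL;
}
```

The extra table walk on each access is the cost of this option compared with
the plain 'bar_base_address + bar_offset' computation.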
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to a
> > contiguous virtual address region that follows the
> > device-specific BAR layout. For example, the BAR size is 0x4000
> > and only 0-0x1000 (sparse mmap region #1) and 0x3000-0x4000
> > (sparse mmap region #2) can be mmap-ed. Region #1 will be mapped
> > at 'base_addr' and region #2 will be mapped at
> > 'base_addr + 0x3000'. The good thing is that if we implement it
> > like this, the driver can still access all BAR registers in the
> > 'bar_base_address + bar_offset' way and we do not need to
> > introduce any driver API change. But the address space range
> > 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to be
> > reserved, so it could result in a waste of address space or
> > memory (when we use the MAP_ANONYMOUS and MAP_PRIVATE flags to
> > reserve this range). Meanwhile, the driver needs to know which
> > part of the BAR is mmap-ed (this is possible since the range is
> > defined by the vendor's specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let the driver choose which way
> > it prefers, since each option has its own pros and cons.
> >
> > Please share your comments. Thanks!
> >
> >
> > Chenbo Xia (4):
> > bus/pci: introduce an internal representation of PCI device
>
> I think the main motivation of this first patch was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
>
> So, do we still need this patch?
I think it is still good to have, since it reduces unnecessary driver APIs.
Hiding this region information should also be friendlier to driver developers.
Thanks,
Chenbo
>
>
> > bus/pci: avoid depending on private value in kernel source
> > bus/pci: introduce helper for MMIO read and write
> > bus/pci: add VFIO sparse mmap support
> >
> > drivers/baseband/acc/rte_acc100_pmd.c | 6 +-
> > drivers/baseband/acc/rte_vrb_pmd.c | 6 +-
> > .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c | 6 +-
> > drivers/baseband/fpga_lte_fec/fpga_lte_fec.c | 6 +-
> > drivers/bus/pci/bsd/pci.c | 43 +-
> > drivers/bus/pci/bus_pci_driver.h | 24 +-
> > drivers/bus/pci/linux/pci.c | 91 +++-
> > drivers/bus/pci/linux/pci_init.h | 14 +-
> > drivers/bus/pci/linux/pci_uio.c | 34 +-
> > drivers/bus/pci/linux/pci_vfio.c | 445 ++++++++++++++----
> > drivers/bus/pci/pci_common.c | 57 ++-
> > drivers/bus/pci/pci_common_uio.c | 12 +-
> > drivers/bus/pci/private.h | 25 +-
> > drivers/bus/pci/rte_bus_pci.h | 48 ++
> > drivers/bus/pci/version.map | 3 +
> > drivers/common/cnxk/roc_dev.c | 4 +-
> > drivers/common/cnxk/roc_dpi.c | 2 +-
> > drivers/common/cnxk/roc_ml.c | 22 +-
> > drivers/common/qat/dev/qat_dev_gen1.c | 2 +-
> > drivers/common/qat/dev/qat_dev_gen4.c | 4 +-
> > drivers/common/sfc_efx/sfc_efx.c | 2 +-
> > drivers/compress/octeontx/otx_zip.c | 4 +-
> > drivers/crypto/ccp/ccp_dev.c | 4 +-
> > drivers/crypto/cnxk/cnxk_cryptodev_ops.c | 2 +-
> > drivers/crypto/nitrox/nitrox_device.c | 4 +-
> > drivers/crypto/octeontx/otx_cryptodev_ops.c | 6 +-
> > drivers/crypto/virtio/virtio_pci.c | 6 +-
> > drivers/dma/cnxk/cnxk_dmadev.c | 2 +-
> > drivers/dma/hisilicon/hisi_dmadev.c | 6 +-
> > drivers/dma/idxd/idxd_pci.c | 4 +-
> > drivers/dma/ioat/ioat_dmadev.c | 2 +-
> > drivers/event/dlb2/pf/dlb2_main.c | 16 +-
> > drivers/event/octeontx/ssovf_probe.c | 38 +-
> > drivers/event/octeontx/timvf_probe.c | 18 +-
> > drivers/event/skeleton/skeleton_eventdev.c | 2 +-
> > drivers/mempool/octeontx/octeontx_fpavf.c | 6 +-
> > drivers/net/ark/ark_ethdev.c | 4 +-
> > drivers/net/atlantic/atl_ethdev.c | 2 +-
> > drivers/net/avp/avp_ethdev.c | 20 +-
> > drivers/net/axgbe/axgbe_ethdev.c | 4 +-
> > drivers/net/bnx2x/bnx2x_ethdev.c | 6 +-
> > drivers/net/bnxt/bnxt_ethdev.c | 8 +-
> > drivers/net/cpfl/cpfl_ethdev.c | 4 +-
> > drivers/net/cxgbe/cxgbe_ethdev.c | 2 +-
> > drivers/net/cxgbe/cxgbe_main.c | 2 +-
> > drivers/net/cxgbe/cxgbevf_ethdev.c | 2 +-
> > drivers/net/cxgbe/cxgbevf_main.c | 2 +-
> > drivers/net/e1000/em_ethdev.c | 4 +-
> > drivers/net/e1000/igb_ethdev.c | 4 +-
> > drivers/net/ena/ena_ethdev.c | 4 +-
> > drivers/net/enetc/enetc_ethdev.c | 2 +-
> > drivers/net/enic/enic_main.c | 4 +-
> > drivers/net/fm10k/fm10k_ethdev.c | 2 +-
> > drivers/net/gve/gve_ethdev.c | 4 +-
> > drivers/net/hinic/base/hinic_pmd_hwif.c | 14 +-
> > drivers/net/hns3/hns3_ethdev.c | 2 +-
> > drivers/net/hns3/hns3_ethdev_vf.c | 2 +-
> > drivers/net/hns3/hns3_rxtx.c | 4 +-
> > drivers/net/i40e/i40e_ethdev.c | 2 +-
> > drivers/net/iavf/iavf_ethdev.c | 2 +-
> > drivers/net/ice/ice_dcf.c | 2 +-
> > drivers/net/ice/ice_ethdev.c | 2 +-
> > drivers/net/idpf/idpf_ethdev.c | 4 +-
> > drivers/net/igc/igc_ethdev.c | 2 +-
> > drivers/net/ionic/ionic_dev_pci.c | 2 +-
> > drivers/net/ixgbe/ixgbe_ethdev.c | 4 +-
> > drivers/net/liquidio/lio_ethdev.c | 4 +-
> > drivers/net/nfp/nfp_ethdev.c | 2 +-
> > drivers/net/nfp/nfp_ethdev_vf.c | 6 +-
> > drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 4 +-
> > drivers/net/ngbe/ngbe_ethdev.c | 2 +-
> > drivers/net/octeon_ep/otx_ep_ethdev.c | 2 +-
> > drivers/net/octeontx/base/octeontx_pkivf.c | 6 +-
> > drivers/net/octeontx/base/octeontx_pkovf.c | 12 +-
> > drivers/net/qede/qede_main.c | 6 +-
> > drivers/net/sfc/sfc.c | 2 +-
> > drivers/net/thunderx/nicvf_ethdev.c | 2 +-
> > drivers/net/txgbe/txgbe_ethdev.c | 2 +-
> > drivers/net/txgbe/txgbe_ethdev_vf.c | 2 +-
> > drivers/net/virtio/virtio_pci.c | 6 +-
> > drivers/net/vmxnet3/vmxnet3_ethdev.c | 4 +-
> > drivers/raw/cnxk_bphy/cnxk_bphy.c | 10 +-
> > drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c | 6 +-
> > drivers/raw/ifpga/afu_pmd_n3000.c | 4 +-
> > drivers/raw/ifpga/ifpga_rawdev.c | 6 +-
> > drivers/raw/ntb/ntb_hw_intel.c | 8 +-
> > drivers/vdpa/ifc/ifcvf_vdpa.c | 6 +-
> > drivers/vdpa/sfc/sfc_vdpa_hw.c | 2 +-
> > drivers/vdpa/sfc/sfc_vdpa_ops.c | 2 +-
> > lib/eal/include/rte_vfio.h | 1 -
> > 90 files changed, 853 insertions(+), 352 deletions(-)
>
>
> --
> David Marchand