From: David Marchand
Date: Tue, 18 Apr 2023 09:46:39 +0200
Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
To: Chenbo Xia
Cc: dev@dpdk.org, skori@marvell.com
In-Reply-To: <20230418053012.10667-1-chenbo.xia@intel.com>
List-Id: DPDK patches and discussions

Hello Chenbo,

On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia wrote:
>
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.

Sorry, I did not take the time to look into the details.
Could you summarize what would be the benefit of this series?

>
> In this patchset:
>
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
>
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
>
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
>
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature.
> This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
>
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
>
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> prefers since either option has its own Pros & Cons.
>
> Please share your comments, Thanks!
>
>
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device

I think the main motivation for this first patch was to avoid ABI issues.
Since v22.11, the rte_pci_device object is opaque to applications.

So, do we still need this patch?

>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
>
>  drivers/baseband/acc/rte_acc100_pmd.c        |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c           |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c        |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c |   6 +-
>  drivers/bus/pci/bsd/pci.c                    |  43 +-
>  drivers/bus/pci/bus_pci_driver.h             |  24 +-
>  drivers/bus/pci/linux/pci.c                  |  91 +++-
>  drivers/bus/pci/linux/pci_init.h             |  14 +-
>  drivers/bus/pci/linux/pci_uio.c              |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c             | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                 |  57 ++-
>  drivers/bus/pci/pci_common_uio.c             |  12 +-
>  drivers/bus/pci/private.h                    |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                |  48 ++
>  drivers/bus/pci/version.map                  |   3 +
>  drivers/common/cnxk/roc_dev.c                |   4 +-
>  drivers/common/cnxk/roc_dpi.c                |   2 +-
>  drivers/common/cnxk/roc_ml.c                 |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c        |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c        |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c             |   2 +-
>  drivers/compress/octeontx/otx_zip.c          |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                 |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c     |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c        |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c  |   6 +-
>  drivers/crypto/virtio/virtio_pci.c           |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c               |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c          |   6 +-
>  drivers/dma/idxd/idxd_pci.c                  |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c               |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c            |  16 +-
>  drivers/event/octeontx/ssovf_probe.c         |  38 +-
>  drivers/event/octeontx/timvf_probe.c         |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c   |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c    |   6 +-
>  drivers/net/ark/ark_ethdev.c                 |   4 +-
>  drivers/net/atlantic/atl_ethdev.c            |   2 +-
>  drivers/net/avp/avp_ethdev.c                 |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c             |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c             |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c               |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c               |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c             |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c               |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c           |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c             |   2 +-
>  drivers/net/e1000/em_ethdev.c                |   4 +-
>  drivers/net/e1000/igb_ethdev.c               |   4 +-
>  drivers/net/ena/ena_ethdev.c                 |   4 +-
>  drivers/net/enetc/enetc_ethdev.c             |   2 +-
>  drivers/net/enic/enic_main.c                 |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c             |   2 +-
>  drivers/net/gve/gve_ethdev.c                 |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c      |  14 +-
>  drivers/net/hns3/hns3_ethdev.c               |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c            |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                 |   4 +-
>  drivers/net/i40e/i40e_ethdev.c               |   2 +-
>  drivers/net/iavf/iavf_ethdev.c               |   2 +-
>  drivers/net/ice/ice_dcf.c                    |   2 +-
>  drivers/net/ice/ice_ethdev.c                 |   2 +-
>  drivers/net/idpf/idpf_ethdev.c               |   4 +-
>  drivers/net/igc/igc_ethdev.c                 |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c            |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c             |   4 +-
>  drivers/net/liquidio/lio_ethdev.c            |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                 |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c              |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c   |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c               |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c        |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c   |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c   |  12 +-
>  drivers/net/qede/qede_main.c                 |   6 +-
>  drivers/net/sfc/sfc.c                        |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c          |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c             |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c          |   2 +-
>  drivers/net/virtio/virtio_pci.c              |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c         |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c            |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c        |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c            |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c             |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c               |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c               |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c              |   2 +-
>  lib/eal/include/rte_vfio.h                   |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)

-- 
David Marchand
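
For readers following the thread, the register lookup that Option 1 in the quoted cover letter implies can be sketched roughly as below. This is a minimal illustration and not code from the series: struct sparse_area, struct sparse_bar, sparse_bar_ptr() and sparse_bar_read() are hypothetical names, and the only system call used is POSIX pread(). It assumes each mmap-able area was mapped at an unrelated address, so an access first searches the area list and falls back to pread() on the device fd when no mapping covers the range.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical types for illustration; they do not come from the series. */
struct sparse_area {
	uint64_t offset; /* start of the mmap-able area inside the BAR */
	uint64_t size;   /* length of the area in bytes */
	void *addr;      /* where this area was mapped, anywhere in the process */
};

struct sparse_bar {
	int fd;              /* device fd used for the pread() fallback */
	off_t region_offset; /* file offset of this BAR region on that fd */
	uint64_t bar_size;   /* total BAR size in bytes */
	unsigned int nr_areas;
	const struct sparse_area *areas;
};

/* Return a pointer for [off, off + len) if one mapped area covers it, else NULL. */
static void *
sparse_bar_ptr(const struct sparse_bar *bar, uint64_t off, uint64_t len)
{
	assert(bar != NULL);
	for (unsigned int i = 0; i < bar->nr_areas; i++) {
		const struct sparse_area *a = &bar->areas[i];

		/* Written without additions so the range check cannot overflow. */
		if (off >= a->offset && len <= a->size &&
		    off - a->offset <= a->size - len)
			return (uint8_t *)a->addr + (off - a->offset);
	}
	return NULL;
}

/*
 * Read len bytes at BAR offset off. Copies from the mapping when one covers
 * the range, otherwise falls back to pread(). Returns 0 on success, -1 on error.
 * A real driver would use volatile, width-correct loads for MMIO; memcpy()
 * only keeps the sketch short.
 */
static int
sparse_bar_read(const struct sparse_bar *bar, void *buf, uint64_t off, size_t len)
{
	if (len > bar->bar_size || off > bar->bar_size - len)
		return -1;

	const void *p = sparse_bar_ptr(bar, off, len);
	if (p != NULL) {
		memcpy(buf, p, len);
		return 0;
	}

	ssize_t n = pread(bar->fd, buf, len, bar->region_offset + (off_t)off);
	return n == (ssize_t)len ? 0 : -1;
}
```

With the layout from the Option 2 example (BAR size 0x4000, areas 0-0x1000 and 0x3000-0x4000), an access at 0x3010 resolves to the second mapping, while an access at 0x1800 takes the pread() path. Under Option 2 the lookup loop would collapse to 'base_addr + off', but the caller would still need to know which ranges are actually mapped.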