From: Nithin Dabilpuram <ndabilpuram@marvell.com>
Date: Thu, 5 Nov 2020 14:34:21 +0530
Message-ID: <20201105090423.11954-2-ndabilpuram@marvell.com>
In-Reply-To: <20201105090423.11954-1-ndabilpuram@marvell.com>
References: <20201012081106.10610-1-ndabilpuram@marvell.com> <20201105090423.11954-1-ndabilpuram@marvell.com>
Subject: [dpdk-stable] [PATCH v2 1/3] vfio: revert changes for map contiguous areas in one go
List-Id: patches for DPDK stable branches

In order to save DMA entries limited by the kernel both for external
memory and hugepage memory, an attempt was made to map physically
contiguous memory in one go. This cannot be done, as VFIO IOMMU type1
does not support partially unmapping a previously mapped memory region,
while the heap can request mapping of multiple pages and later a
partial unmap. Hence, to go back to the old method of mapping/unmapping
at memseg granularity, this commit reverts
commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go").

Also add documentation on the module parameter that needs to be used to
increase the per-container DMA map limit for VFIO.
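
To make the limitation concrete, the following minimal sketch (an
illustration, not part of this patch) uses the public
rte_vfio_container_dma_map()/rte_vfio_container_dma_unmap() API on the
default container; the buffer address, IOVA and length are placeholders
assumed to be supplied by the caller:

#include <stdio.h>
#include <stdint.h>
#include <rte_vfio.h>

/* Illustration only: map a region in one go, then request a partial unmap. */
static void
demo_partial_unmap(uint64_t va, uint64_t iova, uint64_t len)
{
        int ret;

        /* a single DMA entry covering the whole region */
        ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
                        va, iova, len);
        if (ret < 0) {
                printf("map failed\n");
                return;
        }

        /*
         * VFIO IOMMU type1 does not split an existing mapping, so asking
         * to unmap only the first half does not remove that half.
         */
        ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
                        va, iova, len / 2);
        printf("partial unmap returned %d\n", ret);

        /* the region can only be unmapped with its exact original length */
        rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
                        va, iova, len);
}
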
Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go")
Cc: anatoly.burakov@intel.com
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++
 lib/librte_eal/linux/eal_vfio.c        | 59 +++++-----------------------------
 2 files changed, 18 insertions(+), 51 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst
index 080b449..bb43ab2 100644
--- a/doc/guides/linux_gsg/linux_drivers.rst
+++ b/doc/guides/linux_gsg/linux_drivers.rst
@@ -67,6 +67,16 @@ Note that in order to use VFIO, your kernel must support it.
 VFIO kernel modules have been included in the Linux kernel since version 3.6.0
 and are usually present by default, however please consult your distributions
 documentation to make sure that is the case.
+For DMA mapping of either external memory or hugepages, VFIO interface is used.
+VFIO does not support partial unmap of once mapped memory. Hence DPDK's memory is
+mapped in hugepage granularity or system page granularity. Number of DMA
+mappings is limited by kernel with user locked memory limit of a process(rlimit)
+for system/hugepage memory. Another per-container overall limit applicable both
+for external memory and system memory was added in kernel 5.1 defined by
+VFIO module parameter ``dma_entry_limit`` with a default value of 64K.
+When application is out of DMA entries, these limits need to be adjusted to
+increase the allowed limit.
+
 The ``vfio-pci`` module since Linux version 5.7 supports the creation of virtual
 functions. After the PF is bound to vfio-pci module, the user can create the VFs
 by sysfs interface, and these VFs are bound to vfio-pci module automatically.
diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
index 380f2f4..dbefcba 100644
--- a/lib/librte_eal/linux/eal_vfio.c
+++ b/lib/librte_eal/linux/eal_vfio.c
@@ -516,11 +516,9 @@ static void
 vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
                 void *arg __rte_unused)
 {
-        rte_iova_t iova_start, iova_expected;
         struct rte_memseg_list *msl;
         struct rte_memseg *ms;
         size_t cur_len = 0;
-        uint64_t va_start;
 
         msl = rte_mem_virt2memseg_list(addr);
 
@@ -549,63 +547,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 #endif
         /* memsegs are contiguous in memory */
         ms = rte_mem_virt2memseg(addr, msl);
-
-        /*
-         * This memory is not guaranteed to be contiguous, but it still could
-         * be, or it could have some small contiguous chunks. Since the number
-         * of VFIO mappings is limited, and VFIO appears to not concatenate
-         * adjacent mappings, we have to do this ourselves.
-         *
-         * So, find contiguous chunks, then map them.
-         */
-        va_start = ms->addr_64;
-        iova_start = iova_expected = ms->iova;
         while (cur_len < len) {
-                bool new_contig_area = ms->iova != iova_expected;
-                bool last_seg = (len - cur_len) == ms->len;
-                bool skip_last = false;
-
-                /* only do mappings when current contiguous area ends */
-                if (new_contig_area) {
-                        if (type == RTE_MEM_EVENT_ALLOC)
-                                vfio_dma_mem_map(default_vfio_cfg, va_start,
-                                                iova_start,
-                                                iova_expected - iova_start, 1);
-                        else
-                                vfio_dma_mem_map(default_vfio_cfg, va_start,
-                                                iova_start,
-                                                iova_expected - iova_start, 0);
-                        va_start = ms->addr_64;
-                        iova_start = ms->iova;
-                }
                 /* some memory segments may have invalid IOVA */
                 if (ms->iova == RTE_BAD_IOVA) {
                         RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
                                         ms->addr);
-                        skip_last = true;
+                        goto next;
                 }
-                iova_expected = ms->iova + ms->len;
+                if (type == RTE_MEM_EVENT_ALLOC)
+                        vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+                                        ms->iova, ms->len, 1);
+                else
+                        vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+                                        ms->iova, ms->len, 0);
+next:
                 cur_len += ms->len;
                 ++ms;
-
-                /*
-                 * don't count previous segment, and don't attempt to
-                 * dereference a potentially invalid pointer.
-                 */
-                if (skip_last && !last_seg) {
-                        iova_expected = iova_start = ms->iova;
-                        va_start = ms->addr_64;
-                } else if (!skip_last && last_seg) {
-                        /* this is the last segment and we're not skipping */
-                        if (type == RTE_MEM_EVENT_ALLOC)
-                                vfio_dma_mem_map(default_vfio_cfg, va_start,
-                                                iova_start,
-                                                iova_expected - iova_start, 1);
-                        else
-                                vfio_dma_mem_map(default_vfio_cfg, va_start,
-                                                iova_start,
-                                                iova_expected - iova_start, 0);
-                }
         }
 #ifdef RTE_ARCH_PPC_64
         cur_len = 0;
-- 
2.8.4
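
A note on sizing the limits described in the documentation hunk above:
the per-container limit is the dma_entry_limit parameter of the
vfio_iommu_type1 kernel module (64K entries by default since kernel 5.1),
and the per-process limit comes from the locked-memory rlimit. With the
revert, mapping happens per memseg, so the number of DMA entries consumed
by hugepage memory is roughly the number of memsegs with a valid IOVA. A
small sketch using the public rte_memseg_walk() API can count them; the
function names below are illustrative only, not part of the patch:

#include <rte_common.h>
#include <rte_memory.h>

/* rte_memseg_walk() callback: count memsegs that VFIO would DMA-map */
static int
count_mapped_memseg(const struct rte_memseg_list *msl __rte_unused,
                const struct rte_memseg *ms, void *arg)
{
        unsigned int *cnt = arg;

        /* segments with a bad IOVA are skipped by the VFIO callback too */
        if (ms->iova != RTE_BAD_IOVA)
                (*cnt)++;
        return 0;
}

/* rough number of DMA entries consumed when mapping at memseg granularity */
static unsigned int
estimate_dma_entries(void)
{
        unsigned int cnt = 0;

        rte_memseg_walk(count_mapped_memseg, &cnt);
        return cnt;
}

If this estimate, together with any external memory mappings, approaches
the configured dma_entry_limit, the module parameter has to be raised
before the application runs out of DMA entries.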