From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <stable-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id B24E6A0A02
	for <public@inbox.dpdk.org>; Fri, 15 Jan 2021 08:32:58 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id A013E140DC1;
	Fri, 15 Jan 2021 08:32:58 +0100 (CET)
Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com
 [67.231.148.174])
 by mails.dpdk.org (Postfix) with ESMTP id 2AA88140DA0;
 Fri, 15 Jan 2021 08:32:54 +0100 (CET)
Received: from pps.filterd (m0045849.ppops.net [127.0.0.1])
 by mx0a-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 10F7Tow5029205; Thu, 14 Jan 2021 23:32:54 -0800
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com;
 h=from : to : cc :
 subject : date : message-id : in-reply-to : references : mime-version :
 content-type; s=pfpt0220; bh=ZDljhYSTmb8u017DksyaXrKC7pXIqAR6/8O3nF4LRSY=;
 b=XvHOippD1AChKOgrjc0aRdi0rSIO4kU8e1qhS91UDbhj5HaxuMact9iLRyu1neXevhNq
 akkLorj3a39T+UAhgGFBtAOx7+3QVIJfpQTknWAXe2kF+/NWfZEWwlPw2rDN9X3QvDW7
 m45oH2ngeRouqgj7AepZPWB6ReRbOEaxZZ0T2EE0HJs0qVqtAHhIfMWTi+evbUEJYl10
 PHOxLem0CVjXy2uISKE3NafH2hmk/1DsXQS99HNLto4Os7w2IqqfvwXmZVNHJVo3i4p0
 Bn2i93l4VbgZ3FPnT07CgZ6L4OhchMnHY7/EyrlLgcleuXXxZO6vaiJ15SDcqijGwEHc dg== 
Received: from dc5-exch01.marvell.com ([199.233.59.181])
 by mx0a-0016f401.pphosted.com with ESMTP id 35yaqt25s7-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT);
 Thu, 14 Jan 2021 23:32:54 -0800
Received: from SC-EXCH04.marvell.com (10.93.176.84) by DC5-EXCH01.marvell.com
 (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.2;
 Thu, 14 Jan 2021 23:32:52 -0800
Received: from DC5-EXCH02.marvell.com (10.69.176.39) by SC-EXCH04.marvell.com
 (10.93.176.84) with Microsoft SMTP Server (TLS) id 15.0.1497.2;
 Thu, 14 Jan 2021 23:32:52 -0800
Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com
 (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.2 via Frontend
 Transport; Thu, 14 Jan 2021 23:32:52 -0800
Received: from hyd1588t430.marvell.com (unknown [10.29.52.204])
 by maili.marvell.com (Postfix) with ESMTP id 373E03F703F;
 Thu, 14 Jan 2021 23:32:49 -0800 (PST)
From: Nithin Dabilpuram <ndabilpuram@marvell.com>
To: <anatoly.burakov@intel.com>, David Christensen <drc@linux.vnet.ibm.com>,
 <david.marchand@redhat.com>
CC: <jerinj@marvell.com>, <dev@dpdk.org>, Nithin Dabilpuram
 <ndabilpuram@marvell.com>, <stable@dpdk.org>
Date: Fri, 15 Jan 2021 13:02:41 +0530
Message-ID: <20210115073243.7025-2-ndabilpuram@marvell.com>
X-Mailer: git-send-email 2.8.4
In-Reply-To: <20210115073243.7025-1-ndabilpuram@marvell.com>
References: <20201012081106.10610-1-ndabilpuram@marvell.com>
 <20210115073243.7025-1-ndabilpuram@marvell.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.343, 18.0.737
 definitions=2021-01-15_03:2021-01-15,
 2021-01-15 signatures=0
Subject: [dpdk-stable] [PATCH v8 1/3] vfio: revert changes for map
 contiguous areas in one go
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
Errors-To: stable-bounces@dpdk.org
Sender: "stable" <stable-bounces@dpdk.org>

In order to save DMA entries, which the kernel limits for both external
memory and hugepage memory, an attempt was made to map physically
contiguous memory in one go. This cannot be done because VFIO IOMMU type1
does not support partially unmapping a previously mapped memory
region, while the heap can request multi-page mapping and
partial unmapping.
Hence, to go back to the old method of mapping/unmapping at
memseg granularity, this commit reverts
commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go").

Also add documentation on which module parameter needs to be used
to increase the per-container DMA map limit for VFIO.

Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go")
Cc: anatoly.burakov@intel.com
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
---
 doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++
 lib/librte_eal/linux/eal_vfio.c        | 59 +++++-----------------------------
 2 files changed, 18 insertions(+), 51 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst
index 90635a4..9a662a7 100644
--- a/doc/guides/linux_gsg/linux_drivers.rst
+++ b/doc/guides/linux_gsg/linux_drivers.rst
@@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded:
 VFIO kernel is usually present by default in all distributions,
 however please consult your distributions documentation to make sure that is the case.
 
+The VFIO interface is used for DMA mapping of both external memory and hugepages.
+VFIO does not support partially unmapping a previously mapped region, so DPDK
+maps its memory at hugepage or system page granularity. The number of DMA
+mappings is limited by the kernel: for system and hugepage memory, by the
+process's locked-memory limit (rlimit). A separate per-container limit, which
+applies to both external memory and system memory, was added in kernel 5.1 and
+is set by the VFIO module parameter ``dma_entry_limit`` (default 64K).
+When an application runs out of DMA entries, these limits need to be
+raised.
+
 Since Linux version 5.7,
 the ``vfio-pci`` module supports the creation of virtual functions.
 After the PF is bound to ``vfio-pci`` module,
diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
index 0500824..64b134d 100644
--- a/lib/librte_eal/linux/eal_vfio.c
+++ b/lib/librte_eal/linux/eal_vfio.c
@@ -517,11 +517,9 @@ static void
 vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 		void *arg __rte_unused)
 {
-	rte_iova_t iova_start, iova_expected;
 	struct rte_memseg_list *msl;
 	struct rte_memseg *ms;
 	size_t cur_len = 0;
-	uint64_t va_start;
 
 	msl = rte_mem_virt2memseg_list(addr);
 
@@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
-
-	/*
-	 * This memory is not guaranteed to be contiguous, but it still could
-	 * be, or it could have some small contiguous chunks. Since the number
-	 * of VFIO mappings is limited, and VFIO appears to not concatenate
-	 * adjacent mappings, we have to do this ourselves.
-	 *
-	 * So, find contiguous chunks, then map them.
-	 */
-	va_start = ms->addr_64;
-	iova_start = iova_expected = ms->iova;
 	while (cur_len < len) {
-		bool new_contig_area = ms->iova != iova_expected;
-		bool last_seg = (len - cur_len) == ms->len;
-		bool skip_last = false;
-
-		/* only do mappings when current contiguous area ends */
-		if (new_contig_area) {
-			if (type == RTE_MEM_EVENT_ALLOC)
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 1);
-			else
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 0);
-			va_start = ms->addr_64;
-			iova_start = ms->iova;
-		}
 		/* some memory segments may have invalid IOVA */
 		if (ms->iova == RTE_BAD_IOVA) {
 			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
 					ms->addr);
-			skip_last = true;
+			goto next;
 		}
-		iova_expected = ms->iova + ms->len;
+		if (type == RTE_MEM_EVENT_ALLOC)
+			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+					ms->iova, ms->len, 1);
+		else
+			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
+					ms->iova, ms->len, 0);
+next:
 		cur_len += ms->len;
 		++ms;
-
-		/*
-		 * don't count previous segment, and don't attempt to
-		 * dereference a potentially invalid pointer.
-		 */
-		if (skip_last && !last_seg) {
-			iova_expected = iova_start = ms->iova;
-			va_start = ms->addr_64;
-		} else if (!skip_last && last_seg) {
-			/* this is the last segment and we're not skipping */
-			if (type == RTE_MEM_EVENT_ALLOC)
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 1);
-			else
-				vfio_dma_mem_map(default_vfio_cfg, va_start,
-						iova_start,
-						iova_expected - iova_start, 0);
-		}
 	}
 }
 
-- 
2.8.4