From mboxrd@z Thu Jan 1 00:00:00 1970
To: Nithin Dabilpuram, anatoly.burakov@intel.com, david.marchand@redhat.com
Cc: jerinj@marvell.com, dev@dpdk.org, stable@dpdk.org
References: <20201012081106.10610-1-ndabilpuram@marvell.com> <20201202054647.3449-1-ndabilpuram@marvell.com> <20201202054647.3449-2-ndabilpuram@marvell.com>
From: David Christensen
Message-ID: <7a79bbe4-2402-9f6e-4101-360f14b4e599@linux.vnet.ibm.com>
Date: Wed, 2 Dec 2020 10:36:34 -0800
In-Reply-To: <20201202054647.3449-2-ndabilpuram@marvell.com>
Subject: Re: [dpdk-stable] [PATCH v4 1/4] vfio: revert changes for map contiguous areas in one go
List-Id: patches for DPDK stable branches

On 12/1/20 9:46 PM, Nithin Dabilpuram wrote:
> In order to save DMA entries, which are limited by the kernel, for both
> external memory and hugepage memory, an attempt was made to map physically
> contiguous memory in one go. This cannot be done because VFIO IOMMU type1
> does not support partially unmapping a previously mapped memory region,
> while the heap can request multi-page mapping and partial unmapping.
> Hence, to go back to the old method of mapping/unmapping at memseg
> granularity, this commit reverts
> commit d1c7c0cdf7ba ("vfio: map contiguous areas in one go").
> 
> Also add documentation on which module parameter needs to be used
> to increase the per-container DMA map limit for VFIO.
> 
> Fixes: d1c7c0cdf7ba ("vfio: map contiguous areas in one go")
> Cc: anatoly.burakov@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nithin Dabilpuram
> Acked-by: Anatoly Burakov
> ---
>  doc/guides/linux_gsg/linux_drivers.rst | 10 ++++++
>  lib/librte_eal/linux/eal_vfio.c        | 59 +++++-----------------------------
>  2 files changed, 18 insertions(+), 51 deletions(-)
> 
> diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst
> index 90635a4..9a662a7 100644
> --- a/doc/guides/linux_gsg/linux_drivers.rst
> +++ b/doc/guides/linux_gsg/linux_drivers.rst
> @@ -25,6 +25,16 @@ To make use of VFIO, the ``vfio-pci`` module must be loaded:
>  VFIO kernel is usually present by default in all distributions,
>  however please consult your distributions documentation to make sure that is the case.
>  
> +For DMA mapping of either external memory or hugepages, the VFIO interface is used.
> +VFIO does not support partial unmap of previously mapped memory. Hence DPDK's
> +memory is mapped at hugepage or system-page granularity. The number of DMA
> +mappings for system/hugepage memory is limited by the kernel through the user
> +locked-memory limit of a process (rlimit). Another per-container overall limit,
> +applicable to both external memory and system memory, was added in kernel 5.1
> +and is defined by the VFIO module parameter ``dma_entry_limit``, with a default
> +value of 64K. When an application runs out of DMA entries, these limits need to
> +be raised to allow more mappings.
> +
>  Since Linux version 5.7,
>  the ``vfio-pci`` module supports the creation of virtual functions.
>  After the PF is bound to ``vfio-pci`` module,
> diff --git a/lib/librte_eal/linux/eal_vfio.c b/lib/librte_eal/linux/eal_vfio.c
> index 0500824..64b134d 100644
> --- a/lib/librte_eal/linux/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal_vfio.c
> @@ -517,11 +517,9 @@ static void
>  vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  		void *arg __rte_unused)
>  {
> -	rte_iova_t iova_start, iova_expected;
>  	struct rte_memseg_list *msl;
>  	struct rte_memseg *ms;
>  	size_t cur_len = 0;
> -	uint64_t va_start;
>  
>  	msl = rte_mem_virt2memseg_list(addr);
>  
> @@ -539,63 +537,22 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  
>  	/* memsegs are contiguous in memory */
>  	ms = rte_mem_virt2memseg(addr, msl);
> -
> -	/*
> -	 * This memory is not guaranteed to be contiguous, but it still could
> -	 * be, or it could have some small contiguous chunks. Since the number
> -	 * of VFIO mappings is limited, and VFIO appears to not concatenate
> -	 * adjacent mappings, we have to do this ourselves.
> -	 *
> -	 * So, find contiguous chunks, then map them.
> -	 */
> -	va_start = ms->addr_64;
> -	iova_start = iova_expected = ms->iova;
>  	while (cur_len < len) {
> -		bool new_contig_area = ms->iova != iova_expected;
> -		bool last_seg = (len - cur_len) == ms->len;
> -		bool skip_last = false;
> -
> -		/* only do mappings when current contiguous area ends */
> -		if (new_contig_area) {
> -			if (type == RTE_MEM_EVENT_ALLOC)
> -				vfio_dma_mem_map(default_vfio_cfg, va_start,
> -						iova_start,
> -						iova_expected - iova_start, 1);
> -			else
> -				vfio_dma_mem_map(default_vfio_cfg, va_start,
> -						iova_start,
> -						iova_expected - iova_start, 0);
> -			va_start = ms->addr_64;
> -			iova_start = ms->iova;
> -		}
>  		/* some memory segments may have invalid IOVA */
>  		if (ms->iova == RTE_BAD_IOVA) {
>  			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
>  					ms->addr);
> -			skip_last = true;
> +			goto next;
>  		}
> -		iova_expected = ms->iova + ms->len;
> +		if (type == RTE_MEM_EVENT_ALLOC)
> +			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> +					ms->iova, ms->len, 1);
> +		else
> +			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> +					ms->iova, ms->len, 0);
> +next:
>  		cur_len += ms->len;
>  		++ms;
> -
> -		/*
> -		 * don't count previous segment, and don't attempt to
> -		 * dereference a potentially invalid pointer.
> -		 */
> -		if (skip_last && !last_seg) {
> -			iova_expected = iova_start = ms->iova;
> -			va_start = ms->addr_64;
> -		} else if (!skip_last && last_seg) {
> -			/* this is the last segment and we're not skipping */
> -			if (type == RTE_MEM_EVENT_ALLOC)
> -				vfio_dma_mem_map(default_vfio_cfg, va_start,
> -						iova_start,
> -						iova_expected - iova_start, 1);
> -			else
> -				vfio_dma_mem_map(default_vfio_cfg, va_start,
> -						iova_start,
> -						iova_expected - iova_start, 0);
> -		}
>  	}
>  }
> 

Acked-by: David Christensen
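
One small note for readers of the new documentation paragraph above: applications
that hit the per-container limit may want to check it at run time. Below is a
minimal sketch, assuming the parameter is exposed at the usual module-parameter
sysfs path derived from the module and parameter names quoted in the patch; the
path and error handling are my assumptions, not part of the patch.

/*
 * Illustrative sketch only: read the vfio_iommu_type1 per-container DMA
 * entry limit (kernel 5.1+).  The sysfs path is assumed from the module
 * and parameter names mentioned above; verify it on your kernel.
 */
#include <stdio.h>

static long
read_vfio_dma_entry_limit(void)
{
	const char *path =
		"/sys/module/vfio_iommu_type1/parameters/dma_entry_limit";
	FILE *f = fopen(path, "r");
	long limit = -1;

	if (f == NULL)
		return -1;	/* module not loaded or parameter absent */
	if (fscanf(f, "%ld", &limit) != 1)
		limit = -1;
	fclose(f);
	return limit;
}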
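
As for the code change itself, the effect of the revert is that every memseg
becomes its own VFIO DMA entry, so a later free of any one segment can be
unmapped without requiring a partial unmap.  For completeness, here is a rough
sketch of the same idea from an application's point of view, using the public
rte_vfio_container_dma_map() API rather than the internal vfio_dma_mem_map()
above; the segment array, its length, and the error handling are hypothetical
placeholders, not the EAL code.

/*
 * Illustrative sketch only: map each segment as a separate VFIO DMA entry
 * so it can later be unmapped at the same granularity.
 */
#include <stdint.h>
#include <stdio.h>

#include <rte_vfio.h>

struct seg {
	uint64_t va;
	uint64_t iova;
	uint64_t len;
};

static int
map_segments_individually(const struct seg *segs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		/* one VFIO_IOMMU_MAP_DMA entry per segment, never merged */
		if (rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
				segs[i].va, segs[i].iova, segs[i].len) < 0) {
			printf("DMA map of segment %u failed\n", i);
			return -1;
		}
	}
	return 0;
}

Unmapping a freed segment is then just the matching
rte_vfio_container_dma_unmap() call with the same vaddr/iova/len.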