From: "Burakov, Anatoly"
To: Nithin Dabilpuram
Cc: Jerin Jacob, dev@dpdk.org, stable@dpdk.org
Date: Sat, 17 Oct 2020 17:14:55 +0100
Subject: Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1

On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote:
> On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote:
>> On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote:
>>> On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly wrote:
>>>>
>>>> On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote:
>>>>> On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote:
>>>>>> On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote:
>>>>>>> Partial unmapping is not supported for VFIO IOMMU type1
>>>>>>> by kernel. Though kernel gives return as zero, the unmapped size
>>>>>>> returned will not be same as expected. So check for
>>>>>>> returned unmap size and return error.
>>>>>>>
>>>>>>> For case of DMA map/unmap triggered by heap allocations,
>>>>>>> maintain granularity of memseg page size so that heap
>>>>>>> expansion and contraction does not have this issue.
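
(For context, the check being added boils down to something like the
standalone sketch below; the helper name is made up and error handling is
simplified, but the key point is that type1 can return 0 from the ioctl
while reporting, via the written-back size field, that it unmapped less
than was asked for.)

#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Unmap [iova, iova + len) from a VFIO type1 container and verify that the
 * kernel actually released the whole range. The kernel writes the number of
 * bytes it unmapped back into dma_unmap.size, so a zero return value alone
 * does not mean the requested range is gone (type1 will not split an
 * existing mapping).
 */
static int
type1_dma_unmap_checked(int container_fd, uint64_t iova, uint64_t len)
{
	struct vfio_iommu_type1_dma_unmap dma_unmap;

	memset(&dma_unmap, 0, sizeof(dma_unmap));
	dma_unmap.argsz = sizeof(dma_unmap);
	dma_unmap.iova = iova;
	dma_unmap.size = len;

	if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &dma_unmap) != 0) {
		fprintf(stderr, "cannot clear DMA remapping: %s\n",
				strerror(errno));
		return -1;
	}
	if (dma_unmap.size != len) {
		/* partial unmap: less was cleared than requested */
		fprintf(stderr, "unmapped %" PRIu64 " of %" PRIu64 " bytes\n",
				(uint64_t)dma_unmap.size, len);
		return -1;
	}
	return 0;
}
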
>>>>>>
>>>>>> This is quite unfortunate, because there was a different bug that had to do
>>>>>> with kernel having a very limited number of mappings available [1], as a
>>>>>> result of which the page concatenation code was added.
>>>>>>
>>>>>> It should therefore be documented that the dma_entry_limit parameter should
>>>>>> be adjusted should the user run out of the DMA entries.
>>>>>>
>>>>>> [1] https://lore.kernel.org/lkml/155414977872.12780.13728555131525362206.stgit@gimli.home/T/
>>>>
>>>>
>>>>
>>>>>>>  		RTE_LOG(ERR, EAL, "  cannot clear DMA remapping, error %i (%s)\n",
>>>>>>>  				errno, strerror(errno));
>>>>>>>  		return -1;
>>>>>>> +	} else if (dma_unmap.size != len) {
>>>>>>> +		RTE_LOG(ERR, EAL, "  unexpected size %"PRIu64" of DMA "
>>>>>>> +			"remapping cleared instead of %"PRIu64"\n",
>>>>>>> +			(uint64_t)dma_unmap.size, len);
>>>>>>> +		rte_errno = EIO;
>>>>>>> +		return -1;
>>>>>>>  	}
>>>>>>>  }
>>>>>>> @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
>>>>>>>  		/* we're partially unmapping a previously mapped region, so we
>>>>>>>  		 * need to split entry into two.
>>>>>>>  		 */
>>>>>>> +		if (!vfio_cfg->vfio_iommu_type->partial_unmap) {
>>>>>>> +			RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n");
>>>>>>> +			rte_errno = ENOTSUP;
>>>>>>> +			ret = -1;
>>>>>>> +			goto out;
>>>>>>> +		}
>>>>>>
>>>>>> How would we ever arrive here if we never do more than 1 page worth of
>>>>>> memory anyway? I don't think this is needed.
>>>>>
>>>>> container_dma_unmap() is called by user via rte_vfio_container_dma_unmap()
>>>>> and when he maps we don't split it as we don't about his memory.
>>>>> So if he maps multiple pages and tries to unmap partially, then we should fail.
>>>>
>>>> Should we map it in page granularity then, instead of adding this
>>>> discrepancy between EAL and user mapping? I.e. instead of adding a
>>>> workaround, how about we just do the same thing for user mem mappings?
>>>>
>>> In heap mapping's we map and unmap it at huge page granularity as we will always
>>> maintain that.
>>>
>>> But here I think we don't know if user's allocation is huge page or
>>> collection of system
>>> pages. Only thing we can do here is map it at system page granularity which
>>> could waste entries if he say really is working with hugepages. Isn't ?
>>>
>> Yeah we do. The API mandates the pages granularity, and it will check
>> against page size and number of IOVA entries, so yes, we do enforce the fact
>> that the IOVA addresses supplied by the user have to be page addresses.
>
> If I see rte_vfio_container_dma_map(), there is no mention of Huge page size
> user is providing or we computing. He can call rte_vfio_container_dma_map()
> with 1GB huge page or 4K system page.
>
> Am I missing something ?

Are you suggesting that a DMA mapping for hugepage-backed memory will be
made at system page size granularity? E.g. will a 1 GB page-backed segment
be mapped for DMA as a contiguous 4K-based block?

>>
>> --
>> Thanks,
>> Anatoly

--
Thanks,
Anatoly
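
P.S. To make the granularity question above concrete, the user-facing call
pattern under discussion is roughly the following; the container fd,
addresses, sizes and helper name are purely illustrative.

#include <stdint.h>
#include <stdio.h>
#include <rte_errno.h>
#include <rte_vfio.h>

/*
 * A user maps a 1 GB hugepage-backed buffer for DMA in a single call and
 * later tries to release only a 2 MB slice of it. With VFIO type1 the kernel
 * cannot split the original mapping, so with the hunk above applied the
 * partial unmap is refused (rte_errno == ENOTSUP); only unmapping the exact
 * original range works. Mapping at system page granularity instead would
 * avoid that, at the cost of consuming far more entries against the type1
 * dma_entry_limit.
 */
static void
user_dma_map_example(int container_fd, uint64_t vaddr, uint64_t iova)
{
	const uint64_t len = 1ULL << 30;   /* whole 1 GB region */
	const uint64_t slice = 2ULL << 20; /* 2 MB piece of it */

	if (rte_vfio_container_dma_map(container_fd, vaddr, iova, len) != 0) {
		printf("map failed: %s\n", rte_strerror(rte_errno));
		return;
	}

	/* expected to fail on type1: would split the 1 GB mapping */
	if (rte_vfio_container_dma_unmap(container_fd, vaddr, iova, slice) != 0)
		printf("partial unmap refused: %s\n", rte_strerror(rte_errno));

	/* unmapping the full original range is still fine */
	rte_vfio_container_dma_unmap(container_fd, vaddr, iova, len);
}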