From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 69B6D2C8 for ; Thu, 6 Jul 2017 09:58:46 +0200 (CEST) Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 536CB13A60; Thu, 6 Jul 2017 07:58:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 536CB13A60 Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=maxime.coquelin@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 536CB13A60 Received: from [10.36.112.14] (unknown [10.36.112.14]) by smtp.corp.redhat.com (Postfix) with ESMTPS id E2FB077EBC; Thu, 6 Jul 2017 07:58:42 +0000 (UTC) To: Jerin Jacob Cc: Santosh Shukla , thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org, hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com References: <20170608110513.22548-1-santosh.shukla@caviumnetworks.com> <20170608110513.22548-8-santosh.shukla@caviumnetworks.com> <730e333b-a9ab-df8b-cf7a-1e0186c6152d@redhat.com> <20170705154314.GA4635@jerin> From: Maxime Coquelin Message-ID: <2fe366fb-15fa-f754-458e-3f4e8be18699@redhat.com> Date: Thu, 6 Jul 2017 09:58:41 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0 MIME-Version: 1.0 In-Reply-To: <20170705154314.GA4635@jerin> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 06 Jul 2017 07:58:45 +0000 (UTC) Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Jul 2017 07:58:46 -0000 On 07/05/2017 05:43 PM, Jerin Jacob wrote: > -----Original Message----- >> Date: Wed, 5 Jul 2017 11:14:01 +0200 >> From: Maxime Coquelin >> To: Santosh Shukla , >> thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org >> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com, >> shreyansh.jain@nxp.com, gaetan.rivet@6wind.com >> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode >> before mapping >> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 >> Thunderbird/52.1.0 >> >> >> >> On 06/08/2017 01:05 PM, Santosh Shukla wrote: >>> Check iova mode and accordingly map iova to pa or va. >>> >>> Signed-off-by: Santosh Shukla >>> Signed-off-by: Jerin Jacob >>> --- >>> lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++-- >>> 1 file changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c >>> index 04914406f..348b7a7f4 100644 >>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c >>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c >>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd) >>> dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map); >>> dma_map.vaddr = ms[i].addr_64; >>> dma_map.size = ms[i].len; >>> - dma_map.iova = ms[i].phys_addr; >>> + if (rte_eal_iova_mode() == RTE_IOVA_VA) >>> + dma_map.iova = dma_map.vaddr; >>> + else >>> + dma_map.iova = ms[i].phys_addr; >>> dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE; >> >> IIUC, it is changing default behavior for VFIO devices. >> >> I see a possible problem, but I'm not sure the case is valid. >> >> Imagine you have two devices in the iommu group, and the two devices are >> used in separate processes. Each process could try two different >> physical addresses at the same virtual address, and so the second map >> would fail. > > IMO, Doesn't look like a problem. Here is the data flow > > 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only > on primary process > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359 > > 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure > that, the Secondary process has the _same_ virtual address as primary or > exit from on attach. > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452 > > 3) Since secondary process adds the mapped the virtual address in step (2). > in the page table in OS. On SMMU entry miss(When device > request from I/O transaction), OS will load the mapping and update the SMMU > "context" with page tables from MMU. Ok thanks for the detailed info, but what about the case where the same iommu group is used by two primary processes? I don't know how frequent it is, but if ACS is not supported by either the endpoint or the the root port, then you would have to share the same IOMMU group for all the ports of your card. Right? > Let me add the background for why this feature is required in DPDK to > enable NPU style co-processors. > > The traditional NICs the Rx path code look like this: > 1) On control path, Fill the mempool with buffers > 2) on rx_burst(), alloc the mbuf from mempool > 3) SW has the mbuf in hand(which is a virtual address) and program the > HW with mbuf->buf_physaddr) > 4) Return the last pushed mbuf(will be updated by HW by now) > > > On NPU style co-processors, situation is different as the buffer recycling > has been done in HW unlike SW model. Here is the data flow: > 1) On control path, Fill the HW mempool with buffers(Obviously the IOVA > address, which is PA in existing model) > 2) on rx_burst, HW gives you IOVA address(as address as step 1) > 3) As application expects VA to operate on it, rx_burst() needs to > convert to VA from PAA. Which is very costly. > Instead with this IOVA as VA scheme, We can avoid the cost of converting > with help of IOMMU/SMMU. > > This patch set auto detects the mode based available of type devices in > bus and provides an option to override mode based on eal argument, so we > don't foresee any issue with this approach and welcome any alternative > approaches. I don't question the need of the feature for these kind of co-processors, using VA as IOVA in your case seems very valid. What concerns me is that we change the default behavior for all other devices. Having an option to override is fine to me, but the default mode should remain the same IMHO. Wouldn't it be possible to default to VA as IOVA only when an HW mempool is in use? > Similar problem exists in another part of the code in DPDK, > http://dpdk.org/browse/dpdk/tree/drivers/bus/fslmc/fslmc_vfio.c#n231 > Its a conditional compilation based approach with duplicating the vfio > code and we are trying to fix the problem in a generic way so that > everyone can get benefited out of it. > > Comments are welcome. Thanks, Maxime > /Jerin > >> >> By using physical addresses, you are safe against this problem. >> >> Any thoughts? >> >> Cheers, >> Maxime