From: Maxime Coquelin
To: santosh, Jerin Jacob
Cc: thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
 hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
Date: Thu, 6 Jul 2017 16:39:36 +0200
Message-ID: <4c69eee7-459f-d6df-10ef-7aeeff497910@redhat.com>
In-Reply-To: <16fd150a-7663-ef6d-dfa2-c72140a82d4b@caviumnetworks.com>
References: <20170608110513.22548-1-santosh.shukla@caviumnetworks.com>
 <20170608110513.22548-8-santosh.shukla@caviumnetworks.com>
 <730e333b-a9ab-df8b-cf7a-1e0186c6152d@redhat.com>
 <20170705154314.GA4635@jerin>
 <2fe366fb-15fa-f754-458e-3f4e8be18699@redhat.com>
 <20170706094939.GA1709@jerin>
 <89425d75-3f79-d3e8-f0b1-330292866bbb@redhat.com>
 <1e7b7b37-ab0a-e568-c614-7c7ec606fd22@redhat.com>
 <16fd150a-7663-ef6d-dfa2-c72140a82d4b@caviumnetworks.com>
Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode before mapping
List-Id: DPDK patches and discussions

On 07/06/2017 04:13 PM, santosh wrote:
> On Thursday 06 July 2017 06:41 PM, Maxime Coquelin wrote:
>
>>
>> On 07/06/2017 03:08 PM, Maxime Coquelin wrote:
>>>
>>> On 07/06/2017 01:19 PM, santosh wrote:
>>>> On Thursday 06 July 2017 04:29 PM, Maxime Coquelin wrote:
>>>>
>>>>>
>>>>> On 07/06/2017 11:49 AM, Jerin Jacob wrote:
>>>>>> -----Original Message-----
>>>>>>> Date: Thu, 6 Jul 2017 09:58:41 +0200
>>>>>>> From: Maxime Coquelin
>>>>>>> To: Jerin Jacob
>>>>>>> CC: Santosh Shukla,
>>>>>>>  thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org,
>>>>>>>  hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>  before mapping
>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>  Thunderbird/52.1.0
>>>>>>>
>>>>>>> On 07/05/2017 05:43 PM, Jerin Jacob wrote:
>>>>>>>> -----Original Message-----
>>>>>>>>> Date: Wed, 5 Jul 2017 11:14:01 +0200
>>>>>>>>> From: Maxime Coquelin
>>>>>>>>> To: Santosh Shukla,
>>>>>>>>>  thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
>>>>>>>>> CC: jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
>>>>>>>>>  shreyansh.jain@nxp.com, gaetan.rivet@6wind.com
>>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode
>>>>>>>>>  before mapping
>>>>>>>>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>>>>>>>>>  Thunderbird/52.1.0
>>>>>>>>>
>>>>>>>>> On 06/08/2017 01:05 PM, Santosh Shukla wrote:
>>>>>>>>>> Check iova mode and accordingly map iova to pa or va.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Santosh Shukla
>>>>>>>>>> Signed-off-by: Jerin Jacob
>>>>>>>>>> ---
>>>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++--
>>>>>>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> index 04914406f..348b7a7f4 100644
>>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>>>>>> @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd)
>>>>>>>>>>          dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>>>>>          dma_map.vaddr = ms[i].addr_64;
>>>>>>>>>>          dma_map.size = ms[i].len;
>>>>>>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>>>>>>> +        if (rte_eal_iova_mode() == RTE_IOVA_VA)
>>>>>>>>>> +                dma_map.iova = dma_map.vaddr;
>>>>>>>>>> +        else
>>>>>>>>>> +                dma_map.iova = ms[i].phys_addr;
>>>>>>>>>>          dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>>>>
>>>>>>>>> IIUC, it is changing default behavior for VFIO devices.
>>>>>>>>>
>>>>>>>>> I see a possible problem, but I'm not sure the case is valid.
>>>>>>>>>
>>>>>>>>> Imagine you have two devices in the iommu group, and the two devices are
>>>>>>>>> used in separate processes. Each process could try two different
>>>>>>>>> physical addresses at the same virtual address, and so the second map
>>>>>>>>> would fail.
>>>>>>>>
>>>>>>>> IMO, Doesn't look like a problem.
>>>>>>>> Here is the data flow
>>>>>>>>
>>>>>>>> 1) The vfio DMA map function (vfio_type1_dma_map()) will be called only
>>>>>>>> on the primary process
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359
>>>>>>>>
>>>>>>>> 2) On the secondary process, DPDK rte_eal_huge_page_attach() will make sure
>>>>>>>> that the secondary process has the _same_ virtual address as the primary, or
>>>>>>>> exit on attach.
>>>>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452
>>>>>>>>
>>>>>>>> 3) The secondary process adds the virtual address mapped in step (2)
>>>>>>>> to the page table in the OS. On an SMMU entry miss (when the device
>>>>>>>> requests an I/O transaction), the OS will load the mapping and update the SMMU
>>>>>>>> "context" with page tables from the MMU.
>>>>>>>
>>>>>>> Ok thanks for the detailed info, but what about the case where the same
>>>>>>> iommu group is used by two primary processes?
>>>>>>
>>>>>> Does that case exist with DPDK? We always need to blacklist the same BDF in
>>>>>> the secondary process to make things work with the existing DPDK setup. Which
>>>>>> makes sense as well. Only the primary process configures the HW blocks.
>>>>>
>>>>> I meant the case when two BDFs are in the same IOMMU group (if ACS is not
>>>>> supported at some point in the hierarchy). And I meant two primary
>>>>> processes running, like for example two containers each running a DPDK
>>>>> application.
>>>>>
>>>>> Maybe this is not a valid use-case (it is not secure, as it would break
>>>>> isolation between the two containers), but it seems that it is something
>>>>> DPDK allows today, if I'm not mistaken.
>>>>>
>>>> I'm not sure how two primary processes could run, because the latter primary process
>>>> would try accessing /var/run/.rte_config and would fail at this [1] point.
>>>>
>>>> It's not a valid use-case for dpdk (imo).
>>>> [1] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal.c#n204
>>>
>>> Yes this is possible.
>>> I had never used it before, but Thomas told me it
>>> is supported by setting the --file-prefix option. I had a trial, and I
>>> confirm it works:
>>> session 1> ./install/bin/testpmd -l 0,2 --socket-mem=1024 -w 0000:05:00.0 --proc-type=primary --file-prefix=app1 -- --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1 --forward-mode=io
>>> session 2> ./install/bin/testpmd -l 0,3 --socket-mem=1024 -w 0000:05:00.1 --proc-type=primary --file-prefix=app2 -- --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=1 --forward-mode=io
>>>
>>> In the above example, two ports of the same card are used by two
>>> processes. Note that in this case, ACS is supported and both ports have
>>> their own iommu group.
>>
>> # ls -al /var/run/.app*
>> -rw-r-----. 1 root root 208420 Jul 6 09:08 /var/run/.app1_config
>> -rw-r--r--. 1 root root  49728 Jul 6 09:08 /var/run/.app1_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul 6 09:08 /var/run/.app1_mp_socket
>> -rw-r-----. 1 root root 208420 Jul 6 09:08 /var/run/.app2_config
>> -rw-r--r--. 1 root root  45584 Jul 6 09:08 /var/run/.app2_hugepage_info
>> srwxr-xr-x. 1 root root      0 Jul 6 09:08 /var/run/.app2_mp_socket
>>
> Yes, you're right, you can start two primary processes; I missed that point.
> The use-case you mentioned is ok, because the ports are under two different iommu
> groups, so the proposed scheme will work. It may not work for the case when ACS is not present,
> so it's bypass mode, which falls under the vfio-noiommu category.
>
> Having said that: per the discussion on [1], the proposed scheme where
> the bus makes the decision based on pci_id and/or pci_drv will be a foolproof
> solution, and that way other types of devices will not be impacted. Right?

Right!

Thanks,
Maxime

> [1] https://www.mail-archive.com/dev@dpdk.org/msg70283.html
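
For reference, the concern discussed in this thread can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not the EAL code: `enum iova_mode`, `struct memseg`, `select_iova()` and `iova_ranges_overlap()` are hypothetical stand-ins, and only the if/else inside `select_iova()` mirrors the quoted diff. It shows why two primary processes sharing one IOMMU group could collide in VA mode (same virtual address, hence same IOVA) while their physical-address IOVAs stay disjoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EAL types mentioned in the thread. */
enum iova_mode { IOVA_PA, IOVA_VA };

struct memseg {
	uint64_t phys_addr; /* physical address of the segment */
	uint64_t addr_64;   /* virtual address of the segment */
	uint64_t len;       /* segment length in bytes */
};

/* Pick the IOVA for one segment, mirroring the if/else in the quoted diff. */
uint64_t
select_iova(enum iova_mode mode, const struct memseg *seg)
{
	assert(seg != NULL);
	return (mode == IOVA_VA) ? seg->addr_64 : seg->phys_addr;
}

/* Return 1 if [a, a + alen) and [b, b + blen) share at least one byte. */
int
iova_ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}
```

If two processes that share an IOMMU group each hold a segment at the same virtual address but at different physical addresses, `iova_ranges_overlap()` reports a clash for the `IOVA_VA` selections and none for the `IOVA_PA` ones, which is the situation raised above for the case where ACS is not available.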