From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1])
 by dpdk.org (Postfix) with ESMTP id 560889E3
 for ; Fri, 21 Apr 2017 10:52:30 +0200 (CEST)
Received: from pps.filterd (m0098410.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.20/8.16.0.20) with SMTP id v3L8miNd036670
 for ; Fri, 21 Apr 2017 04:52:29 -0400
Received: from e23smtp08.au.ibm.com (e23smtp08.au.ibm.com [202.81.31.141])
 by mx0a-001b2d01.pphosted.com with ESMTP id 29y9k83tw2-1
 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT)
 for ; Fri, 21 Apr 2017 04:52:29 -0400
Received: from localhost
 by e23smtp08.au.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
 for from ; Fri, 21 Apr 2017 18:52:19 +1000
Received: from d23relay08.au.ibm.com (202.81.31.227)
 by e23smtp08.au.ibm.com (202.81.31.205) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;
 Fri, 21 Apr 2017 18:52:18 +1000
Received: from d23av06.au.ibm.com (d23av06.au.ibm.com [9.190.235.151])
 by d23relay08.au.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id v3L8q90N41418988
 for ; Fri, 21 Apr 2017 18:52:17 +1000
Received: from d23av06.au.ibm.com (localhost [127.0.0.1])
 by d23av06.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id v3L8pj5X011004
 for ; Fri, 21 Apr 2017 18:51:45 +1000
Received: from [9.193.77.196] ([9.193.77.196])
 by d23av06.au.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id v3L8pieE010735;
 Fri, 21 Apr 2017 18:51:44 +1000
To: Alexey Kardashevskiy
References: <20170420072402.38106-1-aik@ozlabs.ru>
 <20170420072402.38106-6-aik@ozlabs.ru>
 <12566b0a-8f9a-4040-a37d-2a106e49adcf@ozlabs.ru>
 <6e669e2b-2cfd-078d-b6b0-5c3819fad796@ozlabs.ru>
 <3e20a6f7-a1b7-4b13-5659-5afb827563ca@linux.vnet.ibm.com>
 <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach,
 "dev@dpdk.org"
From: gowrishankar muthukrishnan
Date: Fri, 21 Apr 2017 14:21:22 +0530
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0
MIME-Version: 1.0
In-Reply-To: <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
Content-Type: text/plain; charset=koi8-r; format=flowed
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
x-cbid: 17042108-0048-0000-0000-000002201FE3
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17042108-0049-0000-0000-000047CD8868
Message-Id: <33f252b9-a379-073a-75ca-c43aa5e80f2d@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-04-21_06:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=48 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1703280000 definitions=main-1704210164
Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 21 Apr 2017 08:52:30 -0000

On Friday 21 April 2017 09:12 AM, Alexey Kardashevskiy wrote:
> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>>> Alexey Kardashevskiy wrote on 20/04/2017 09:24:02:
>>>>>
>>>>>> From: Alexey Kardashevskiy
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Alexey Kardashevskiy , JPF@zurich.ibm.com,
>>>>>> Gowrishankar Muthukrishnan
>>>>>> Date: 20/04/2017 09:24
>>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>>> addresses for DMA map
>>>>>>
>>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>>> just created DMA window.
>>>>>> It happens to start from zero because the default
>>>>>> window is removed (leaving no windows) and new window starts from zero.
>>>>>> However this is not guaranteed and the new window may start from another
>>>>>> address, this adds an error check.
>>>>>>
>>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>>> bus address while in this case a physical address of a user page is used.
>>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>>> expects this.
>>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>>> actual iova which basically makes the whole virtual to physical mapping
>>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>>> not obvious why host RAM address is used there as DMA window start is not
>>>> guaranteed.
>>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>>> address - at least ms[i].phys_addr must be physical address.
>> This patch breaks i40e_dev_init() in my server.
>>
>> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.0
>> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.0 cannot be used
>> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.1
>> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.1 cannot be used
>> EAL: No probed ethernet devices
>>
>> I have two memseg each of 1G size. Their mapped PA and VA are also different.
>>
>> (gdb) p /x ms[0]
>> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 =
>> 0x3effaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel =
>> 0x0, nrank = 0x0}
>> (gdb) p /x ms[1]
>> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 =
>> 0x3efbaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel =
>> 0x0, nrank = 0x0}
>>
>> Could you please recheck this. May be, if new DMA window does not start
>> from bus address 0,
>> only then you reset dma_map.iova for this offset ?
> As we figured out, it is --no-huge effect.
>
> Another thing - as I read the code - the window size comes from
> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only 16GB
> window so it is far away from 1:1 mapping which is believed to be DPDK
> expectation. Looking now for a better version of rte_eal_get_physmem_size()...

If you have a large number of memory segments (they are not contiguous
unless reserved at boot time), you could check CONFIG_RTE_MAX_NUMA_NODES
and CONFIG_RTE_MAX_MEMSEG.

Thanks,
Gowrishankar

> And another problem - after few unsuccessful starts of app/testpmd, all
> huge pages are gone:
>
> aik@stratton2:~$ cat /proc/meminfo
> MemTotal:       535527296 kB
> MemFree:        516662272 kB
> MemAvailable:   515501696 kB
> ...
> HugePages_Total:    1024
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
>
>
> How is that possible? What is pinning these pages so testpmd process exit
> does not clear that up?
>
>
>
>>
>> Thanks,
>> Gowrishankar
>>
>>>>>> Signed-off-by: Alexey Kardashevskiy
>>>>>> ---
>>>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/
>>>>>> librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>  {
>>>>>>     const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>>     int i, ret;
>>>>>> -
>>>>>> +   phys_addr_t io_offset;
>>>>>>     struct vfio_iommu_spapr_register_memory reg = {
>>>>>>        .argsz = sizeof(reg),
>>>>>>        .flags = 0
>>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>        return -1;
>>>>>>     }
>>>>>> +   io_offset = create.start_addr;
>>>>>> +   if (io_offset) {
>>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not
>>>>>> supported, "
>>>>>> +            "new window is created at %lx\n", io_offset);
>>>>>> +      return -1;
>>>>>> +   }
>>>>>> +
>>>>>>     /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>        struct vfio_iommu_type1_dma_map dma_map;
>>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>        dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>        dma_map.vaddr = ms[i].addr_64;
>>>>>>        dma_map.size = ms[i].len;
>>>>>> -      dma_map.iova = ms[i].phys_addr;
>>>>>> +      dma_map.iova = io_offset;
>>>>>>        dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>>                 VFIO_DMA_MAP_FLAG_WRITE;
>>>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>           return -1;
>>>>>>        }
>>>>>> +      io_offset += dma_map.size;
>>>>>>     }
>>>>>>     return 0;
>>>>>> --
>>>>>> 2.11.0
>>>>>>
>>
>
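
For readers following the thread, the difference between the two IOVA
layouts under discussion can be worked through with the two memsegs from
the gdb output quoted in this mail. The sketch below is not DPDK code:
struct seg, sample_segs, packed_iova(), identity_iova(), total_len() and
in_window() are hypothetical names introduced only for illustration. It
assumes the DMA window starts at bus address 0 and that its size is the
sum of the segment lengths, which is how the thread reads
rte_eal_get_physmem_size(). It only does address arithmetic and issues
no VFIO ioctls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg fields discussed in the thread. */
struct seg {
    uint64_t phys_addr; /* ms[i].phys_addr, the address drivers give the device */
    uint64_t vaddr;     /* ms[i].addr_64 */
    uint64_t len;       /* ms[i].len */
};

/* The two 1G memsegs quoted from gdb in this mail. */
static const struct seg sample_segs[2] = {
    { 0x1e0b000000ULL, 0x3effaf000000ULL, 0x40000000ULL },
    { 0x0f6d000000ULL, 0x3efbaf000000ULL, 0x40000000ULL },
};

/* Packed layout from the RFC patch: segment i is mapped right after i-1. */
static uint64_t packed_iova(const struct seg *s, int n, int i,
                            uint64_t window_start)
{
    assert(i >= 0 && i < n);
    uint64_t iova = window_start;
    for (int k = 0; k < i; k++)
        iova += s[k].len;
    return iova;
}

/* 1:1 layout (PA == IOVA), as used before the patch. */
static uint64_t identity_iova(const struct seg *s, int n, int i)
{
    assert(i >= 0 && i < n);
    return s[i].phys_addr;
}

/* Assumed window size: the sum of all segment lengths. */
static uint64_t total_len(const struct seg *s, int n)
{
    uint64_t sum = 0;
    for (int k = 0; k < n; k++)
        sum += s[k].len;
    return sum;
}

/* True if [iova, iova + len) lies inside [start, start + size). */
static bool in_window(uint64_t iova, uint64_t len,
                      uint64_t start, uint64_t size)
{
    return iova >= start && len <= size && iova - start <= size - len;
}
```

With these inputs the packed layout places the segments at 0x0 and
0x40000000, both inside a 2G window. Code that still reads phys_addr
would hand the device 0x1e0b000000 and 0xf6d000000, which fall outside
that window. That mismatch is one plausible reading of the adminq
failure in the log above, not a confirmed diagnosis.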