From: Alexey Kardashevskiy
To: gowrishankar muthukrishnan
Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
Date: Fri, 21 Apr 2017 18:43:53 +1000
Message-ID: <2354a035-1c97-b9b4-9a15-d62a26d6d160@ozlabs.ru>
In-Reply-To: <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
References: <20170420072402.38106-1-aik@ozlabs.ru> <20170420072402.38106-6-aik@ozlabs.ru> <12566b0a-8f9a-4040-a37d-2a106e49adcf@ozlabs.ru> <6e669e2b-2cfd-078d-b6b0-5c3819fad796@ozlabs.ru> <3e20a6f7-a1b7-4b13-5659-5afb827563ca@linux.vnet.ibm.com> <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
List-Id: DPDK patches and discussions

On 21/04/17 13:42, Alexey Kardashevskiy wrote:
> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>>> Alexey Kardashevskiy wrote on 20/04/2017 09:24:02:
>>>>>
>>>>>> From: Alexey Kardashevskiy
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Alexey Kardashevskiy, JPF@zurich.ibm.com,
>>>>>> Gowrishankar Muthukrishnan
>>>>>> Date: 20/04/2017 09:24
>>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>>> addresses for DMA map
>>>>>>
>>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address of
>>>>>> the just-created DMA window.
>>>>>> It happens to start from zero because the default window is removed
>>>>>> (leaving no windows) and the new window starts from zero. However,
>>>>>> this is not guaranteed and the new window may start from another
>>>>>> address; this adds an error check.
>>>>>>
>>>>>> Another issue is that the IOVA passed to VFIO_IOMMU_MAP_DMA should be
>>>>>> a PCI bus address, while in this case a physical address of a user
>>>>>> page is used. This changes the IOVA to start from zero, in the hope
>>>>>> that this is what the rest of DPDK expects.
>>>>>
>>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use
>>>>> the phys_addr of the memory segment it got from /proc/self/pagemap,
>>>>> cf. librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here
>>>>> to the actual iova, which basically makes the whole virtual-to-physical
>>>>> mapping via pagemap unnecessary; I believe that should be the case for
>>>>> VFIO anyway. Pagemap should only be needed when using pci_uio.
>>>>
>>>> Ah, ok, makes sense now. But it sure needs a big fat comment there, as
>>>> it is not obvious why a host RAM address is used there while the DMA
>>>> window start address is not guaranteed.
>>>
>>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64
>>> both have the exact same value; in my setup it is 3fffb33c0000, which is
>>> a userspace address - at the least, ms[i].phys_addr must be a physical
>>> address.
>>
>> This patch breaks i40e_dev_init() on my server.
>>
>> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.0
>> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.0 cannot be used
>> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.1
>> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.1 cannot be used
>> EAL: No probed ethernet devices
>>
>> I have two memsegs, each of 1G size. Their mapped PA and VA are also
>> different.
>>
>> (gdb) p /x ms[0]
>> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000,
>>       addr_64 = 0x3effaf000000}, len = 0x40000000,
>>       hugepage_sz = 0x1000000, socket_id = 0x1, nchannel = 0x0,
>>       nrank = 0x0}
>> (gdb) p /x ms[1]
>> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000,
>>       addr_64 = 0x3efbaf000000}, len = 0x40000000,
>>       hugepage_sz = 0x1000000, socket_id = 0x0, nchannel = 0x0,
>>       nrank = 0x0}
>>
>> Could you please recheck this? Maybe, only if the new DMA window does not
>> start from bus address 0, you then offset dma_map.iova by the window
>> start?
>
> As we figured out, it is the effect of --no-huge.
>
> Another thing - as I read the code - the window size comes from
> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only a
> 16GB window, so it is far away from the 1:1 mapping which is believed to
> be the DPDK expectation. Looking now for a better version of
> rte_eal_get_physmem_size()...
I have not found any helper to get the total RAM size or to round up to a
power of two. I could look through the memory segments, find the one with
the highest ending physical address, round that up to a power of two (a
requirement of the POWER8 platform for a DMA window size) and use it as the
DMA window size - is there an analog of the kernel's order_base_2()?

> And another problem - after a few unsuccessful starts of app/testpmd, all
> huge pages are gone:
>
> aik@stratton2:~$ cat /proc/meminfo
> MemTotal:       535527296 kB
> MemFree:        516662272 kB
> MemAvailable:   515501696 kB
> ...
> HugePages_Total:    1024
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
>
> How is that possible? What is pinning these pages so that testpmd process
> exit does not clear them up?

Still not clear - any ideas what might be causing this?

btw, what is the correct way of running DPDK with hugepages? I basically
create a folder ~aik/hugepages and do:

  sudo mount -t hugetlbfs hugetlbfs ~aik/hugepages
  sudo sysctl vm.nr_hugepages=4096

This creates a bunch of pages:

aik@stratton2:~$ cat /proc/meminfo | grep HugePage
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0

And then I watch testpmd detect hugepages (it does see 4096 16MB pages) and
allocate them: rte_eal_hugepage_init() calls map_all_hugepages(... orig=1)
- here all 4096 pages are allocated - then it calls
map_all_hugepages(... orig=0) - and here I get lots of "EAL: Cannot get a
virtual area: Cannot allocate memory" for the obvious reason that all pages
are already allocated.

Since you folks have this tested somehow - what am I doing wrong? :) This
is all very confusing - what is this orig=0/1 business all about?
>
>>
>> Thanks,
>> Gowrishankar
>>
>>>>>> Signed-off-by: Alexey Kardashevskiy
>>>>>> ---
>>>>>>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>  {
>>>>>>  	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>>  	int i, ret;
>>>>>> -
>>>>>> +	phys_addr_t io_offset;
>>>>>>  	struct vfio_iommu_spapr_register_memory reg = {
>>>>>>  		.argsz = sizeof(reg),
>>>>>>  		.flags = 0
>>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>  		return -1;
>>>>>>  	}
>>>>>>
>>>>>> +	io_offset = create.start_addr;
>>>>>> +	if (io_offset) {
>>>>>> +		RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>>>>>> +			"new window is created at %lx\n", io_offset);
>>>>>> +		return -1;
>>>>>> +	}
>>>>>> +
>>>>>>  	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>>  	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>  		struct vfio_iommu_type1_dma_map dma_map;
>>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>
>>>>>>  		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>  		dma_map.vaddr = ms[i].addr_64;
>>>>>>  		dma_map.size = ms[i].len;
>>>>>> -		dma_map.iova = ms[i].phys_addr;
>>>>>> +		dma_map.iova = io_offset;
>>>>>>  		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>>  			VFIO_DMA_MAP_FLAG_WRITE;
>>>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>  			return -1;
>>>>>>  		}
>>>>>>
>>>>>> +		io_offset += dma_map.size;
>>>>>>  	}
>>>>>>
>>>>>>  	return 0;
>>>>>> --
>>>>>> 2.11.0
>>>>>>


--
Alexey