From: Alexey Kardashevskiy
To: gowrishankar muthukrishnan
Cc: Jonas Pfefferle1, Gowrishankar Muthukrishnan, Adrian Schuepbach, "dev@dpdk.org"
Date: Fri, 21 Apr 2017 13:42:35 +1000
Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
Message-ID: <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>
In-Reply-To: <3e20a6f7-a1b7-4b13-5659-5afb827563ca@linux.vnet.ibm.com>
References: <20170420072402.38106-1-aik@ozlabs.ru> <20170420072402.38106-6-aik@ozlabs.ru> <12566b0a-8f9a-4040-a37d-2a106e49adcf@ozlabs.ru> <6e669e2b-2cfd-078d-b6b0-5c3819fad796@ozlabs.ru> <3e20a6f7-a1b7-4b13-5659-5afb827563ca@linux.vnet.ibm.com>
List-Id: DPDK patches and discussions

On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>> Alexey Kardashevskiy wrote on 20/04/2017 09:24:02:
>>>>
>>>>> From: Alexey Kardashevskiy
>>>>> To: dev@dpdk.org
>>>>> Cc: Alexey Kardashevskiy, JPF@zurich.ibm.com,
>>>>> Gowrishankar Muthukrishnan
>>>>> Date: 20/04/2017 09:24
>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>> addresses for DMA map
>>>>>
>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>> just created DMA window. It happens to start from zero because the default
>>>>> window is removed (leaving no windows) and new window starts from zero.
>>>>> However this is not guaranteed and the new window may start from another
>>>>> address, this adds an error check.
>>>>>
>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>> bus address while in this case a physical address of a user page is used.
>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>> expects this.
>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>> actual iova which basically makes the whole virtual to physical mapping
>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>
>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>> not obvious why host RAM address is used there as DMA window start is not
>>> guaranteed.
>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>> have exact same value, in my setup it is 3fffb33c0000 which is a userspace
>> address - at least ms[i].phys_addr must be physical address.
>
> This patch breaks i40e_dev_init() in my server.
>
> EAL: PCI device 0004:01:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1583 net_i40e
> EAL:   using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.0
> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.0 cannot be used
> EAL: PCI device 0004:01:00.1 on NUMA socket 1
> EAL:   probe driver: 8086:1583 net_i40e
> EAL:   using IOMMU type 7 (sPAPR)
> eth_i40e_dev_init(): Failed to init adminq: -32
> EAL: Releasing pci mapped resource for 0004:01:00.1
> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
> EAL: Requested device 0004:01:00.1 cannot be used
> EAL: No probed ethernet devices
>
> I have two memseg each of 1G size. Their mapped PA and VA are also different.
>
> (gdb) p /x ms[0]
> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 = 0x3effaf000000},
>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel = 0x0, nrank = 0x0}
> (gdb) p /x ms[1]
> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 = 0x3efbaf000000},
>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel = 0x0, nrank = 0x0}
>
> Could you please recheck this. Maybe, if new DMA window does not start
> from bus address 0, only then you reset dma_map.iova for this offset?

As we figured out, this is an effect of --no-huge.

Another thing - as I read the code - the window size comes from
rte_eal_get_physmem_size(). On my 512GB machine, DPDK creates only a 16GB
window, which is far from the 1:1 mapping that DPDK is believed to expect.
Looking now for a better version of rte_eal_get_physmem_size()...

And another problem - after a few unsuccessful starts of app/testpmd, all
huge pages are gone:

aik@stratton2:~$ cat /proc/meminfo
MemTotal:       535527296 kB
MemFree:        516662272 kB
MemAvailable:   515501696 kB
...
HugePages_Total:    1024
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

How is that possible? What is pinning these pages so testpmd process exit
does not clear that up?

>
>
> Thanks,
> Gowrishankar
>
>>
>>>
>>>>> Signed-off-by: Alexey Kardashevskiy
>>>>> ---
>>>>> lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>  {
>>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>      int i, ret;
>>>>> -
>>>>> +    phys_addr_t io_offset;
>>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>>          .argsz = sizeof(reg),
>>>>>          .flags = 0
>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>          return -1;
>>>>>      }
>>>>> +    io_offset = create.start_addr;
>>>>> +    if (io_offset) {
>>>>> +        RTE_LOG(ERR, EAL, " DMA offsets other than zero is not supported, "
>>>>> +                "new window is created at %lx\n", io_offset);
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>          struct vfio_iommu_type1_dma_map dma_map;
>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>          dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>          dma_map.vaddr = ms[i].addr_64;
>>>>>          dma_map.size = ms[i].len;
>>>>> -        dma_map.iova = ms[i].phys_addr;
>>>>> +        dma_map.iova = io_offset;
>>>>>          dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>                  VFIO_DMA_MAP_FLAG_WRITE;
>>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>              return -1;
>>>>>          }
>>>>> +        io_offset += dma_map.size;
>>>>>      }
>>>>>      return 0;
>>>>> --
>>>>> 2.11.0
>>>>>
>>>
>>
>
>

--
Alexey
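
To make the window size concern concrete, here is a minimal C sketch (not
DPDK code) comparing the two ways the DMA window could be sized: packing
segments from the window start, as the patch does, versus covering the
physical addresses for a 1:1 PA to IOVA mapping. struct memseg,
packed_window_size(), identity_window_size() and round_up_pow2() are
hypothetical names used only for illustration, and rounding the window up
to a power of two is an assumption about what window creation accepts, not
something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg fields that matter here. */
struct memseg {
    uint64_t phys_addr; /* physical address of the segment */
    uint64_t len;       /* segment length in bytes */
};

/* Smallest power of two that is >= v; v must be in 1..2^63. */
static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;

    assert(v != 0 && v <= (UINT64_C(1) << 63));
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Window size if IOVAs are handed out back to back from the window
 * start: the sum of the segment lengths is enough.
 */
static uint64_t packed_window_size(const struct memseg *ms, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += ms[i].len;
    return total;
}

/*
 * Window size if IOVA == PA: the window has to reach the end of the
 * highest segment (assumed here to be rounded up to a power of two).
 */
static uint64_t identity_window_size(const struct memseg *ms, size_t n)
{
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = ms[i].phys_addr + ms[i].len;

        if (end > max_end)
            max_end = end;
    }
    return max_end ? round_up_pow2(max_end) : 0;
}
```

With the two 1G memsegs from the gdb output quoted above (phys_addr
0x1e0b000000 and 0xf6d000000, len 0x40000000 each), packing needs only
0x80000000 bytes, while a 1:1 mapping needs a window reaching 0x1e4b000000,
which is 0x2000000000 after rounding up.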