DPDK patches and discussions
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: gowrishankar muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
Cc: Jonas Pfefferle1 <JPF@zurich.ibm.com>,
	Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>,
	Adrian Schuepbach <DRI@zurich.ibm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus addresses for DMA map
Date: Fri, 21 Apr 2017 18:43:53 +1000
Message-ID: <2354a035-1c97-b9b4-9a15-d62a26d6d160@ozlabs.ru>
In-Reply-To: <4977b4e8-e63a-0621-2375-89066d8de10a@ozlabs.ru>

On 21/04/17 13:42, Alexey Kardashevskiy wrote:
> On 21/04/17 05:16, gowrishankar muthukrishnan wrote:
>> On Thursday 20 April 2017 07:52 PM, Alexey Kardashevskiy wrote:
>>> On 20/04/17 23:25, Alexey Kardashevskiy wrote:
>>>> On 20/04/17 19:04, Jonas Pfefferle1 wrote:
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote on 20/04/2017 09:24:02:
>>>>>
>>>>>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> To: dev@dpdk.org
>>>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, JPF@zurich.ibm.com,
>>>>>> Gowrishankar Muthukrishnan <gowrishankar.m@in.ibm.com>
>>>>>> Date: 20/04/2017 09:24
>>>>>> Subject: [PATCH dpdk 5/5] RFC: vfio/ppc64/spapr: Use correct bus
>>>>>> addresses for DMA map
>>>>>>
>>>>>> VFIO_IOMMU_SPAPR_TCE_CREATE ioctl() returns the actual bus address for
>>>>>> the just created DMA window. It happens to start from zero because the
>>>>>> default window is removed (leaving no windows) and the new window starts
>>>>>> from zero. However, this is not guaranteed and the new window may start
>>>>>> from another address, so this adds an error check.
>>>>>>
>>>>>> Another issue is that IOVA passed to VFIO_IOMMU_MAP_DMA should be a PCI
>>>>>> bus address while in this case a physical address of a user page is used.
>>>>>> This changes IOVA to start from zero in a hope that the rest of DPDK
>>>>>> expects this.
>>>>> This is not the case. DPDK expects a 1:1 mapping PA==IOVA. It will use the
>>>>> phys_addr of the memory segment it got from /proc/self/pagemap cf.
>>>>> librte_eal/linuxapp/eal/eal_memory.c. We could try setting it here to the
>>>>> actual iova which basically makes the whole virtual to physical mapping
>>>>> with pagemap unnecessary which I believe should be the case for VFIO
>>>>> anyway. Pagemap should only be needed when using pci_uio.
>>>>
>>>> Ah, ok, makes sense now. But it sure needs a big fat comment there as it is
>>>> not obvious why host RAM address is used there as DMA window start is not
>>>> guaranteed.
>>> Well, either way there is some bug - ms[i].phys_addr and ms[i].addr_64 both
>>> have the exact same value; in my setup it is 3fffb33c0000, which is a userspace
>>> address - at least ms[i].phys_addr must be a physical address.
>>
>> This patch breaks i40e_dev_init() in my server.
>>
>> EAL: PCI device 0004:01:00.0 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.0
>> EAL: Calling pci_unmap_resource for 0004:01:00.0 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.0 cannot be used
>> EAL: PCI device 0004:01:00.1 on NUMA socket 1
>> EAL:   probe driver: 8086:1583 net_i40e
>> EAL:   using IOMMU type 7 (sPAPR)
>> eth_i40e_dev_init(): Failed to init adminq: -32
>> EAL: Releasing pci mapped resource for 0004:01:00.1
>> EAL: Calling pci_unmap_resource for 0004:01:00.1 at 0x3fff82aa0000
>> EAL: Requested device 0004:01:00.1 cannot be used
>> EAL: No probed ethernet devices
>>
>> I have two memsegs, each 1G in size. Their mapped PA and VA are also different.
>>
>> (gdb) p /x ms[0]
>> $3 = {phys_addr = 0x1e0b000000, {addr = 0x3effaf000000, addr_64 = 0x3effaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x1, nchannel = 0x0, nrank = 0x0}
>> (gdb) p /x ms[1]
>> $4 = {phys_addr = 0xf6d000000, {addr = 0x3efbaf000000, addr_64 = 0x3efbaf000000},
>>   len = 0x40000000, hugepage_sz = 0x1000000, socket_id = 0x0, nchannel = 0x0, nrank = 0x0}
>>
>> Could you please recheck this? Maybe you could reset dma_map.iova by this
>> offset only if the new DMA window does not start from bus address 0?
> 
> As we figured out, it is an effect of --no-huge.
> 
> Another thing - as I read the code, the window size comes from
> rte_eal_get_physmem_size(). On my 512GB machine, DPDK allocates only a 16GB
> window, which is far from the 1:1 mapping that DPDK is believed to
> expect. Looking now for a better version of rte_eal_get_physmem_size()...


I have not found any helper to get the total RAM size or to round up to a
power of two. I could look through the memory segments, find the one with
the highest ending physical address, round that up to a power of two (a
requirement on the POWER8 platform for a DMA window size) and use it as the
DMA window size. Is there an analog of the kernel's order_base_2()?
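
The sizing idea above can be sketched in plain C. This is only an
illustration under assumptions: struct seg and both function names are
hypothetical stand-ins (struct seg mirrors just the phys_addr and len
fields of rte_memseg), and nothing here is an existing DPDK or kernel
helper. The sample values in the note after the code are the two memsegs
from the gdb output quoted earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_memseg fields used here. */
struct seg {
	uint64_t phys_addr;	/* start physical address */
	uint64_t len;		/* length in bytes */
};

/*
 * Round v up to the next power of two (v itself if it already is one).
 * Returns 0 if the result does not fit in 64 bits.
 */
static uint64_t roundup_pow2_u64(uint64_t v)
{
	if (v <= 1)
		return 1;
	if (v > (UINT64_C(1) << 63))
		return 0;
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/*
 * Find the highest ending physical address over all non-empty segments
 * and round it up to a power of two, so that a window starting at bus
 * address 0 would cover every segment with a 1:1 PA == IOVA mapping.
 */
static uint64_t dma_window_size(const struct seg *ms, size_t n)
{
	uint64_t max_end = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t end;

		if (ms[i].len == 0)
			continue;
		end = ms[i].phys_addr + ms[i].len;
		if (end > max_end)
			max_end = end;
	}
	return roundup_pow2_u64(max_end);
}
```

For the segments at 0x1e0b000000 and 0xf6d000000 (1G each), the highest
end address is 0x1e4b000000, so this would ask for a 0x2000000000 (128GB)
window. Whether the platform accepts a window of that size is a separate
question that the sketch does not address.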


> 
> 
> And another problem - after a few unsuccessful starts of app/testpmd, all
> huge pages are gone:
> 
> aik@stratton2:~$ cat /proc/meminfo
> MemTotal:       535527296 kB
> MemFree:        516662272 kB
> MemAvailable:   515501696 kB
> ...
> HugePages_Total:    1024
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:      16384 kB
> 
> 
> How is that possible? What is pinning these pages so testpmd process exit
> does not clear that up?

Still not clear, any ideas what might be causing this?



btw what is the correct way of running DPDK with hugepages?

I basically create a folder in ~aik/hugepages and do
sudo mount -t hugetlbfs hugetlbfs ~aik/hugepages
sudo sysctl vm.nr_hugepages=4096

This creates a bunch of pages:
aik@stratton2:~$ cat /proc/meminfo | grep HugePage
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    4096
HugePages_Free:     4096
HugePages_Rsvd:        0
HugePages_Surp:        0
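
The same counters can be read programmatically, for example to compare
HugePages_Free before and after a testpmd run. A minimal sketch in C,
assuming the text of /proc/meminfo has already been read into a
NUL-terminated buffer; meminfo_field() is a hypothetical helper name, not
part of DPDK or libc.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look for a line that starts with "<key>:" in meminfo-formatted text
 * and return the number that follows it, or -1 if the key is missing
 * or no number can be parsed.
 */
static long meminfo_field(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long val;

			if (sscanf(line + klen + 1, "%ld", &val) == 1)
				return val;
			return -1;
		}
		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

Calling meminfo_field(buf, "HugePages_Free") on the output above would
give 4096; on the earlier output, taken after the failed testpmd starts,
it would give 0.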


And then I watch testpmd detect the hugepages (it does see 4096 16MB
pages) and try to allocate them:
rte_eal_hugepage_init() calls map_all_hugepages(... orig=1) - here all 4096
pages are allocated; then it calls map_all_hugepages(... orig=0) - and here
I get lots of "EAL: Cannot get a virtual area: Cannot allocate memory" for
the obvious reason that all pages are already allocated. Since you folks
have tested this somehow - what am I doing wrong? :) This is all very
confusing - what is that orig=0/1 business all about?




> 
> 
> 
> 
>>
>>
>> Thanks,
>> Gowrishankar
>>
>>>
>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>   lib/librte_eal/linuxapp/eal/eal_vfio.c | 12 ++++++++++--
>>>>>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> index 46f951f4d..8b8e75c4f 100644
>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
>>>>>> @@ -658,7 +658,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>   {
>>>>>>      const struct rte_memseg *ms = rte_eal_get_physmem_layout();
>>>>>>      int i, ret;
>>>>>> -
>>>>>> +   phys_addr_t io_offset;
>>>>>>      struct vfio_iommu_spapr_register_memory reg = {
>>>>>>         .argsz = sizeof(reg),
>>>>>>         .flags = 0
>>>>>> @@ -702,6 +702,13 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>         return -1;
>>>>>>      }
>>>>>>
>>>>>> +   io_offset = create.start_addr;
>>>>>> +   if (io_offset) {
>>>>>> +      RTE_LOG(ERR, EAL, "  DMA offsets other than zero is not supported, "
>>>>>> +            "new window is created at %lx\n", io_offset);
>>>>>> +      return -1;
>>>>>> +   }
>>>>>> +
>>>>>>      /* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
>>>>>>      for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>         struct vfio_iommu_type1_dma_map dma_map;
>>>>>> @@ -723,7 +730,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>         dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
>>>>>>         dma_map.vaddr = ms[i].addr_64;
>>>>>>         dma_map.size = ms[i].len;
>>>>>> -      dma_map.iova = ms[i].phys_addr;
>>>>>> +      dma_map.iova = io_offset;
>>>>>>         dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
>>>>>>                VFIO_DMA_MAP_FLAG_WRITE;
>>>>>>
>>>>>> @@ -735,6 +742,7 @@ vfio_spapr_dma_map(int vfio_container_fd)
>>>>>>            return -1;
>>>>>>         }
>>>>>>
>>>>>> +      io_offset += dma_map.size;
>>>>>>      }
>>>>>>
>>>>>>      return 0;
>>>>>> -- 
>>>>>> 2.11.0
>>>>>>
>>>>
>>>
>>
>>
> 
> 


-- 
Alexey
