From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: David Christensen <drc@linux.vnet.ibm.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing
Date: Tue, 5 May 2020 15:57:01 +0100 [thread overview]
Message-ID: <8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com> (raw)
In-Reply-To: <782d6f04-f476-93d6-1a8f-2ed0b39dde10@linux.vnet.ibm.com>
On 01-May-20 5:48 PM, David Christensen wrote:
>>>> I'm not sure of the algorithm for "memory size" here.
>>>>
>>>> Technically, DPDK can reserve memory segments anywhere in the VA
>>>> space allocated by memseg lists. That space may be far bigger than
>>>> system memory (on a typical Intel server board you'd see 128GB of VA
>>>> space preallocated even though the machine itself might only have,
>>>> say, 16GB of RAM installed). The same applies to any other arch
>>>> running on Linux, so the window needs to cover at least
>>>> RTE_MIN(base_virtaddr, lowest memseglist VA address) and up to
>>>> highest memseglist VA address. That's not even mentioning the fact
>>>> that the user may register external memory for DMA which may cause
>>>> the window to be of insufficient size to cover said external memory.
>>>>
>>>> I also think that in general, "system memory" metric is ill suited
>>>> for measuring VA space, because unlike system memory, the VA space
>>>> is sparse and can therefore span *a lot* of address space even
>>>> though in reality it may actually use very little physical memory.
>>>
>>> I'm open to suggestions here. Perhaps an alternative in /proc/meminfo:
>>>
>>> VmallocTotal: 549755813888 kB
>>>
>>> I tested it with 1GB hugepages and it works, need to check with 2M as
>>> well. If there's no alternative for sizing the window based on
>>> available system parameters then I have another option which creates
>>> a new RTE_IOVA_TA mode that forces IOVA addresses into the range 0 to
>>> X where X is configured on the EAL command-line (--iova-base,
>>> --iova-len). I use these command-line values to create a static
>>> window.
>>>
>>
>> A whole new IOVA mode, while being a cleaner solution, would require a
>> lot of testing, and it doesn't really solve the external memory
>> problem, because we're still reliant on the user to provide IOVA
>> addresses. Perhaps something akin to VA/IOVA address reservation would
>> solve the problem, but again, lots of changes and testing, all for a
>> comparatively narrow use case.
>>
>> The vmalloc area seems big enough (512 terabytes on your machine, 32
>> terabytes on mine), so it'll probably be OK. I'd settle for:
>>
>> 1) start at base_virtaddr OR lowest memseg list address, whichever is
>> lowest
>
> The IOMMU only supports two starting addresses, 0 or 1<<59, so
> implementation will need to start at 0. (I've been bit by this before,
> my understanding is that the processor only supports 54 bits of the
> address and that the PCI host bridge uses bit 59 of the IOVA as a signal
> to do the address translation for the second DMA window.)
Fair enough, 0 it is then.
>
>> 2) end at lowest addr + VmallocTotal OR highest memseglist addr,
>> whichever is higher
>
> So, instead of rte_memseg_walk() execute rte_memseg_list_walk() to find
> the lowest/highest msl addresses?
Yep. rte_memseg_walk() will only cover allocated pages, while
rte_memseg_list_walk() will cover even empty page tables.
>
>> 3) a check in user DMA map function that would warn/throw an error
>> whenever there is an attempt to map an address for DMA that doesn't
>> fit into the DMA window
>
> Isn't this mostly prevented by the use of rte_mem_set_dma_mask() and
> rte_mem_check_dma_mask()? I'd expect an error would be thrown by the
> kernel IOMMU API for an out-of-range mapping that I would simply return
> to the caller (drivers/vfio/vfio_iommu_spapr_tce.c includes the comment
> /* iova is checked by the IOMMU API */). Why do you think double
> checking this would help?
I don't think we check rte_mem_check_dma_mask() anywhere in the call
path of external memory code. Also, i just checked, and you're right,
rte_vfio_container_dma_map() will fail if the kernel fails to map the
memory, however nothing will fail in external memory because the IOVA
addresses aren't checked for being within DMA mask.
See malloc_heap.c:1097 onwards, we simply add user-specified IOVA
addresses into the page table without checking if the fit into the DMA
mask. The DMA mapping will then happen through a mem event callback, but
we don't check return value of that callback either, so even if DMA
mapping fails, we'll only get a log message.
So, perhaps the real solution here is to add a DMA mask check into
rte_malloc_heap_memory_add(), so that we check the IOVA addresses before
we ever try to do anything with them. I'll submit a patch for this.
>
>>
>> I think that would be best approach. Thoughts?
>
> Dave
--
Thanks,
Anatoly
next prev parent reply other threads:[~2020-05-05 14:57 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-29 23:29 [dpdk-dev] [PATCH 0/2] vfio: change spapr DMA window sizing operation David Christensen
2020-04-29 23:29 ` [dpdk-dev] [PATCH 1/2] vfio: use ifdef's for ppc64 spapr code David Christensen
2020-04-30 11:14 ` Burakov, Anatoly
2020-04-30 16:22 ` David Christensen
2020-04-30 16:24 ` Burakov, Anatoly
2020-04-30 17:38 ` David Christensen
2020-05-01 8:49 ` Burakov, Anatoly
2020-04-29 23:29 ` [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-04-30 11:34 ` Burakov, Anatoly
2020-04-30 17:36 ` David Christensen
2020-05-01 9:06 ` Burakov, Anatoly
2020-05-01 16:48 ` David Christensen
2020-05-05 14:57 ` Burakov, Anatoly [this message]
2020-05-05 16:26 ` David Christensen
2020-05-06 10:18 ` Burakov, Anatoly
2020-06-30 21:38 ` [dpdk-dev] [PATCH v2 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-06-30 21:38 ` [dpdk-dev] [PATCH v2 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-08-10 21:07 ` [dpdk-dev] [PATCH v3 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-08-10 21:07 ` [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-09-03 18:55 ` David Christensen
2020-09-17 11:13 ` Burakov, Anatoly
2020-10-07 12:49 ` Thomas Monjalon
2020-10-07 17:44 ` David Christensen
2020-10-08 9:39 ` Burakov, Anatoly
2020-10-12 19:19 ` David Christensen
2020-10-14 9:27 ` Burakov, Anatoly
2020-10-15 17:23 ` [dpdk-dev] [PATCH v4 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-10-15 17:23 ` [dpdk-dev] [PATCH v4 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-10-20 12:05 ` Thomas Monjalon
2020-10-29 21:30 ` Thomas Monjalon
2020-11-02 11:04 ` Burakov, Anatoly
2020-11-03 22:05 ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-03 22:05 ` [dpdk-dev] [PATCH v5 1/1] " David Christensen
2020-11-04 19:43 ` Thomas Monjalon
2020-11-04 21:00 ` David Christensen
2020-11-04 21:02 ` Thomas Monjalon
2020-11-04 22:25 ` David Christensen
2020-11-05 7:12 ` Thomas Monjalon
2020-11-06 22:16 ` David Christensen
2020-11-07 9:58 ` Thomas Monjalon
2020-11-09 20:35 ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-09 20:35 ` [dpdk-dev] [PATCH v6 1/1] " David Christensen
2020-11-09 21:10 ` Thomas Monjalon
2020-11-10 17:41 ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:41 ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-10 17:43 ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:43 ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-13 8:39 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com \
--to=anatoly.burakov@intel.com \
--cc=dev@dpdk.org \
--cc=drc@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).