DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: David Christensen <drc@linux.vnet.ibm.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing
Date: Tue, 5 May 2020 15:57:01 +0100
Message-ID: <8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com> (raw)
In-Reply-To: <782d6f04-f476-93d6-1a8f-2ed0b39dde10@linux.vnet.ibm.com>

On 01-May-20 5:48 PM, David Christensen wrote:
>>>> I'm not sure of the algorithm for "memory size" here.
>>>>
>>>> Technically, DPDK can reserve memory segments anywhere in the VA 
>>>> space allocated by memseg lists. That space may be far bigger than 
>>>> system memory (on a typical Intel server board you'd see 128GB of VA 
>>>> space preallocated even though the machine itself might only have, 
>>>> say, 16GB of RAM installed). The same applies to any other arch 
>>>> running on Linux, so the window needs to cover at least 
>>>> RTE_MIN(base_virtaddr, lowest memseglist VA address) and up to 
>>>> highest memseglist VA address. That's not even mentioning the fact 
>>>> that the user may register external memory for DMA which may cause 
>>>> the window to be of insufficient size to cover said external memory.
>>>>
>>>> I also think that in general, "system memory" metric is ill suited 
>>>> for measuring VA space, because unlike system memory, the VA space 
>>>> is sparse and can therefore span *a lot* of address space even 
>>>> though in reality it may actually use very little physical memory.
>>>
>>> I'm open to suggestions here.  Perhaps an alternative in /proc/meminfo:
>>>
>>> VmallocTotal:   549755813888 kB
>>>
>>> I tested it with 1GB hugepages and it works, need to check with 2M as 
>>> well.  If there's no alternative for sizing the window based on 
>>> available system parameters then I have another option which creates 
>>> a new RTE_IOVA_TA mode that forces IOVA addresses into the range 0 to 
>>> X where X is configured on the EAL command-line (--iova-base, 
>>> --iova-len).   I use these command-line values to create a static 
>>> window.
>>>
>>
>> A whole new IOVA mode, while being a cleaner solution, would require a 
>> lot of testing, and it doesn't really solve the external memory 
>> problem, because we're still reliant on the user to provide IOVA 
>> addresses. Perhaps something akin to VA/IOVA address reservation would 
>> solve the problem, but again, lots of changes and testing, all for a 
>> comparatively narrow use case.
>>
>> The vmalloc area seems big enough (512 terabytes on your machine, 32 
>> terabytes on mine), so it'll probably be OK. I'd settle for:
>>
>> 1) start at base_virtaddr OR lowest memseg list address, whichever is 
>> lowest
> 
> The IOMMU only supports two starting addresses, 0 or 1<<59, so 
> implementation will need to start at 0.  (I've been bit by this before, 
> my understanding is that the processor only supports 54 bits of the 
> address and that the PCI host bridge uses bit 59 of the IOVA as a signal 
> to do the address translation for the second DMA window.)

Fair enough, 0 it is then.

> 
>> 2) end at lowest addr + VmallocTotal OR highest memseglist addr, 
>> whichever is higher
> 
> So, instead of rte_memseg_walk() execute rte_memseg_list_walk() to find 
> the lowest/highest msl addresses?

Yep. rte_memseg_walk() will only cover allocated pages, while 
rte_memseg_list_walk() will cover even empty page tables.

> 
>> 3) a check in user DMA map function that would warn/throw an error 
>> whenever there is an attempt to map an address for DMA that doesn't 
>> fit into the DMA window
> 
> Isn't this mostly prevented by the use of  rte_mem_set_dma_mask() and 
> rte_mem_check_dma_mask()?  I'd expect an error would be thrown by the 
> kernel IOMMU API for an out-of-range mapping that I would simply return 
> to the caller (drivers/vfio/vfio_iommu_spapr_tce.c includes the comment 
> /* iova is checked by the IOMMU API */).  Why do you think double 
> checking this would help?

I don't think we check rte_mem_check_dma_mask() anywhere in the call 
path of external memory code. Also, i just checked, and you're right, 
rte_vfio_container_dma_map() will fail if the kernel fails to map the 
memory, however nothing will fail in external memory because the IOVA 
addresses aren't checked for being within DMA mask.

See malloc_heap.c:1097 onwards, we simply add user-specified IOVA 
addresses into the page table without checking if the fit into the DMA 
mask. The DMA mapping will then happen through a mem event callback, but 
we don't check return value of that callback either, so even if DMA 
mapping fails, we'll only get a log message.

So, perhaps the real solution here is to add a DMA mask check into 
rte_malloc_heap_memory_add(), so that we check the IOVA addresses before 
we ever try to do anything with them. I'll submit a patch for this.

> 
>>
>> I think that would be best approach. Thoughts?
> 
> Dave


-- 
Thanks,
Anatoly

  reply	other threads:[~2020-05-05 14:57 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29 23:29 [dpdk-dev] [PATCH 0/2] vfio: change spapr DMA window sizing operation David Christensen
2020-04-29 23:29 ` [dpdk-dev] [PATCH 1/2] vfio: use ifdef's for ppc64 spapr code David Christensen
2020-04-30 11:14   ` Burakov, Anatoly
2020-04-30 16:22     ` David Christensen
2020-04-30 16:24       ` Burakov, Anatoly
2020-04-30 17:38         ` David Christensen
2020-05-01  8:49           ` Burakov, Anatoly
2020-04-29 23:29 ` [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-04-30 11:34   ` Burakov, Anatoly
2020-04-30 17:36     ` David Christensen
2020-05-01  9:06       ` Burakov, Anatoly
2020-05-01 16:48         ` David Christensen
2020-05-05 14:57           ` Burakov, Anatoly [this message]
2020-05-05 16:26             ` David Christensen
2020-05-06 10:18               ` Burakov, Anatoly
2020-06-30 21:38 ` [dpdk-dev] [PATCH v2 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-06-30 21:38   ` [dpdk-dev] [PATCH v2 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-08-10 21:07   ` [dpdk-dev] [PATCH v3 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-08-10 21:07     ` [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-09-03 18:55       ` David Christensen
2020-09-17 11:13       ` Burakov, Anatoly
2020-10-07 12:49         ` Thomas Monjalon
2020-10-07 17:44         ` David Christensen
2020-10-08  9:39           ` Burakov, Anatoly
2020-10-12 19:19             ` David Christensen
2020-10-14  9:27               ` Burakov, Anatoly
2020-10-15 17:23     ` [dpdk-dev] [PATCH v4 0/1] vfio: change spapr DMA window sizing operation David Christensen
2020-10-15 17:23       ` [dpdk-dev] [PATCH v4 1/1] vfio: modify spapr iommu support to use static window sizing David Christensen
2020-10-20 12:05         ` Thomas Monjalon
2020-10-29 21:30           ` Thomas Monjalon
2020-11-02 11:04         ` Burakov, Anatoly
2020-11-03 22:05       ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-03 22:05         ` [dpdk-dev] [PATCH v5 1/1] " David Christensen
2020-11-04 19:43           ` Thomas Monjalon
2020-11-04 21:00             ` David Christensen
2020-11-04 21:02               ` Thomas Monjalon
2020-11-04 22:25                 ` David Christensen
2020-11-05  7:12                   ` Thomas Monjalon
2020-11-06 22:16                     ` David Christensen
2020-11-07  9:58                       ` Thomas Monjalon
2020-11-09 20:35         ` [dpdk-dev] [PATCH v5 0/1] " David Christensen
2020-11-09 20:35           ` [dpdk-dev] [PATCH v6 1/1] " David Christensen
2020-11-09 21:10             ` Thomas Monjalon
2020-11-10 17:41           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:41             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-10 17:43           ` [dpdk-dev] [PATCH v7 0/1] " David Christensen
2020-11-10 17:43             ` [dpdk-dev] [PATCH v7 1/1] " David Christensen
2020-11-13  8:39               ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f83778f-fe74-6c69-e2d0-73b9a6e7ca79@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    --cc=drc@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git