From: David Christensen <drc@linux.vnet.ibm.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] vfio: modify spapr iommu support to use static window sizing
Date: Thu, 30 Apr 2020 10:36:17 -0700
Message-ID: <6763793c-265b-c5cf-228a-b2c177574c84@linux.vnet.ibm.com>
In-Reply-To: <6cbb170a-3f13-47ba-e0ad-4a86cd6cb352@intel.com>
On 4/30/20 4:34 AM, Burakov, Anatoly wrote:
> On 30-Apr-20 12:29 AM, David Christensen wrote:
>> Current SPAPR IOMMU support code dynamically modifies the DMA window
>> size in response to every new memory allocation. This is potentially
>> dangerous because all existing mappings need to be unmapped/remapped in
>> order to resize the DMA window, leaving hardware holding IOVA addresses
>> that are not properly prepared for DMA. The new SPAPR code statically
>> assigns the DMA window size on first use, using the largest physical
>> memory address when IOVA=PA and the base_virtaddr + physical memory size
>> when IOVA=VA. As a result, memory will only be unmapped when
>> specifically requested.
>>
>> Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
>> ---
>
> Hi David,
>
> I haven't yet looked at the code in detail (will do so later), but some
> general comments and questions below.
>
>> + /*
>> + * Read "System RAM" in /proc/iomem:
>> + * 00000000-1fffffffff : System RAM
>> + * 200000000000-201fffffffff : System RAM
>> + */
>> + FILE *fd = fopen(proc_iomem, "r");
>> + if (fd == NULL) {
>> + RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_iomem);
>> + return -1;
>> + }
>
> A quick check on my machines shows that when cat'ing /proc/iomem as
> non-root, you get zeroes everywhere, which leads me to believe that you
> have to be root to get anything useful out of /proc/iomem. Since one of
> the major selling points of VFIO is the ability to run as non-root,
> depending on iomem kind of defeats the purpose a bit.
I observed the same thing on my system during development. I didn't see
anything that precluded support for RTE_IOVA_PA in the VFIO code. Are
you suggesting that I should explicitly not support that configuration?
If you're attempting to use RTE_IOVA_PA then you're already required to
run as root, so there shouldn't be an issue accessing this file.
>> + return 0;
>> +
>> + } else if (rte_eal_iova_mode() == RTE_IOVA_VA) {
>> + /* Set the DMA window to base_virtaddr + system memory size */
>> + const char proc_meminfo[] = "/proc/meminfo";
>> + const char str_memtotal[] = "MemTotal:";
>> + int memtotal_len = sizeof(str_memtotal) - 1;
>> + char buffer[256];
>> + uint64_t size = 0;
>> +
>> + FILE *fd = fopen(proc_meminfo, "r");
>> + if (fd == NULL) {
>> + RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_meminfo);
>> + return -1;
>> + }
>> + while (fgets(buffer, sizeof(buffer), fd)) {
>> + if (strncmp(buffer, str_memtotal, memtotal_len) == 0) {
>> + size = rte_str_to_size(&buffer[memtotal_len]);
>> + break;
>> + }
>> + }
>> + fclose(fd);
>> +
>> + if (size == 0) {
>> + RTE_LOG(ERR, EAL, "Failed to find valid \"MemTotal\" entry "
>> + "in file %s\n", proc_meminfo);
>> + return -1;
>> + }
>> +
>> + RTE_LOG(DEBUG, EAL, "MemTotal is 0x%" PRIx64 "\n", size);
>> + /* if no base virtual address is configured use 4GB */
>> + spapr_dma_win_len = rte_align64pow2(size +
>> + (internal_config.base_virtaddr > 0 ?
>> + (uint64_t)internal_config.base_virtaddr : 1ULL << 32));
>> + rte_mem_set_dma_mask(__builtin_ctzll(spapr_dma_win_len));
>
> I'm not sure of the algorithm for "memory size" here.
>
> Technically, DPDK can reserve memory segments anywhere in the VA space
> allocated by memseg lists. That space may be far bigger than system
> memory (on a typical Intel server board you'd see 128GB of VA space
> preallocated even though the machine itself might only have, say, 16GB
> of RAM installed). The same applies to any other arch running on Linux,
> so the window needs to cover at least RTE_MIN(base_virtaddr, lowest
> memseglist VA address) and up to highest memseglist VA address. That's
> not even mentioning the fact that the user may register external memory
> for DMA which may cause the window to be of insufficient size to cover
> said external memory.
>
> I also think that in general, "system memory" metric is ill suited for
> measuring VA space, because unlike system memory, the VA space is sparse
> and can therefore span *a lot* of address space even though in reality
> it may actually use very little physical memory.
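To make the sizing concern above concrete, here is a minimal sketch (not
code from the patch) that compares a window derived from MemTotal with
one derived from the highest VA the memseg lists could reach. It assumes
the window starts at IOVA 0 and must be a power of two in length.
align64pow2() is a local stand-in written for this example, and
window_len_covering() and dma_mask_bits() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two; v is returned unchanged if it
 * already is one. Local stand-in for this example only. */
static uint64_t align64pow2(uint64_t v)
{
	assert(v != 0 && v <= (1ULL << 63));
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/* Hypothetical helper: length of a window that starts at IOVA 0 and
 * covers every address below va_end (exclusive). */
static uint64_t window_len_covering(uint64_t va_end)
{
	return align64pow2(va_end);
}

/* Number of address bits needed for a power-of-two window length. */
static unsigned int dma_mask_bits(uint64_t window_len)
{
	assert(window_len != 0 && (window_len & (window_len - 1)) == 0);
	return (unsigned int)__builtin_ctzll(window_len);
}
```

With a 4GB base, 16GB of RAM gives
window_len_covering((4ULL << 30) + (16ULL << 30)), a 32GB window (35
bits), while 128GB of preallocated VA space above the same base needs
window_len_covering((4ULL << 30) + (128ULL << 30)), a 256GB window (38
bits). A window sized from MemTotal would therefore not cover the upper
part of that VA range.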
I'm open to suggestions here. Perhaps an alternative is this entry in
/proc/meminfo:
VmallocTotal: 549755813888 kB
I tested it with 1GB hugepages and it works; I still need to check with
2MB hugepages as well. If there's no alternative for sizing the window
based on available system parameters, then I have another option, which
creates a new RTE_IOVA_TA mode that forces IOVA addresses into the range
0 to X, where X is configured on the EAL command line (--iova-base,
--iova-len). I use these command-line values to create a static window.
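As a rough illustration of that second option, the sketch below shows
how a static window built from two configured values could be validated
and then used to check a mapping request. None of this is taken from the
patch or from DPDK: struct static_dma_window, window_init() and
window_contains() are hypothetical names, and the power-of-two length
requirement is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a fixed DMA window. */
struct static_dma_window {
	uint64_t base; /* first IOVA in the window (e.g. from --iova-base) */
	uint64_t len;  /* window length in bytes (e.g. from --iova-len) */
};

/* Validate the configured values: the length must be a non-zero power
 * of two and base + len must not wrap around.
 * Returns 0 on success, -1 otherwise. */
static int window_init(struct static_dma_window *w, uint64_t base,
		       uint64_t len)
{
	assert(w != NULL);
	if (len == 0 || (len & (len - 1)) != 0)
		return -1;
	if (base > UINT64_MAX - len)
		return -1;
	w->base = base;
	w->len = len;
	return 0;
}

/* True if [iova, iova + size) lies entirely inside the window. */
static bool window_contains(const struct static_dma_window *w,
			    uint64_t iova, uint64_t size)
{
	assert(w != NULL);
	if (size == 0 || iova < w->base)
		return false;
	uint64_t off = iova - w->base;
	/* Compare against the remaining space to avoid overflow. */
	return off < w->len && size <= w->len - off;
}
```

A mapping request that fails window_contains() would be rejected up
front instead of triggering a window resize, which is the behavior the
static approach is meant to guarantee.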
Dave