From: Thomas Monjalon <thomas@monjalon.net>
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: dev@dpdk.org,
Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>,
Omar Cardona <ocardona@microsoft.com>,
Dmitry Malloy <dmitrym@microsoft.com>,
Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>,
Pallavi Kadam <pallavi.kadam@intel.com>,
Ranjit Menon <ranjit.menon@intel.com>,
Tal Shnaiderman <talshn@mellanox.com>,
Fady Bader <fady@mellanox.com>, Ophir Munk <ophirmu@mellanox.com>,
Anatoly Burakov <anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
Date: Fri, 17 Apr 2020 09:47:51 +0200
Message-ID: <6962862.uKWtJMOXK1@thomas>
In-Reply-To: <20200417024633.21d77a3b@Sovereign>
17/04/2020 01:46, Dmitry Kozlyuk:
> > * [AI Dmitry K, Harini] Dmitry K to send a summary of the conversation for feedback, Harini to follow up for resolution.
>
> On Windows community calls we have been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested onto the same page and to record the information in one public place.
>
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
>
>
> Current State
> -------------
>
> Patches have been sent for basic memory management, which should be suitable
> for most simple cases. Relevant implementation traits are as follows:
>
> * IOVA as PA only; PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user mode (2MB only);
>   IOVA-contiguity is provided by the allocator to the extent possible.
> * No multi-process support.
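>
> To make the scope concrete, below is a minimal, hypothetical sketch of what
> basic MM enables, using standard EAL calls (error handling trimmed; the
> virt-to-IOVA helper is rte_mem_virt2iova() in the current EAL):
>
>     #include <rte_eal.h>
>     #include <rte_malloc.h>
>     #include <rte_memory.h>
>
>     int main(int argc, char **argv)
>     {
>         if (rte_eal_init(argc, argv) < 0)
>             return -1;
>
>         /* Backed by dynamically allocated 2MB hugepages;
>          * the allocator tries to keep the range IOVA-contiguous. */
>         void *buf = rte_malloc(NULL, 1 << 20, 0);
>         if (buf == NULL)
>             return -1;
>
>         /* IOVA is PA here, obtained via the kernel-mode driver. */
>         rte_iova_t iova = rte_mem_virt2iova(buf);
>         if (iova == RTE_BAD_IOVA)
>             return -1;
>
>         rte_free(buf);
>         return rte_eal_cleanup();
>     }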
>
>
> Background and Findings
> -----------------------
>
> Physical addresses are fundamentally limited and insecure for the following
> reasons (this list is not specific to Windows, but provides context):
>
> 1. A user-mode application with access to DMA and PA can program the
>    device to overwrite arbitrary RAM, bypassing OS security.
>
> 2. An IOMMU might be engaged, rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into a VM.
>
> 3. An IOMMU may be used even on a bare-metal system to protect against #1,
>    by limiting a device's DMA to its IOMMU mappings. Zero-copy forwarding
>    that uses DMA between different RX and TX devices must take care of this.
>    On Windows, this mechanism is called Kernel DMA Protection [1].
>
> 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).
Mellanox NICs also work with PA memory.
> 5. In complex PCI topologies, logical bus addresses may differ from PA,
>    although a concrete example on modern systems is missing (an IoT SoC?).
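>
> A user-space consequence of the above: code must not assume IOVA == PA. A
> hedged sketch of how a PMD-style helper can branch on the configured IOVA
> mode (rte_eal_iova_mode() is the standard EAL API; the helper name
> buf_to_iova() is made up for illustration):
>
>     #include <stdint.h>
>     #include <rte_eal.h>
>     #include <rte_memory.h>
>
>     /* Pick the address to program into a device descriptor.
>      * rte_mem_virt2iova() already returns the VA when running in VA mode;
>      * the explicit branch just makes the two regimes visible. */
>     static rte_iova_t buf_to_iova(void *buf)
>     {
>         if (rte_eal_iova_mode() == RTE_IOVA_VA)
>             return (rte_iova_t)(uintptr_t)buf;
>         return rte_mem_virt2iova(buf); /* RTE_IOVA_PA: kernel lookup */
>     }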
>
>
> Within the Windows kernel, there are two facilities to deal with the above:
>
> 1. The DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    A "DMA adapter" is an abstraction of bus-master mode or of an allocated
>    channel of a DMA controller. Each device also belongs to a DMA domain,
>    initially its so-called default domain. Only devices of the same domain
>    can share a buffer suitable for DMA by all of them. In this respect,
>    DMA domains are similar to IOMMU groups in Linux.
>
>    Besides domain management, this interface allows allocating such a common
>    buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped into user space). Advantages of this
>    interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc.;
>    2) it supports hugepages. One disadvantage is that the kernel controls
>    IOVA and VA.
>
> 2. The DMA_IOMMU interface, which is functionally similar to the Linux VFIO
>    driver: it allows managing IOMMU mappings within a domain [3].
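>
> For concreteness, a rough call-flow sketch of facility #1. This is
> pseudocode-level: the AllocateDomainCommonBuffer() parameter list is
> paraphrased from [2] and not verified against wdm.h; pdo, domain_handle
> and length stand for driver context, and a real driver needs far more setup:
>
>     /* Kernel mode (WDM). Obtain a DMA adapter for a bus-master device,
>      * then ask for a domain common buffer: a device-usable IOVA plus
>      * kernel VA that could later be mapped into a user process. */
>     DEVICE_DESCRIPTION desc = {0};
>     ULONG map_regs;
>     MEMORY_CACHING_TYPE cache = MmCached;
>     PHYSICAL_ADDRESS logical; /* IOVA (logical address) for the device */
>     PVOID va;                 /* kernel VA of the buffer */
>
>     desc.Version = DEVICE_DESCRIPTION_VERSION3;
>     desc.Master = TRUE; /* bus-master DMA */
>     PDMA_ADAPTER adapter = IoGetDmaAdapter(pdo, &desc, &map_regs);
>
>     NTSTATUS status = adapter->DmaOperations->AllocateDomainCommonBuffer(
>         adapter, domain_handle, NULL /* no PA ceiling */, length,
>         0 /* flags */, &cache, MM_ANY_NODE_OK, &logical, &va);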
>
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces, which would be shipped with Windows.
> This is an idea at an early stage, not a commitment.
DMA_ADAPTER and DMA_IOMMU are kernel interfaces, without any userspace API?
> Notable DPDK memory management traits:
>
> 1. When memory is requested from EAL, it is unknown whether it will be used
>    for DMA, and with which device. A call to rte_virt2iova() is a hint, but
>    VA-only devices never make such a call.
>
> 2. Memory is reserved and then committed in segments (basically, hugepages).
>
> 3. There is a callback for segment list allocation and deallocation. For
>    example, the Linux EAL uses it to create IOMMU mappings when VFIO is
>    engaged (see the sketch after this list).
>
> 4. There are drivers that explicitly request PA via rte_virt2phys().
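>
> Trait 3 is the natural hook for a future DMA_IOMMU integration on Windows.
> A sketch of the existing callback mechanism, for reference
> (rte_mem_event_callback_register() is the standard EAL API; the mapping
> actions in the comments are placeholders):
>
>     #include <rte_memory.h>
>
>     /* React to hugepage segments coming and going, the way the Linux EAL
>      * creates and destroys VFIO mappings. A Windows backend could create
>      * DMA_IOMMU mappings from the same event. */
>     static void
>     mem_event_cb(enum rte_mem_event type, const void *addr, size_t len,
>             void *arg)
>     {
>         (void)arg;
>         if (type == RTE_MEM_EVENT_ALLOC) {
>             /* e.g. create DMA mappings for [addr, addr + len) */
>         } else {
>             /* RTE_MEM_EVENT_FREE: destroy mappings for the range */
>         }
>     }
>
>     /* During init: */
>     /* rte_mem_event_callback_register("dma-hook", mem_event_cb, NULL); */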
>
>
> Last but not least, some user-mode memory management notes:
>
> 1. Windows doesn't report limits on the number of hugepages.
>
> 2. According to official documentation, only 2MB hugepages are supported
>    (see the sketch after this list).
>
> [Dmitry M] There are new, still undocumented Win32 API flags for 1GB pages [5].
> [Dmitry K] I found a novel allocator library that uses these new features [6],
> but failed to make use of [5] together with AWE; it is unclear how to
> integrate it into MM.
>
> 3. Address Windowing Extensions (AWE) [4] allow allocating physical page
>    frames (PFNs) and then mapping them to VA, all in user mode.
>
> [Dmitry K] Experiments show that AWE cannot allocate hugepages (at least not
> in a documented way) and cannot reliably provide contiguous ranges (nor does
> it guarantee them). IMO, this interface is useless for common MM. Some
> drivers that do not need hugepages but require PA may benefit from it.
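>
> To ground items 2 and 3 above, a hedged user-mode sketch of the two
> documented paths; both require SeLockMemoryPrivilege, and error handling
> is trimmed:
>
>     #include <stdlib.h>
>     #include <windows.h>
>
>     /* Path A: documented large pages (2MB on x86-64). */
>     static void *alloc_large_page(void)
>     {
>         SIZE_T page = GetLargePageMinimum(); /* typically 2MB */
>         if (page == 0)
>             return NULL; /* large pages unsupported */
>         return VirtualAlloc(NULL, page,
>             MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
>     }
>
>     /* Path B: AWE - allocate physical page frames (system page size,
>      * not guaranteed contiguous), then map them into a MEM_PHYSICAL
>      * VA region. */
>     static void *alloc_awe(SIZE_T bytes)
>     {
>         SYSTEM_INFO si;
>         GetSystemInfo(&si);
>         ULONG_PTR npages = bytes / si.dwPageSize;
>         ULONG_PTR *pfns = calloc(npages, sizeof(*pfns)); /* PFN array */
>
>         if (pfns == NULL ||
>             !AllocateUserPhysicalPages(GetCurrentProcess(), &npages, pfns))
>             return NULL;
>         void *va = VirtualAlloc(NULL, bytes,
>             MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
>         if (va == NULL || !MapUserPhysicalPages(va, npages, pfns))
>             return NULL;
>         return va;
>     }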
>
>
> Opens
> -----
>
> IMO, the "Advanced memory management" milestone in the roadmap should be split.
Yes to splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks for later in the year.
Basic memory management should be enough for first steps with PMDs.
> There are three major areas of MM improvement, each requiring research and a
> complex patch:
>
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (the DPDK part is unclear).
> 2. VFIO-like code in the Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
>
> The Windows kernel interfaces described above are poorly documented. On the
> Windows community call on 2020-04-01, Dmitry Malloy agreed to help with this
> (concrete questions were raised and noted).
>
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features (a sketch of what is currently known follows). Also, because
> Windows does not report hugepage limits, managing multiple page sizes in
> DPDK may require extra work.
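>
> For the record, the undocumented 1GB path as far as [5] and [6] reveal it.
> This is a hypothetical sketch: the flag semantics are not officially
> documented, and [6] resolves VirtualAlloc2() dynamically for portability:
>
>     #include <windows.h>
>
>     /* Unverified: request a 1GB huge page via an extended parameter,
>      * mirroring what mimalloc [6] appears to do.
>      * Requires SeLockMemoryPrivilege. */
>     static void *alloc_1g_hugepage(SIZE_T size)
>     {
>         MEM_EXTENDED_PARAMETER param = {0};
>
>         param.Type = MemExtendedParameterAttributeFlags;
>         param.ULong64 = MEM_EXTENDED_PARAMETER_NONPAGED_HUGE;
>         return VirtualAlloc2(GetCurrentProcess(), NULL, size,
>             MEM_LARGE_PAGES | MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE,
>             &param, 1);
>     }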
>
>
> References
> ----------
>
> [1]: Kernel DMA Protection for Thunderbolt™ 3 -
> <https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
> [2]: DMA_ADAPTER.AllocateDomainCommonBuffer -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
> [3]: DMA_IOMMU interface -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
> [4]: Address Windowing Extensions (AWE) -
> <https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
> [5]: GitHub issue - <https://github.com/dotnet/runtime/issues/12779>
> [6]: mimalloc - <https://github.com/microsoft/mimalloc>
Thanks for the great summary.