DPDK patches and discussions
From: Thomas Monjalon <thomas@monjalon.net>
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: dev@dpdk.org,
	Harini Ramakrishnan <harini.ramakrishnan@microsoft.com>,
	Omar Cardona <ocardona@microsoft.com>,
	Dmitry Malloy <dmitrym@microsoft.com>,
	Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>,
	Pallavi Kadam <pallavi.kadam@intel.com>,
	Ranjit Menon <ranjit.menon@intel.com>,
	Tal Shnaiderman <talshn@mellanox.com>,
	Fady Bader <fady@mellanox.com>, Ophir Munk <ophirmu@mellanox.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
Date: Fri, 17 Apr 2020 09:47:51 +0200
Message-ID: <6962862.uKWtJMOXK1@thomas>
In-Reply-To: <20200417024633.21d77a3b@Sovereign>

17/04/2020 01:46, Dmitry Kozlyuk:
> >   *   [AI Dmitry K, Harini] Dmitry K to send summary of conversation for feedback, Harini to follow-up for resolution.
> 
> On Windows community calls we've been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested onto the same page and to record the information in one public place.
> 
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
> 
> 
> Current State
> -------------
> 
> Patches have been sent for basic memory management that should be suitable
> for most simple cases. The relevant implementation traits are as follows:
> 
> * IOVA as PA only; PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user mode (2MB only);
>   IOVA-contiguity is provided by the allocator to the extent possible
>   (see the sketch below).
> * No multi-process support.
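> 
> For illustration, requesting an IOVA-contiguous buffer from this allocator
> uses the standard memzone API (after rte_eal_init()); under the patches
> above the IOVA values come from the kernel-mode driver. A minimal sketch,
> with error handling trimmed:
> 
>     #include <stdio.h>
>     #include <inttypes.h>
>     #include <rte_memzone.h>
> 
>     static void
>     show_dma_buf(void)
>     {
>         /* Ask for 1MB that must be IOVA-contiguous; without the flag
>          * the zone may span non-contiguous hugepages. */
>         const struct rte_memzone *mz = rte_memzone_reserve(
>             "dma_buf", 1 << 20, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> 
>         if (mz != NULL)
>             printf("VA %p -> IOVA 0x%" PRIx64 "\n", mz->addr, mz->iova);
>     }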
> 
> 
> Background and Findings
> -----------------------
> 
> Physical addresses are fundamentally limited and insecure because of the
> following (this list is not specific to Windows, but provides context):
> 
> 1. A user-mode application with access to DMA and PA can convince the
>    device to overwrite arbitrary RAM content, bypassing OS security.
> 
> 2. An IOMMU might be engaged, rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into a VM.
> 
> 3. An IOMMU may be used even on a bare-metal system to protect against #1 by
>    limiting DMA for a device to its IOMMU mappings. Zero-copy forwarding using
>    DMA from different RX and TX devices must take care of this. On Windows,
>    this mechanism is called Kernel DMA Protection [1].
> 
> 4. A device can be VA-only, with an onboard IOMMU (e.g. Mellanox NICs).

Mellanox NICs also work with PA memory.

> 5. In complex PCI topologies, logical bus addresses may differ from PA,
>    although a concrete example on modern systems is missing (an IoT SoC?).
> 
> 
> Within the Windows kernel there are two facilities to deal with the above:
> 
> 1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    A "DMA adapter" is an abstraction of bus-master mode or an allocated channel
>    of a DMA controller. Also, each device belongs to a DMA domain, initially
>    its so-called default domain. Only devices of the same domain can share a
>    buffer suitable for DMA by all of them. In this respect, DMA domains are
>    similar to IOMMU groups in Linux.
> 
>    Besides domain management, this interface allows allocation of such a
>    common buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped to user-space); a sketch follows this list.
>    Advantages of this interface: 1) it is universal w.r.t. PCI topology,
>    IOMMU, etc.; 2) it supports hugepages. One disadvantage is that the kernel
>    controls IOVA and VA.
> 
> 2. DMA_IOMMU interface, which is functionally similar to the Linux VFIO
>    driver, that is, it allows management of IOMMU mappings within a domain [3].
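> 
> A rough kernel-mode sketch of allocating such a common buffer follows.
> IoGetDmaAdapter() is standard WDM; the AllocateDomainCommonBuffer()
> argument list and the domain handle below follow my reading of the
> documentation in [2] and may not match the shipped headers exactly:
> 
>     /* Sketch only: argument list of AllocateDomainCommonBuffer() is
>      * assumed from the docs in [2], not verified against wdm.h. */
>     NTSTATUS
>     alloc_dma_buffer(PDEVICE_OBJECT pdo, HANDLE domain_handle,
>                      ULONG length, PHYSICAL_ADDRESS *iova, PVOID *va)
>     {
>         DEVICE_DESCRIPTION desc;
>         ULONG map_regs;
>         PDMA_ADAPTER adapter;
>         MEMORY_CACHING_TYPE cache = MmCached;
> 
>         /* Obtain a V3 DMA adapter for the device (bus-master mode). */
>         RtlZeroMemory(&desc, sizeof(desc));
>         desc.Version = DEVICE_DESCRIPTION_VERSION3;
>         desc.Master = TRUE;
>         adapter = IoGetDmaAdapter(pdo, &desc, &map_regs);
>         if (adapter == NULL)
>             return STATUS_UNSUCCESSFUL;
> 
>         /* domain_handle is assumed to come from the domain-management
>          * routines of the same interface. On success, *iova is the
>          * logical address seen by the device and *va the kernel VA. */
>         return adapter->DmaOperations->AllocateDomainCommonBuffer(
>             adapter, domain_handle, NULL /* owner process */, length,
>             0 /* flags */, &cache, MM_ANY_NODE_OK, iova, va);
>     }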
> 
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces, which would be shipped with Windows. This
> is an idea at an early stage, not a commitment.

DMA_ADAPTER and DMA_IOMMU are kernel interfaces, without any userspace API?


> Notable DPDK memory management traits:
> 
> 1. When memory is requested from EAL, it is unknown whether it will be used
> for DMA, and with which device. A hint is given when rte_mem_virt2iova() is
> called, but this does not happen for VA-only devices.
> 
> 2. Memory is reserved and then committed in segments (basically, hugepages).
> 
> 3. There is a callback for segment list allocation and deallocation. For
> example, Linux EAL uses it to create IOMMU mappings when VFIO is engaged
> (see the sketch after this list).
> 
> 4. There are drivers that explicitly request PA via rte_mem_virt2phy().
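> 
> A minimal sketch of the callback from point 3, using the standard
> rte_memory.h API; a Windows EAL could register a similar hook to maintain
> DMA_IOMMU mappings (the callback name and body here are illustrative):
> 
>     #include <rte_common.h>
>     #include <rte_memory.h>
> 
>     /* Invoked by EAL whenever hugepage memory is allocated or freed. */
>     static void
>     mem_event_cb(enum rte_mem_event event, const void *addr, size_t len,
>                  void *arg __rte_unused)
>     {
>         if (event == RTE_MEM_EVENT_ALLOC)
>             ; /* map [addr, addr + len) for DMA here */
>         else
>             ; /* RTE_MEM_EVENT_FREE: unmap the range */
>     }
> 
>     static int
>     register_dma_hook(void)
>     {
>         /* The name only identifies the callback for unregistering. */
>         return rte_mem_event_callback_register("win-dma",
>                                                mem_event_cb, NULL);
>     }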
> 
> 
> Last but not least, user-mode memory management notes:
> 
> 1. Windows doesn't report limits on the number of hugepages.
> 
> 2. According to the official documentation, only 2MB hugepages are supported
>    (see the sketch after this list).
> 
>    [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
>    [Dmitry K] Found a novel allocator library using these new features [6].
>    I failed to make use of [5] with AWE; it is unclear how to integrate it
>    into MM.
> 
> 3. Address Windowing Extensions [4] allow allocating physical page
>    frames (PFNs) and then mapping them to VA, all in user mode.
> 
>    [Dmitry K] Experiments show that AWE cannot allocate hugepages (at least
>    not in a documented way) and cannot reliably provide contiguous ranges
>    (nor does it guarantee them). IMO, this interface is useless for common
>    MM. Some drivers that do not need hugepages but require PA may benefit
>    from it.
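> 
> For reference, the documented path to 2MB pages from point 2 looks like
> this; privilege setup is omitted (the process token needs
> SeLockMemoryPrivilege, "Lock pages in memory"):
> 
>     #include <windows.h>
> 
>     /* Try to get one 2MB hugepage via the documented Win32 API. */
>     static void *
>     alloc_hugepage(void)
>     {
>         SIZE_T size = GetLargePageMinimum(); /* 2MB on x86-64 */
>         if (size == 0)
>             return NULL; /* large pages not supported */
>         /* Fails with ERROR_PRIVILEGE_NOT_HELD if the privilege is
>          * missing; there is no documented flag for 1GB pages. */
>         return VirtualAlloc(NULL, size,
>                 MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
>                 PAGE_READWRITE);
>     }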
> 
> 
> Opens
> -----
> 
> IMO, the "Advanced memory management" milestone from the roadmap should be
> split.

Yes for splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks later in the year.
Basic memory management should be enough for first steps with PMDs.

> There are three major points of MM improvement, each requiring research and a
> complex patch:
> 
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (DPDK part is unclear).
> 2. VFIO-like code in Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
> 
> The Windows kernel interfaces described above have poor documentation. On the
> Windows community call of 2020-04-01, Dmitry Malloy agreed to help with this
> (concrete questions were raised and noted).
> 
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features. Also, because Windows does not report hugepage limits, managing
> multiple page sizes in DPDK may require more work.
> 
> 
> References
> ----------
> 
> [1]: Kernel DMA Protection for Thunderbolt™ 3 -
> <https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
> [2]: DMA_ADAPTER.AllocateDomainCommonBuffer -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
> [3]: DMA_IOMMU interface -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
> [4]: Address Windowing Extensions (AWE) -
> <https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
> [5]: GitHub issue -
> <https://github.com/dotnet/runtime/issues/12779>
> [6]: mimalloc -
> <https://github.com/microsoft/mimalloc>


Thanks for the great summary.


