DPDK patches and discussions
* [dpdk-dev] Fwd: [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
@ 2020-04-15 21:02 Thomas Monjalon
  2020-04-16 23:46 ` [dpdk-dev] " Dmitry Kozlyuk
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Monjalon @ 2020-04-15 21:02 UTC (permalink / raw)
  To: dev
  Cc: Harini Ramakrishnan, Omar Cardona, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Pallavi Kadam, Ranjit Menon,
	Dmitry Kozlyuk, Tal Shnaiderman, Fady Bader, Ophir Munk

From: Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>

Attendees: Thomas Monjalon, Ranjit Menon, Tal Shnaiderman, Pallavi Kadam, Narcisa Ana Maria Vasile, Harini Ramakrishnan, Dmitry Kozlyuk, William Tu, Tasnim Bashar


1/ UIO Driver

  *   A new Intel-Microsoft DA is needed to unblock licensing issues.
  *   Naty will apply fixes based on Dmitry K's feedback to the UIO source.
  *   After the licensing headers on the source files are updated, the release process for the build and binary will begin.
  *   [AI Harini & Ranjit] To expedite the licensing agreement resolution to unblock the UIO driver release.


2/ Dmitry's Patches on EAL's memory management interface, virt2phys driver

  *   Naty and Ranjit have reviewed the EAL and virt2phys driver patches.
  *   Dmitry sent v3 patches addressing the review comments.
  *   virt2phys driver code has been integrated into the kmods repo.
  *   Awaiting review from the memory management community maintainer and Dmitry M's comments on the patches.
  *   Awaiting comments on the impact on Linux and FreeBSD code.
  *   [AI Dmitry K, Harini] Dmitry K to send a summary of the conversation for feedback, Harini to follow up for resolution.
  *   [AI Dmitry K] To send out v4 after all reviews are addressed.


3/ PCI Probing

  *   Mellanox is internally reviewing patches for the PCI bus.
  *   They will be sent out for review in the next 2 weeks.
  *   The next step is for Intel to provide patches for the specific steps needed when probing PCI devices that are bound to netuio.


4/ Opens

  *   Intel is working on the logging patch set; it will be sent out for review in a few weeks.
  *   [AI Harini] Microsoft expert allocation is needed to drive issues with clang compiler support for DPDK on Windows.

        *   https://bugs.llvm.org/show_bug.cgi?id=24383
        *   Harini to set up a focused conversation around prioritization of DPDK for Windows.
  *   IHVs not acknowledging the Roadmap proposal will be removed from the Roadmap.





* Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
  2020-04-15 21:02 [dpdk-dev] Fwd: [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call Thomas Monjalon
@ 2020-04-16 23:46 ` Dmitry Kozlyuk
  2020-04-17  7:47   ` Thomas Monjalon
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Kozlyuk @ 2020-04-16 23:46 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Harini Ramakrishnan, Omar Cardona, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Pallavi Kadam, Ranjit Menon,
	Tal Shnaiderman, Fady Bader, Ophir Munk, Anatoly Burakov

>   *   [AI Dmitry K, Harini] Dmitry K to send summary of conversation for feedback, Harini to follow-up for resolution.

On Windows community calls we've been discussing memory management
implementation approaches and plans. This summary aims to bring everyone
interested to the same page and to record information in one public place.

[Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.


Current State
-------------

Patches have been sent for basic memory management that should be suitable for
most simple cases. Relevant implementation traits are as follows:

* IOVA as PA only; PA is obtained via a kernel-mode driver.
* Hugepages are allocated dynamically in user mode (2MB only);
  IOVA-contiguity is provided by the allocator to the extent possible.
* No multi-process support.
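
For illustration, here is a minimal user-mode sketch of how a process can ask
a kernel-mode translation driver for the PA backing a virtual address, which
is the role the virt2phys driver plays in the basic MM above. The control
code, device path and buffer layout below are hypothetical placeholders, not
the actual driver interface:

/* Hypothetical sketch: the IOCTL code, device name and data layout are
 * illustrative placeholders, not the real virt2phys driver interface. */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

#define EXAMPLE_IOCTL_TRANSLATE \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

int
main(void)
{
    /* Placeholder device path; the real driver exposes a device interface. */
    HANDLE dev = CreateFileA("\\\\.\\ExampleVirt2Phys",
        GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
    if (dev == INVALID_HANDLE_VALUE)
        return 1;

    void *va = VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE);
    ULONG64 pa = 0;
    DWORD bytes = 0;

    /* The driver resolves VA -> PA in kernel mode (e.g. by locking the
     * page), so user code never touches PFN databases directly. */
    if (va != NULL && DeviceIoControl(dev, EXAMPLE_IOCTL_TRANSLATE,
            &va, sizeof(va), &pa, sizeof(pa), &bytes, NULL))
        printf("VA=%p PA=0x%llx\n", va, (unsigned long long)pa);

    VirtualFree(va, 0, MEM_RELEASE);
    CloseHandle(dev);
    return 0;
}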


Background and Findings
-----------------------

Physical addresses are fundamentally limited and insecure because of the
following (this list is not specific to Windows, but provides context):

1. A user-mode application with access to DMA and PA can convince the
   device to overwrite arbitrary RAM content, bypassing OS security.

2. An IOMMU might be engaged, rendering PA invalid for a particular device.
   This mode is mandatory for PCI passthrough into a VM.

3. IOMMU may be used even on a bare-metal system to protect against #1 by
   limiting DMA for a device to IOMMU mappings. Zero-copy forwarding using
   DMA from different RX and TX devices must take care of this. On Windows,
   such a mechanism is called Kernel DMA Protection [1].

4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).

5. In complex PCI topologies logical bus addresses may differ from PA,
   although a concrete example is missing for modern systems (IoT SoC?).


Within the Windows kernel there are two facilities to deal with the above:

1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
   "DMA adapter" is an abstraction of bus-master mode or an allocated channel
   of a DMA controller. Also, each device belongs to a DMA domain, initially
   its so-called default domain. Only devices of the same domain can have a
   buffer suitable for DMA by all devices. In this respect, DMA domains are
   similar to IOMMU groups in Linux.

   Besides domain management, this interface allows allocation of such a
   common buffer, that is, a contiguous range of IOVA (logical addresses) and
   kernel VA (which can be mapped to user-space). Advantages of this
   interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc; 2) it
   supports hugepages. One disadvantage is that the kernel controls IOVA and VA.

2. DMA_IOMMU interface which is functionally similar to Linux VFIO driver,
   that is, it allows management of IOMMU mappings within a domain [3].

[Dmitry M] Microsoft considers creating a generic memory-management driver
exposing (some of) these interfaces which will be shipped with Windows. This
is an idea at an early stage, not a commitment.


Notable DPDK memory management traits:

1. When memory is requested from EAL, it is unknown whether it will be used
for DMA and, if so, with which device. A hint is given when rte_virt2iova()
is called, but this is not the case for VA-only devices.

2. Memory is reserved and then committed in segments (basically, hugepages).

3. There is a callback for segment list allocation and deallocation. For
example, Linux EAL uses it to create IOMMU mappings when VFIO is engaged.

4. There are drivers that explicitly request PA via rte_virt2phys().
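
As an illustration of traits 1 and 4, here is a minimal sketch (not taken
from the patches) of how an application typically obtains an IOVA-contiguous
buffer and translates addresses. It assumes rte_mem_virt2iova() as the
concrete translation API; the memzone name and sizes are arbitrary:

#include <stdio.h>

#include <rte_eal.h>
#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_memzone.h>

int
main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    /* Trait 1: EAL does not know at allocation time that this memory is
     * destined for DMA, so IOVA-contiguity must be requested explicitly. */
    const struct rte_memzone *mz = rte_memzone_reserve(
        "example_dma_zone", 2 * 1024 * 1024, SOCKET_ID_ANY,
        RTE_MEMZONE_IOVA_CONTIG);
    if (mz != NULL)
        printf("VA=%p IOVA=0x%llx len=%zu\n",
               mz->addr, (unsigned long long)mz->iova, mz->len);

    /* Trait 4: translate an arbitrary EAL-managed VA. With the basic
     * Windows MM, IOVA mode is PA, so the result is a physical address
     * obtained through the kernel-mode driver. */
    void *obj = rte_malloc(NULL, 4096, 4096);
    if (obj != NULL)
        printf("obj IOVA=0x%llx\n",
               (unsigned long long)rte_mem_virt2iova(obj));

    rte_free(obj);
    rte_memzone_free(mz);
    rte_eal_cleanup();
    return 0;
}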


Last but not least, some user-mode memory management notes:

1. Windows doesn't report limits on the number of hugepages.

2. By official documentation, only 2MB hugepages are supported.

   [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
   [Dmitry K] Found a novel allocator library using these new features [6].
   Failed to make use of [5] with AWE, unclear how to integrate into MM.

3. Address Windowing Extensions [4] allow allocating physical page
   frames (PFN) and then mapping them to VA, all in user-mode.

   [Dmitry K] Experiments show AWE cannot allocate hugepages (in a documented
   way at least) and cannot reliably provide contiguous ranges (and does not
   guarantee it). IMO, this interface is useless for common MM. Some drivers
   that do not need hugepages but require PA may benefit from it.
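
For reference, a minimal sketch of the two documented user-mode mechanisms
from notes 2 and 3 above: 2MB large pages via VirtualAlloc and AWE physical
page allocation. Both require SeLockMemoryPrivilege; error handling is
mostly trimmed:

#include <windows.h>
#include <stdio.h>

int
main(void)
{
    /* Note 2: documented large-page allocation (2MB on x86-64). */
    SIZE_T large = GetLargePageMinimum();
    void *huge = VirtualAlloc(NULL, large,
        MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
    printf("large page min=%zu VA=%p\n", large, huge);

    /* Note 3: AWE - allocate physical page frames, then map them into a
     * reserved MEM_PHYSICAL region. The returned PFNs are not guaranteed
     * to be contiguous, which is the limitation observed above. */
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    ULONG_PTR npages = 16;
    ULONG_PTR pfns[16];
    if (!AllocateUserPhysicalPages(GetCurrentProcess(), &npages, pfns))
        return 1;

    void *va = VirtualAlloc(NULL, npages * si.dwPageSize,
        MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
    if (va != NULL && MapUserPhysicalPages(va, npages, pfns))
        printf("mapped %llu AWE pages at %p\n",
               (unsigned long long)npages, va);

    /* Cleanup: unmap the window and release the physical pages. */
    if (va != NULL)
        MapUserPhysicalPages(va, npages, NULL);
    FreeUserPhysicalPages(GetCurrentProcess(), &npages, pfns);
    return 0;
}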


Opens
-----

IMO, "Advanced memory management" milestone from roadmap should be split.
There are three major points of MM improvement, each requiring research and a
complex patch:

1. Proper DMA buffers via AllocateDomainCommonBuffer (DPDK part is unclear).
2. VFIO-like code in Windows EAL using DMA_IOMMU.
3. Support for 1GB hugepages and related changes.

The Windows kernel interfaces described above have poor documentation. On the
Windows community call of 2020-04-01, Dmitry Malloy agreed to help with this
(concrete questions were raised and noted).

Hugepages of 1GB are desirable, but allocating them relies on undocumented
features. Also, because Windows does not provide hugepage limits, it may
require more work to manage multiple sizes in DPDK.


References
----------

[1]: Kernel DMA Protection for Thunderbolt™ 3
<https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
[2]: DMA_ADAPTER.AllocateDomainCommonBuffer
<https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
[3]: DMA_IOMMU interface
<https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
[4]: Address Windowing Extensions (AWE)
<https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
[5]: GitHub issue <https://github.com/dotnet/runtime/issues/12779>
[6]: mimalloc <https://github.com/microsoft/mimalloc>

-- 
Dmitry Kozlyuk


* Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
  2020-04-16 23:46 ` [dpdk-dev] " Dmitry Kozlyuk
@ 2020-04-17  7:47   ` Thomas Monjalon
  2020-04-17 15:10     ` Dmitry Kozlyuk
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Monjalon @ 2020-04-17  7:47 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dev, Harini Ramakrishnan, Omar Cardona, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Pallavi Kadam, Ranjit Menon,
	Tal Shnaiderman, Fady Bader, Ophir Munk, Anatoly Burakov

17/04/2020 01:46, Dmitry Kozlyuk:
> >   *   [AI Dmitry K, Harini] Dmitry K to send summary of conversation for feedback, Harini to follow-up for resolution.
> 
> On Windows community calls we've been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested to the same page and to record information in one public place.
> 
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
> 
> 
> Current State
> -------------
> 
> Patches are sent for basic memory management that should be suitable for most
> simple cases. Relevant implementation traits are as follows:
> 
> * IOVA as PA only, PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user-mode (2MB only),
>   IOVA-contiguity is provided by allocator to the extent possible.
> * No multi-process support.
> 
> 
> Background and Findings
> -----------------------
> 
> Physical addresses are fundamentally limited and insecure because of the
> following (this list is not specific to Windows, but provides context):
> 
> 1. A user-mode application with access to DMA and PA can convince the
>    device to overwrite arbitrary RAM content, bypassing OS security.
> 
> 2. IOMMU might be engaged rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into VM.
> 
> 3. IOMMU may be used even on a bare-metal system to protect against #1 by
>    limiting DMA for a device to IOMMU mappings. Zero-copy forwarding using
>    DMA from different RX and TX devices must take care of this. On Windows,
>    such mechanism is called Kernel DMA Protection [1].
> 
> 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).

Mellanox NICs also work with PA memory.

> 5. In complex PCI topologies logical bus addresses may differ from PA,
>    although a concrete example is missing for modern systems (IoT SoC?).
> 
> 
> Within Windows kernel there are two facilities to deal with the above:
> 
> 1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    "DMA adapter" is an abstraction of bus-master mode or an allocated channel
>    of a DMA controller. Also, each device belongs to a DMA domain, initially
>    its so-called default domain. Only devices of the same domain can have a
>    buffer suitable for DMA by all devices. In that, DMA domains are similar
>    to IOMMU groups in Linux.
> 
>    Besides domain management, this interface allows allocation of such a
>    common buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped to user-space). Advantages of this
>    interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc; 2) it
>    supports hugepages. One disadvantage is that kernel controls IOVA and VA.
> 
> 2. DMA_IOMMU interface which is functionally similar to Linux VFIO driver,
>    that is, it allows management of IOMMU mappings within a domain [3].
> 
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces which will be shipped with Windows. This
> is an idea on its early stage, not a commitment.

DMA_ADAPTER and DMA_IOMMU are kernel interfaces, without any userspace API?


> Notable DPDK memory management traits:
> 
> 1. When memory is requested from EAL, it is unknown whether it will be used
> for DMA and with which device. The hint is when rte_virt2iova() is called,
> but this is not the case for VA-only devices.
> 
> 2. Memory is reserved and then committed in segments (basically, hugepages).
> 
> 3. There is a callback for segment list allocation and deallocation. For
> example, Linux EAL uses it to create IOMMU mappings when VFIO is engaged.
> 
> 4. There are drivers that explicitly request PA via rte_virt2phys().
> 
> 
> Last but not the least, user-mode memory management notes:
> 
> 1. Windows doesn't report limits on the number of hugepages.
> 
> 2. By official documentation, only 2MB hugepages are supported.
> 
>    [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
>    [Dmitry K] Found a novel allocator library using these new features [6].
>    Failed to make use of [5] with AWE, unclear how to integrate into MM.
> 
> 3. Address Windowing Extensions [4] allow allocating physical page
>    frames (PFN) and then mapping them to VA, all in user-mode.
> 
>    [Dmitry K] Experiments show AWE cannot allocate hugepages (in a documented
>    way at least) and cannot reliably provide contiguous ranges (and does not
>    guarantee it). IMO, this interface is useless for common MM. Some drivers
>    that do not need hugepages but require PA may benefit from it.
> 
> 
> Opens
> -----
> 
> IMO, "Advanced memory management" milestone from roadmap should be split.

Yes for splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks later in the year.
Basic memory management should be enough for first steps with PMDs.

> There are three major points of MM improvement, each requiring research and a
> complex patch:
> 
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (DPDK part is unclear).
> 2. VFIO-like code in Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
> 
> Windows kernel interfaces described above have poor documentation. On Windows
> community call 2020-04-01 Dmitry Malloy agreed to help with this (concrete
> questions were raised and noted).
> 
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features. Also, because Windows does not provide hugepage limits, it may
> require more work to manage multiple sizes in DPDK.
> 
> 
> References
> ----------
> 
> [1]: Kernel DMA Protection for Thunderbolt™ 3
> <https://docs.microsoft.com/en-us/windows/security/information-protection/kernel-dma-protection-for-thunderbolt>
> [2]: DMA_IOMMU interface -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-iommu_map_identity_range>
> [3]: DMA_ADAPTER.AllocateDomainCommonBuffer -
> <https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_domain_common_buffer>
> [4]: Address Windowing Extensions (AWE)
> <https://docs.microsoft.com/en-us/windows/win32/memory/address-windowing-extensions>
> [5]: GitHub issue <https://github.com/dotnet/runtime/issues/12779>
> [6]: mimalloc <https://github.com/microsoft/mimalloc>


Thanks for the great summary.




* Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call
  2020-04-17  7:47   ` Thomas Monjalon
@ 2020-04-17 15:10     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 4+ messages in thread
From: Dmitry Kozlyuk @ 2020-04-17 15:10 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Harini Ramakrishnan, Omar Cardona, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Pallavi Kadam, Ranjit Menon,
	Tal Shnaiderman, Fady Bader, Ophir Munk, Anatoly Burakov

> > 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).  
> 
> Mellanox NICs work also with PA memory.

IIRC, you told us there's something special with Mellanox NICs and IOMMU.

Documentation says (https://doc.dpdk.org/guides/nics/mlx5.html):

	For security reasons and robustness, this driver only deals with
	virtual memory addresses. The way resources allocations are handled
	by the kernel, combined with hardware specifications that allow to
	handle virtual memory addresses directly, ensure that DPDK
	applications cannot access random physical memory (or memory that
	does not belong to the current process).


> DMA_ADAPTER and DMA_IOMMU are kernel interfaces, without any userspace API?

Correct.

-- 
Dmitry Kozlyuk


