From: Thomas Monjalon
To: Dmitry Kozlyuk
Cc: dev@dpdk.org, Harini Ramakrishnan, Omar Cardona, Dmitry Malloy,
 Narcisa Ana Maria Vasile, Pallavi Kadam, Ranjit Menon, Tal Shnaiderman,
 Fady Bader, Ophir Munk, Anatoly Burakov
Date: Fri, 17 Apr 2020 09:47:51 +0200
Subject: Re: [dpdk-dev] [Minutes 04/15/2020] Bi-Weekly DPDK Windows Community Call

17/04/2020 01:46, Dmitry Kozlyuk:
> > * [AI Dmitry K, Harini] Dmitry K to send summary of conversation for
> >   feedback, Harini to follow up for resolution.
>
> On Windows community calls we've been discussing memory management
> implementation approaches and plans. This summary aims to bring everyone
> interested onto the same page and to record the information in one public place.
>
> [Dmitry M] is Dmitry Malloy from Microsoft, [Dmitry K] is me.
> Cc'ing Anatoly Burakov as DPDK memory subsystem maintainer.
>
>
> Current State
> -------------
>
> Patches are sent for basic memory management that should be suitable for most
> simple cases. Relevant implementation traits are as follows:
>
> * IOVA as PA only; PA is obtained via a kernel-mode driver.
> * Hugepages are allocated dynamically in user-mode (2MB only);
>   IOVA-contiguity is provided by the allocator to the extent possible.
> * No multi-process support.
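For illustration only (not taken from the submitted patches): a minimal user-mode
sketch of allocating a single 2MB "large page" with documented Win32 calls, which is
presumably the mechanism behind the dynamic 2MB allocation mentioned above. It assumes
the account holds the "Lock pages in memory" right (SeLockMemoryPrivilege) and keeps
error handling to a bare minimum; link against advapi32 for the token calls.

    #include <windows.h>
    #include <stdio.h>

    /* Enable SeLockMemoryPrivilege on the process token.
     * The right itself must already be granted to the account. */
    static int
    enable_lock_memory_privilege(void)
    {
        HANDLE token;
        TOKEN_PRIVILEGES tp;
        BOOL ok;

        if (!OpenProcessToken(GetCurrentProcess(),
                TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
            return -1;

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        ok = LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                &tp.Privileges[0].Luid) &&
             AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL) &&
             /* AdjustTokenPrivileges() succeeds even when the right is
              * missing; it then sets ERROR_NOT_ALL_ASSIGNED. */
             GetLastError() == ERROR_SUCCESS;

        CloseHandle(token);
        return ok ? 0 : -1;
    }

    int
    main(void)
    {
        SIZE_T size = GetLargePageMinimum(); /* 2MB on x86-64 */
        void *va;

        if (size == 0 || enable_lock_memory_privilege() != 0)
            return 1;

        /* MEM_LARGE_PAGES requires reserving and committing in one call,
         * with a size that is a multiple of the large-page minimum. */
        va = VirtualAlloc(NULL, size,
                MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES, PAGE_READWRITE);
        if (va == NULL) {
            printf("VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }
        printf("hugepage VA %p, size %zu\n", va, (size_t)size);
        VirtualFree(va, 0, MEM_RELEASE);
        return 0;
    }

Whether the resulting pages are also IOVA-contiguous still has to be checked
separately, e.g. by resolving PA through the kernel-mode driver mentioned above.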
>
>
> Background and Findings
> -----------------------
>
> Physical addresses are fundamentally limited and insecure because of the
> following (this list is not specific to Windows, but provides context):
>
> 1. A user-mode application with access to DMA and PA can convince the
>    device to overwrite arbitrary RAM content, bypassing OS security.
>
> 2. IOMMU might be engaged, rendering PA invalid for a particular device.
>    This mode is mandatory for PCI passthrough into a VM.
>
> 3. IOMMU may be used even on a bare-metal system to protect against #1 by
>    limiting DMA for a device to IOMMU mappings. Zero-copy forwarding using
>    DMA from different RX and TX devices must take care of this. On Windows,
>    such a mechanism is called Kernel DMA Protection [1].
>
> 4. Device can be VA-only with an onboard IOMMU (e.g. Mellanox NICs).

Mellanox NICs also work with PA memory.

> 5. In complex PCI topologies logical bus addresses may differ from PA,
>    although a concrete example is missing for modern systems (IoT SoC?).
>
>
> Within the Windows kernel there are two facilities to deal with the above:
>
> 1. DMA_ADAPTER interface and its AllocateDomainCommonBuffer() method [2].
>    "DMA adapter" is an abstraction of bus-master mode or an allocated channel
>    of a DMA controller. Also, each device belongs to a DMA domain, initially
>    its so-called default domain. Only devices of the same domain can share a
>    buffer suitable for DMA by all of them. In that, DMA domains are similar
>    to IOMMU groups in Linux.
>
>    Besides domain management, this interface allows allocation of such a
>    common buffer, that is, a contiguous range of IOVA (logical addresses) and
>    kernel VA (which can be mapped to user-space). Advantages of this
>    interface: 1) it is universal w.r.t. PCI topology, IOMMU, etc.; 2) it
>    supports hugepages. One disadvantage is that the kernel controls IOVA and VA.
>
> 2. DMA_IOMMU interface, which is functionally similar to the Linux VFIO driver,
>    that is, it allows management of IOMMU mappings within a domain [3].
>
> [Dmitry M] Microsoft considers creating a generic memory-management driver
> exposing (some of) these interfaces, which would be shipped with Windows. This
> is an idea at an early stage, not a commitment.

So DMA_ADAPTER and DMA_IOMMU are kernel-only interfaces, without any userspace API?

> Notable DPDK memory management traits:
>
> 1. When memory is requested from EAL, it is unknown whether it will be used
>    for DMA and with which device. The hint is when rte_virt2iova() is called,
>    but this is not the case for VA-only devices.
>
> 2. Memory is reserved and then committed in segments (basically, hugepages).
>
> 3. There is a callback for segment list allocation and deallocation. For
>    example, the Linux EAL uses it to create IOMMU mappings when VFIO is engaged.
>
> 4. There are drivers that explicitly request PA via rte_virt2phys().
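To make trait 3 above concrete, here is a minimal sketch of hooking segment
allocation with the existing rte_mem_event_callback_register() API. The
dma_map()/dma_unmap() helpers are hypothetical placeholders for whatever mapping
mechanism a platform provides; on Windows that mechanism is exactly the open question.

    #include <stddef.h>
    #include <rte_memory.h>

    /* Placeholders, not real APIs: stand-ins for the platform mapping step
     * (VFIO_IOMMU_MAP_DMA on Linux; DMA_IOMMU or a driver call on Windows?). */
    static void dma_map(const void *addr, size_t len) { (void)addr; (void)len; }
    static void dma_unmap(const void *addr, size_t len) { (void)addr; (void)len; }

    /* Called by EAL whenever hugepage segments are allocated or freed. */
    static void
    mem_event_cb(enum rte_mem_event event, const void *addr, size_t len, void *arg)
    {
        (void)arg;
        if (event == RTE_MEM_EVENT_ALLOC)
            dma_map(addr, len);
        else if (event == RTE_MEM_EVENT_FREE)
            dma_unmap(addr, len);
    }

    /* Registered once, e.g. at bus or driver initialization. */
    int
    register_mapping_hook(void)
    {
        return rte_mem_event_callback_register("dma-map", mem_event_cb, NULL);
    }

On Linux this is where the VFIO container receives its DMA mappings; a Windows EAL
built on DMA_IOMMU could presumably plug in at the same point.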
>
>
> Last but not least, user-mode memory management notes:
>
> 1. Windows doesn't report limits on the number of hugepages.
>
> 2. By official documentation, only 2MB hugepages are supported.
>
>    [Dmitry M] There are new, still undocumented Win32 API flags for 1GB [5].
>    [Dmitry K] Found a novel allocator library using these new features [6].
>    Failed to make use of [5] with AWE; unclear how to integrate it into MM.
>
> 3. Address Windowing Extensions [4] allow allocating physical page
>    frames (PFN) and then mapping them to VA, all in user-mode.
>
>    [Dmitry K] Experiments show AWE cannot allocate hugepages (in a documented
>    way at least) and cannot reliably provide contiguous ranges (and does not
>    guarantee them). IMO, this interface is useless for common MM. Some drivers
>    that do not need hugepages but require PA may benefit from it.
>
>
> Opens
> -----
>
> IMO, the "Advanced memory management" milestone from the roadmap should be split.

Yes for splitting. Feel free to send a patch for the roadmap.
And we should plan these tasks later in the year.
Basic memory management should be enough for first steps with PMDs.

> There are three major points of MM improvement, each requiring research and a
> complex patch:
>
> 1. Proper DMA buffers via AllocateDomainCommonBuffer (the DPDK part is unclear).
> 2. VFIO-like code in the Windows EAL using DMA_IOMMU.
> 3. Support for 1GB hugepages and related changes.
>
> The Windows kernel interfaces described above have poor documentation. On the
> Windows community call of 2020-04-01 Dmitry Malloy agreed to help with this
> (concrete questions were raised and noted).
>
> Hugepages of 1GB are desirable, but allocating them relies on undocumented
> features. Also, because Windows does not provide hugepage limits, it may
> require more work to manage multiple sizes in DPDK.
>
>
> References
> ----------
>
> [1]: Kernel DMA Protection for Thunderbolt™ 3
> [2]: DMA_ADAPTER.AllocateDomainCommonBuffer
> [3]: DMA_IOMMU interface
> [4]: Address Windowing Extensions (AWE)
> [5]: GitHub issue
> [6]: mimalloc

Thanks for the great summary.