From: "Tan, Jianfeng"
To: "Walker, Benjamin", "dev@dpdk.org"
Date: Wed, 4 Jan 2017 19:39:18 +0800
Subject: Re: [dpdk-dev] Running DPDK as an unprivileged user
Message-ID: <685186b4-e50e-c122-459b-e4635404c3f8@intel.com>
In-Reply-To: <1483044080.11975.1.camel@intel.com>
References: <1483044080.11975.1.camel@intel.com>

Hi Benjamin,

On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
> Hi all,
>
> I've been digging in to what it would take to run DPDK as an
> unprivileged user and I have some findings that I thought
> were worthy of discussion. The assumptions here are that I'm
> using a very recent Linux kernel (4.8.15 to be specific) and
> I'm using vfio with my IOMMU enabled. I'm only interested in
> making it possible to run as an unprivileged user in this
> type of environment.
>
> There are a few key things that DPDK needs to do in order to
> run as an unprivileged user:
>
> 1) Allocate hugepages
> 2) Map device resources
> 3) Map hugepage virtual addresses to DMA addresses.
>
> For #1 and #2, DPDK works just fine today. You simply chown
> the relevant resources in sysfs to the desired user and
> everything is happy.
>
> The problem is #3. This currently relies on looking up the
> mappings in /proc/self/pagemap, but the ability to get
> physical addresses in /proc/self/pagemap as an unprivileged
> user was removed from the kernel in the 4.x timeframe due to
> the Rowhammer vulnerability. At this time, it is not
> possible to run DPDK as an unprivileged user on a 4.x Linux
> kernel.
>
> There is a way to make this work though, which I'll outline
> now. Unfortunately, I think it is going to require some very
> significant changes to the initialization flow in the EAL.
> One bit of background before I go into how to fix this -
> there are three types of memory addresses - virtual
> addresses, physical addresses, and DMA addresses. Sometimes
> DMA addresses are called bus addresses or I/O addresses, but
> I'll call them DMA addresses because I think that's the
> clearest name. In a system without an IOMMU, DMA addresses
> and physical addresses are equivalent, but in a system with
> an IOMMU any arbitrary DMA address can be chosen by the user
> to map to a given physical address. For security reasons
> (Rowhammer), it is no longer considered safe to expose
> physical addresses to userspace, but it is perfectly fine to
> expose DMA addresses when an IOMMU is present.
>
> DPDK today begins by allocating all of the required
> hugepages, then finds all of the physical addresses for
> those hugepages using /proc/self/pagemap, sorts the
> hugepages by physical address, then remaps the pages to
> contiguous virtual addresses. Later on and if vfio is
> enabled, it asks vfio to pin the hugepages and to set their
> DMA addresses in the IOMMU to be the physical addresses
> discovered earlier. Of course, running as an unprivileged
> user means all of the physical addresses in
> /proc/self/pagemap are just 0, so this doesn't end up
> working.
> Further, there is no real reason to choose the
> physical address as the DMA address in the IOMMU - it would
> be better to just count up starting at 0.

Why not just use the virtual address as the DMA address in this case,
to avoid maintaining another kind of address?

> Also, because the
> pages are pinned after the virtual to physical mapping is
> looked up, there is a window where a page could be moved.
> Hugepage mappings can be moved on more recent kernels (at
> least 4.x), and the reliability of hugepages having static
> mappings decreases with every kernel release.

Do you mean the kernel might take back a physical page after it has
been mapped to a virtual page (perhaps copying the data to another
physical page)? Could you please point to some links or kernel commits?

> Note that this
> probably means that using uio on recent kernels is subtly
> broken and cannot be supported going forward because there
> is no uio mechanism to pin the memory.
>
> The first open question I have is whether DPDK should allow
> uio at all on recent (4.x) kernels. My current understanding
> is that there is no way to pin memory and hugepages can now
> be moved around, so uio would be unsafe. What does the
> community think here?
>
> My second question is whether the user should be allowed to
> mix uio and vfio usage simultaneously. For vfio, the
> physical addresses are really DMA addresses and are best
> when arbitrarily chosen to appear sequential relative to
> their virtual addresses.

Why "sequential relative to their virtual addresses"? The IOMMU table
holds the DMA address -> physical address mapping, so don't we need DMA
addresses that are "sequential relative to their physical addresses"?
Based on your analysis above of how hugepages are initialized, wouldn't
the virtual address be a good candidate for the DMA address?

Thanks,
Jianfeng
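
To make the two options discussed above concrete, here is a minimal
sketch in Python (not DPDK code) comparing "count up starting at 0" with
"use the virtual address as the DMA address". All names (DmaMapping,
map_counting_from_zero, map_identity, virt_to_dma) are hypothetical, and
a 2 MiB hugepage size is assumed. The sketch only computes the
(vaddr, iova, size) triples; with vfio, each triple would presumably be
handed to the kernel through the VFIO_IOMMU_MAP_DMA ioctl, which is not
shown here.

```python
from dataclasses import dataclass

HUGEPAGE_SIZE = 2 * 1024 * 1024  # assumption: 2 MiB hugepages


@dataclass(frozen=True)
class DmaMapping:
    """One hugepage: where it lives in the process and on the bus."""
    vaddr: int  # virtual address of the page in this process
    iova: int   # DMA (I/O virtual) address programmed into the IOMMU
    size: int


def map_counting_from_zero(vaddrs, size=HUGEPAGE_SIZE):
    """Assign DMA addresses 0, size, 2*size, ... in ascending vaddr order."""
    return [DmaMapping(v, i * size, size) for i, v in enumerate(sorted(vaddrs))]


def map_identity(vaddrs, size=HUGEPAGE_SIZE):
    """Use each page's virtual address as its DMA address."""
    return [DmaMapping(v, v, size) for v in sorted(vaddrs)]


def virt_to_dma(mappings, vaddr):
    """Translate a virtual address to a DMA address via the mapping table."""
    for m in mappings:
        if m.vaddr <= vaddr < m.vaddr + m.size:
            return m.iova + (vaddr - m.vaddr)
    raise ValueError(f"address {vaddr:#x} is not inside any mapped hugepage")


# Three virtually contiguous hugepages at a made-up base address.
base = 0x7F0000000000
pages = [base + i * HUGEPAGE_SIZE for i in range(3)]

counted = map_counting_from_zero(pages)
identity = map_identity(pages)

addr = pages[1] + 0x40  # some buffer inside the second hugepage
print(hex(virt_to_dma(counted, addr)))   # 0x200040
print(hex(virt_to_dma(identity, addr)))  # 0x7f0000200040
```

With the counting scheme a lookup table is still needed to translate a
buffer pointer into a DMA address, whereas with the identity scheme the
translation is a no-op as long as the address falls inside a mapped
page. Neither scheme needs /proc/self/pagemap.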