From: "Tan, Jianfeng"
To: "Walker, Benjamin", "dev@dpdk.org"
Date: Thu, 5 Jan 2017 23:52:31 +0800
Subject: Re: [dpdk-dev] Running DPDK as an unprivileged user
Message-ID: <6c6766f0-145e-9354-e275-d107d69173c3@intel.com>
In-Reply-To: <1483565664.9482.3.camel@intel.com>
References: <1483044080.11975.1.camel@intel.com> <685186b4-e50e-c122-459b-e4635404c3f8@intel.com> <1483565664.9482.3.camel@intel.com>

Hi Benjamin,

On 1/5/2017 5:34 AM, Walker, Benjamin wrote:
> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>> Hi Benjamin,
>>
>>
>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>> DPDK today begins by allocating all of the required
>>> hugepages, then finds all of the physical addresses for
>>> those hugepages using /proc/self/pagemap, sorts the
>>> hugepages by physical address, then remaps the pages to
>>> contiguous virtual addresses.
>>> Later on and if vfio is
>>> enabled, it asks vfio to pin the hugepages and to set their
>>> DMA addresses in the IOMMU to be the physical addresses
>>> discovered earlier. Of course, running as an unprivileged
>>> user means all of the physical addresses in
>>> /proc/self/pagemap are just 0, so this doesn't end up
>>> working. Further, there is no real reason to choose the
>>> physical address as the DMA address in the IOMMU - it would
>>> be better to just count up starting at 0.
>> Why not just using virtual address as the DMA address in this case to
>> avoid maintaining another kind of addresses?
> That's a valid choice, although I'm just storing the DMA address in the
> physical address field that already exists. You either have a physical
> address or a DMA address and never both.

Yes, I understand; that's why you raised the second question below.

>
>>> Also, because the
>>> pages are pinned after the virtual to physical mapping is
>>> looked up, there is a window where a page could be moved.
>>> Hugepage mappings can be moved on more recent kernels (at
>>> least 4.x), and the reliability of hugepages having static
>>> mappings decreases with every kernel release.
>> Do you mean kernel might take back a physical page after mapping it to a
>> virtual page (maybe copy the data to another physical page)? Could you
>> please show some links or kernel commits?
> Yes - the kernel can move a physical page to another physical page
> and change the virtual mapping at any time. For a concise example
> see 'man migrate_pages(2)', or for a more serious example the code
> that performs memory page compaction in the kernel which was
> recently extended to support hugepages.
>
> Before we go down the path of me proving that the mapping isn't static,
> let me turn that line of thinking around. Do you have any documentation
> demonstrating that the mapping is static? It's not static for 4k pages, so
> why are we assuming that it is static for 2MB pages?
> I understand that
> it happened to be static for some versions of the kernel, but my understanding
> is that this was purely by coincidence and never by intention.

Thank you for the information. Based on what you describe above, I realize
this behavior could have been happening for a long time.

>
>>> Note that this
>>> probably means that using uio on recent kernels is subtly
>>> broken and cannot be supported going forward because there
>>> is no uio mechanism to pin the memory.
>>>
>>> The first open question I have is whether DPDK should allow
>>> uio at all on recent (4.x) kernels. My current understanding
>>> is that there is no way to pin memory and hugepages can now
>>> be moved around, so uio would be unsafe. What does the
>>> community think here?

Back to this question: removing uio support from DPDK seems like overkill
to me. Can we just document the limitation instead? For example, first warn
users not to invoke migrate_pages() or move_pages() on a DPDK process. As for
the kcompactd daemon and other cases (compaction can also be triggered by
alloc_pages()), could we just recommend disabling CONFIG_COMPACTION?

On the other hand, how does vfio pin that memory? Through memlock (judging
from the code in vfio_pin_pages())? If so, why not just mlock those hugepages?

>>>
>>> My second question is whether the user should be allowed to
>>> mix uio and vfio usage simultaneously. For vfio, the
>>> physical addresses are really DMA addresses and are best
>>> when arbitrarily chosen to appear sequential relative to
>>> their virtual addresses.
>> Why "sequential relative to their virtual addresses"? IOMMU table is for
>> DMA addr -> physical addr mapping. So we need to DMA addresses
>> "sequential relative to their physical addresses"? Based on your above
>> analysis on how hugepages are initialized, virtual addresses is a good
>> candidate for DMA address?
> The code already goes through a separate organizational step on all of
> the pages that remaps the virtual addresses such that they're sequential
> relative to the physical backing pages, so this mostly ends up as the same
> thing.

Agreed.

> Choosing to use the virtual address is a totally valid choice, but I worry it
> may lead to confusion during debugging or in a multi-process scenario.

Makes sense.

Thanks,
Jianfeng
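
P.S. To make the two DMA address choices discussed in this thread concrete,
here is a minimal sketch in C. It is not DPDK code: struct seg,
pagemap_pfn(), assign_iova_sequential() and assign_iova_from_va() are
hypothetical names used only for illustration. The pagemap bit layout
(bit 63 = present, bits 0-54 = PFN) is taken from the kernel's pagemap
documentation; the sketch assumes the segments are already ordered by
virtual address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one hugepage-backed segment (not a DPDK type). */
struct seg {
    void     *va;    /* virtual address of the segment */
    size_t    len;   /* length in bytes */
    uint64_t  iova;  /* DMA address to be programmed into the IOMMU */
};

/* /proc/self/pagemap entry layout: bit 63 = present, bits 0-54 = PFN. */
#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1)

/*
 * Extract the PFN from a raw pagemap entry. Returns 0 when the page is not
 * present, and also when the kernel has zeroed the PFN field, which is what
 * an unprivileged process sees on recent kernels.
 */
static uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PM_PRESENT) ? (entry & PM_PFN_MASK) : 0;
}

/* Choice 1: ignore physical addresses and count up from 0 in segment order. */
static void assign_iova_sequential(struct seg *segs, size_t n)
{
    uint64_t next = 0;

    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        segs[i].iova = next;
        next += segs[i].len;
    }
}

/* Choice 2: reuse each segment's virtual address as its DMA address. */
static void assign_iova_from_va(struct seg *segs, size_t n)
{
    assert(segs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        segs[i].iova = (uint64_t)(uintptr_t)segs[i].va;
}
```

With choice 1, DMA addresses are small and independent of the process
layout; with choice 2, there is no separate address space to track, but the
DMA address differs wherever the virtual address differs, which is the
multi-process concern raised above.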