From: "Burakov, Anatoly"
To: Yongseok Koh, "Walker, Benjamin"
Cc: "dev@dpdk.org", "thomas@monjalon.net", "andras.kovacs@ericsson.com", "Wiles, Keith", "Richardson, Bruce"
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Mon, 5 Feb 2018 10:03:35 +0000
In-Reply-To: <20180202192832.GA42096@yongseok-MBP.local>
References: <1513892309.2658.80.camel@intel.com> <1514308764.2658.93.camel@intel.com> <20180202192832.GA42096@yongseok-MBP.local>

On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
>> On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
>>> On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
>>>> SPDK will need some way to register for a notification when pages are
>>>> allocated or freed.
>>>> For storage, the number of requests per second is (relative to
>>>> networking) fairly small (hundreds of thousands per second in a
>>>> traditional block storage stack, or a few million per second with SPDK).
>>>> Given that, we can afford to do a dynamic lookup from va to pa/iova on
>>>> each request in order to greatly simplify our APIs (users can just pass
>>>> pointers around instead of mbufs). DPDK has a way to look up the pa from
>>>> a given va, but it does so by scanning /proc/self/pagemap and is very
>>>> slow. SPDK instead handles this by implementing a lookup table of va to
>>>> pa/iova which we populate by scanning through the DPDK memory segments
>>>> at start up, so the lookup in our table is sufficiently fast for storage
>>>> use cases. If the list of memory segments changes, we need to know about
>>>> it in order to update our map.
>>>
>>> Hi Benjamin,
>>>
>>> So, in other words, we need callbacks on alloc/free. What information
>>> would SPDK need when receiving this notification? Since we can't really
>>> know in advance how many pages we allocate (it may be one, it may be a
>>> thousand) and they are no longer guaranteed to be contiguous, would a
>>> per-page callback be OK? Alternatively, we could have one callback per
>>> operation, but only provide the VA and size of the allocated memory, while
>>> leaving everything else to the user. I do add a virt2memseg() function
>>> which would allow you to look up segment physical addresses more easily,
>>> so you won't have to manually scan memseg lists to get the IOVA for a
>>> given VA.
>>>
>>> Thanks for your feedback and suggestions!
>>
>> Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
>> callback per virtual memory region allocated, plus a function we can call to
>> find the physical addresses/page break points on that virtual region.
>> The function that finds the physical addresses does not have to be
>> efficient - we'll just call that once when the new region is allocated and
>> store the results in a fast lookup table. One call per virtual region is
>> better for us than one call per physical page because we're actually
>> keeping multiple different types of memory address translation tables in
>> SPDK. One translates from va to pa/iova, so for this one we need to break
>> the region up into physical pages and it doesn't matter if you do one call
>> per virtual region or one per physical page. However, another one
>> translates from va to RDMA lkey, so it is much more efficient if we can
>> register large virtual regions in a single call.
>
> Another yes to callbacks. As Benjamin mentioned about RDMA, the MLX PMD has
> to look up the LKEY for each packet DMA. Let me briefly explain this for
> your understanding. For security reasons, we don't allow an application to
> initiate a DMA transaction with unknown random physical addresses. Instead,
> the va-to-pa mapping (we call it a Memory Region) should be pre-registered,
> and the LKEY is the index of the translation entry registered in the device.
> With the current static memory model, it is easy to manage because the v-p
> mapping is unchanged over time. But if it becomes dynamic, the MLX PMD
> should get notified of the event so it can register/un-register the Memory
> Region.
>
> For the MLX PMD, it is also enough to get one notification per
> allocation/free of a virtual memory region. It doesn't necessarily have to
> be a per-page call, as Benjamin mentioned, because the PA of a region
> doesn't need to be contiguous for registration. But it doesn't need to know
> about the physical address of the region (I'm not saying it is unnecessary,
> but just FYI :-).
>
> Thanks,
> Yongseok
>

Thanks for your feedback, good to hear we're on the right track. I already
have a prototype implementation of this working, due for v1 submission :)

-- 
Thanks,
Anatoly
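
A minimal sketch of the per-region callback scheme discussed in this thread:
one notification per virtual region on alloc/free, with the consumer
resolving each page once and caching the result in a va-to-iova table. Every
name here (mem_event_cb, page_iova_fn, va_to_iova, fake_page_iova) is
hypothetical and is not the API from the RFC. It also assumes 2 MiB pages and
that all managed memory lives in one fixed, page-aligned VA window, which is
a simplification made only to keep the table a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 21                              /* assumption: 2 MiB hugepages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)
#define MAX_PAGES  1024                            /* size of the assumed VA window */
#define BAD_IOVA   UINT64_MAX                      /* marks "no translation" */

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* Hypothetical resolver: returns the IOVA backing one page. It may be slow,
 * because it is called only once per page, when the region appears. */
typedef uint64_t (*page_iova_fn)(uintptr_t page_va);

static uintptr_t va_base;                 /* start of the assumed VA window */
static uint64_t  iova_table[MAX_PAGES];   /* one slot per page in the window */

static void map_init(uintptr_t base)
{
    va_base = base;
    for (size_t i = 0; i < MAX_PAGES; i++)
        iova_table[i] = BAD_IOVA;
}

/* Called once per virtual region. The region is split into pages here,
 * since the backing physical pages need not be contiguous. */
static void mem_event_cb(enum mem_event ev, uintptr_t va, size_t len,
                         page_iova_fn page_iova)
{
    assert(va >= va_base);
    assert((va - va_base) % PAGE_SIZE == 0 && len % PAGE_SIZE == 0);

    size_t first = (va - va_base) >> PAGE_SHIFT;
    size_t count = len >> PAGE_SHIFT;
    assert(first + count <= MAX_PAGES);

    for (size_t i = 0; i < count; i++) {
        uintptr_t page_va = va + i * PAGE_SIZE;
        iova_table[first + i] =
            (ev == MEM_EVENT_ALLOC) ? page_iova(page_va) : BAD_IOVA;
    }
}

/* Fast path: a single array index per request, no /proc/self/pagemap scan. */
static uint64_t va_to_iova(uintptr_t va)
{
    if (va < va_base)
        return BAD_IOVA;
    size_t idx = (va - va_base) >> PAGE_SHIFT;
    if (idx >= MAX_PAGES || iova_table[idx] == BAD_IOVA)
        return BAD_IOVA;
    return iova_table[idx] + ((va - va_base) & (PAGE_SIZE - 1));
}

/* Stand-in resolver for experimenting: swaps adjacent pages so that the
 * resulting IOVAs are deliberately non-contiguous. */
static uint64_t fake_page_iova(uintptr_t page_va)
{
    uint64_t page = (uint64_t)(page_va >> PAGE_SHIFT) ^ 1u;
    return 0x100000000ULL + (page << PAGE_SHIFT);
}
```

To try it, call map_init() with the window base, deliver a MEM_EVENT_ALLOC
for a page-aligned region with fake_page_iova as the resolver, and query
va_to_iova() for addresses inside and outside that region. A va-to-lkey table
of the kind described for RDMA could hook the same callback but register the
whole region in one call instead of looping over pages.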