From: "Burakov, Anatoly"
To: "Walker, Benjamin", "dev@dpdk.org"
Cc: "thomas@monjalon.net", "andras.kovacs@ericsson.com", "Wiles, Keith",
    "Richardson, Bruce"
Date: Fri, 22 Dec 2017 09:13:04 +0000
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
In-Reply-To: <1513892309.2658.80.camel@intel.com>

On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> On Tue, 2017-12-19 at 11:14 +0000, Anatoly Burakov wrote:
>>
>> Quick outline of all changes done as part of this patchset:
>>
>> * Malloc heap adjusted to handle holes in address space
>> * Single memseg list replaced by multiple expandable memseg lists
>> * VA space for hugepages is preallocated in advance
>> * Added dynamic alloc/free for pages, happening as needed on malloc/free
>
> SPDK will need some way to register for a
> notification when pages are allocated
> or freed. For storage, the number of requests per second is (relative to
> networking) fairly small (hundreds of thousands per second in a traditional
> block storage stack, or a few million per second with SPDK). Given that, we can
> afford to do a dynamic lookup from va to pa/iova on each request in order to
> greatly simplify our APIs (users can just pass pointers around instead of
> mbufs). DPDK has a way to lookup the pa from a given va, but it does so by
> scanning /proc/self/pagemap and is very slow. SPDK instead handles this by
> implementing a lookup table of va to pa/iova which we populate by scanning
> through the DPDK memory segments at start up, so the lookup in our table is
> sufficiently fast for storage use cases. If the list of memory segments changes,
> we need to know about it in order to update our map.

Hi Benjamin,

So, in other words, we need callbacks on alloc/free. What information
would SPDK need when receiving this notification? Since we can't know in
advance how many pages we will allocate (it may be one, it may be a
thousand), and they are no longer guaranteed to be contiguous, would a
per-page callback be OK? Alternatively, we could have one callback per
operation that only provides the VA and size of the allocated memory,
leaving everything else to the user.

I do add a virt2memseg() function, which would let you look up segment
physical addresses more easily, so you won't have to scan memseg lists
manually to get the IOVA for a given VA.

Thanks for your feedback and suggestions!

>
> Having the map also enables a number of other nice things - for instance we
> allow users to register memory that wasn't allocated through DPDK and use it for
> DMA operations. We keep that va to pa/iova mapping in the same map. I appreciate
> you adding APIs to dynamically register this type of memory with the IOMMU on
> our behalf.
> That allows us to eliminate a nasty hack where we were looking up
> the vfio file descriptor through sysfs in order to send the registration ioctl.
>
>> * Added contiguous memory allocation API's for rte_malloc and rte_memzone
>> * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
>>   with VFIO

-- 
Thanks,
Anatoly