From: Sergio Gonzalez Monroy
To: Thomas Monjalon
Cc: dev@dpdk.org
Date: Thu, 14 Apr 2016 09:48:49 +0100
Subject: Re: [dpdk-dev] memory allocation requirements

On 13/04/2016 17:03, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788

+1

> I think it is time to collect the needs and expectations for
> the DPDK memory allocator. The goal is to satisfy every need while
> cleaning the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First, the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)

I think this ties in with the different command line options related
to memory. We have 3 choices:

1) no option: allocate all free hugepages in the system.
   We read the number of free hugepages from sysfs (with possible race
   conditions if there are multiple mount points for the same page
   size). We also need to account for a size limit on the hugetlbfs
   mount; and if we run inside a cgroup, it looks like we have no way
   other than handling the SIGBUS signal to deal with the fact that
   allocating the hugepages may succeed even though they are not
   pre-faulted (this happens with the MAP_POPULATE option too).
2) -m: allocate the given amount of memory regardless of the numa node.
3) --socket-mem: allocate a given amount of memory per numa node.

At the moment we are not able to specify how much memory of a given
page size we want to allocate.

So would we provide contiguous memory as an option, changing the
default behavior?

> Then the drivers or other libraries use the memory through:
> - rte_malloc
> - rte_memzone
> - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool would still focus on collections of objects with a
> cache.

The other bit we need to remember is the memory for the hardware
queues. There is already an API in ethdev, rte_eth_dma_zone_reserve(),
which I think would make sense to move to EAL so the memory allocator
can guarantee contiguous memory transparently for the cases where we
have memory of different hugepage sizes.
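For reference, here is a minimal sketch of how a PMD reserves ring
memory through that helper today (the reserve_rx_ring() wrapper and
its parameters are made up for illustration; error handling omitted):

    #include <rte_ethdev.h>
    #include <rte_memzone.h>

    static const struct rte_memzone *
    reserve_rx_ring(const struct rte_eth_dev *dev, uint16_t queue_id,
                    size_t ring_size, int socket_id)
    {
        /* Looks up an existing zone by name first (matters for
         * restarts and secondary processes), otherwise reserves a new
         * aligned memzone on the given socket. */
        return rte_eth_dma_zone_reserve(dev, "rx_ring", queue_id,
                                        ring_size, RTE_CACHE_LINE_SIZE,
                                        socket_id);
    }

If this moved to EAL, any library could get the same lookup-or-reserve
behavior without depending on ethdev.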
> If a rework happens, maybe the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.

CONFIG_RTE_LIBRTE_IVSHMEM should probably become a runtime option, and
CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS could likely be removed once we
have a single mmap file for hugepages.

> Currently, the first class of API is directly implemented as command
> line parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented
> API.
>
> Probably I am forgetting some needs, e.g. for the secondary
> processes. Please comment.

Regards,
Sergio
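P.S. To make the "C functions first" idea a bit more concrete, here is
a purely hypothetical sketch of what an init-time request API could
look like; every identifier below is invented for the discussion,
nothing like it exists today:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical: -m/--socket-mem would become thin wrappers that
     * translate into one or more of these requests before
     * rte_eal_init(). */
    struct rte_eal_mem_request {
        int    socket_id;   /* numa node, or -1 for any */
        size_t page_size;   /* 0 = any available hugepage size */
        size_t len;         /* amount of memory to take over, bytes */
        bool   swappable;   /* allow the kernel to swap it out */
        bool   phys_contig; /* physically contiguous memory
                             * (best effort, cannot be guaranteed) */
    };

    /* Hypothetical: may be called multiple times, once per
     * (socket, page size) combination. */
    int rte_eal_mem_request(const struct rte_eal_mem_request *req);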