From: "Burakov, Anatoly"
To: Olivier Matz
Cc: dev@dpdk.org, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com, thomas@monjalon.net,
 konstantin.ananyev@intel.com, kuralamudhan.ramakrishnan@intel.com,
 louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
 pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com
Date: Tue, 20 Mar 2018 10:27:55 +0000
Message-ID: <013728e6-8c4d-bb1a-a19e-b74d321765ac@intel.com>
In-Reply-To: <20180319173053.khzj5xvmkwqgrozn@platinum>
Subject: Re: [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK
List-Id: DPDK patches and discussions

On 19-Mar-18 5:30 PM, Olivier Matz wrote:
> Hi Anatoly,
>
> On Sat, Mar 03, 2018 at 01:45:48PM +0000, Anatoly Burakov wrote:
>> This patchset
>> introduces dynamic memory allocation for DPDK (aka memory
>> hotplug). Based upon the RFC submitted in December [1].
>>
>> Dependencies (to be applied in the specified order):
>> - IPC bugfixes patchset [2]
>> - IPC improvements patchset [3]
>> - IPC asynchronous request API patch [4]
>> - Function to return number of sockets [5]
>>
>> Deprecation notices relevant to this patchset:
>> - General outline of memory hotplug changes [6]
>> - EAL NUMA node count changes [7]
>>
>> The vast majority of changes are in the EAL and malloc; the external API
>> disruption is minimal: a new set of APIs is added for contiguous memory
>> allocation in rte_memzone, and there are a few API additions in rte_memory
>> due to the switch to memseg lists as opposed to memsegs. Every other API
>> change is internal to the EAL, and all memory allocation/freeing is
>> handled through rte_malloc, with no externally visible API changes.
>>
>> Quick outline of all changes done as part of this patchset:
>>
>> * Malloc heap adjusted to handle holes in address space
>> * Single memseg list replaced by multiple memseg lists
>> * VA space for hugepages is preallocated in advance
>> * Alloc/free of pages now happens as needed on rte_malloc/rte_free
>> * Added contiguous memory allocation APIs for rte_memzone
>> * Integrated Pawel Wodkowski's patch for registering/unregistering
>>   memory with VFIO [8]
>> * Callbacks for registering memory allocations
>> * Multiprocess support done via the DPDK IPC introduced in 18.02
>>
>> The biggest difference is that a "memseg" now represents a single page
>> (as opposed to being a big contiguous block of pages). As a consequence,
>> both memzones and malloc elements are no longer guaranteed to be
>> physically contiguous, unless the user asks for that at reserve time. To
>> preserve whatever functionality was dependent on the previous behavior,
>> a legacy memory option is also provided; however, it is expected (or
>> perhaps vainly hoped) to be a temporary solution.
>>
>> Why multiple memseg lists instead of one? Since a memseg is now a single
>> page, the list of memsegs will get quite big, and we need to locate
>> pages somehow when we allocate and free them. We could of course just
>> walk the list and allocate one contiguous chunk of VA space for memsegs,
>> but this implementation uses separate lists instead, in order to speed
>> up many operations with memseg lists.
>>
>> For v1, the following limitations are present:
>> - FreeBSD does not even compile, let alone run
>> - No 32-bit support
>> - There are some minor quality-of-life improvements planned that aren't
>>   ready yet and will be part of v2
>> - VFIO support is only smoke-tested (but is expected to work); VFIO
>>   support with secondary processes is not tested; work is ongoing to
>>   validate VFIO for all use cases
>> - Dynamic mapping/unmapping of memory with VFIO is not supported in
>>   sPAPR IOMMU mode - help from sPAPR maintainers requested
>>
>> Nevertheless, this patchset should be testable under 64-bit Linux, and
>> should work for all use cases bar those mentioned above.
>>
>> [1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
>> [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/
>> [3] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/
>> [4] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
>> [5] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
>> [6] http://dpdk.org/dev/patchwork/patch/34002/
>> [7] http://dpdk.org/dev/patchwork/patch/33853/
>> [8] http://dpdk.org/dev/patchwork/patch/24484/
>
> I did a quick pass on your patches (unfortunately, I don't have
> the time to really dive into them).
>
> I have a few questions/comments:
>
> - This is really a big patchset. Thank you for working on this topic.
>   I'll try to test our application with it as soon as possible.
>
> - I see from patch 17 that it is possible for rte_malloc() to expand
>   the heap by requesting more memory from the OS? Did I understand
>   correctly?
>   Today, a good property of rte_malloc() compared to malloc() is that
>   it won't interrupt the process (the worst case is a spinlock). This
>   is appreciable on a dataplane core. Will it change?

Hi Olivier,

Not sure what you mean by "interrupt the process". The new rte_malloc will
_mostly_ work just like the old one. There are now two levels of locks: the
heap lock and the system allocation lock. If your rte_malloc call requests
an amount of memory that can be satisfied from already-allocated memory,
then only the heap lock is engaged - or, to put it another way, things work
as before.

When you *don't* have enough memory allocated, previously rte_malloc would
just fail. Now, it will instead take the second lock and try to allocate
more memory from the system. This requires IPC (to ensure all processes
have allocated/freed the same memory), so it will take much longer (the
timeout is set to wait up to 5 seconds, although under normal circumstances
it takes a lot less - it depends on how many processes you have running,
but is generally under 100ms), and it will block other system allocations
(i.e. another rte_malloc call on another heap trying to request more memory
from the system).

So, in short: you can't allocate from the same heap in parallel (same as
before), and you can't have parallel system memory allocation requests
(regardless of which heap they come from). The latter *only* applies to
system memory allocations - that is, if one heap is allocating system
memory while another heap receives an allocation request *and is able to
satisfy it from already-allocated memory*, the latter will not block,
because the second lock is never engaged.

> - It's not a big issue, but I have the feeling that the "const"
>   qualifier is often forgotten in the patchset. I think it is helpful
>   for optimization, for documentation, and for detecting bugs that
>   modify or free something that should not be.
Generally, if things aren't const, it's for a reason :) I made things const
by default and removed constness only where I needed to. However, there may
have been a few places where I changed the code around but forgot to put
the constness back. I'll look into it.

Thanks for your reviews!

> I'm sending some other minor comments as replies to individual patches.
>
> Thanks,
> Olivier

--
Thanks,
Anatoly
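
[Editor's note: the two-level locking scheme described in the reply above
can be sketched roughly as follows. This is a toy model, not DPDK code: the
names (sketch_malloc, try_alloc_from_heap, expand_heap, heap_free_bytes)
are illustrative assumptions, the real slow path performs IPC with all
secondary processes rather than bumping a counter, and the heap lock is
per-heap in DPDK while this sketch has a single heap.]

```c
#include <pthread.h>
#include <stdlib.h>

/* per-heap lock: serializes all allocations from this heap */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
/* global lock: serializes system allocations across *all* heaps */
static pthread_mutex_t sysalloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* toy stand-in for the heap's pool of already-allocated memory */
static size_t heap_free_bytes = 4096;

/* fast path: carve from already-allocated memory; fails if exhausted */
static void *try_alloc_from_heap(size_t size)
{
	if (heap_free_bytes < size)
		return NULL;
	heap_free_bytes -= size;
	return malloc(size); /* stand-in for carving out a malloc element */
}

/* slow path: grow the heap; in DPDK this is where IPC with all
 * processes happens, which is why it can take far longer */
static int expand_heap(size_t size)
{
	heap_free_bytes += size; /* pretend the OS gave us more pages */
	return 0;
}

void *sketch_malloc(size_t size)
{
	void *ret;

	pthread_mutex_lock(&heap_lock);
	ret = try_alloc_from_heap(size);
	if (ret == NULL) {
		/* only when the fast path fails is the second,
		 * system-wide lock engaged */
		pthread_mutex_lock(&sysalloc_lock);
		if (expand_heap(size) == 0)
			ret = try_alloc_from_heap(size);
		pthread_mutex_unlock(&sysalloc_lock);
	}
	pthread_mutex_unlock(&heap_lock);
	return ret;
}
```

The point of the structure: a request satisfiable from already-allocated
memory never touches sysalloc_lock, so one heap expanding itself does not
block another heap's fast-path allocations, only other expansions.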