Date: Mon, 19 Mar 2018 18:30:53 +0100
From: Olivier Matz
To: Anatoly Burakov
Cc: dev@dpdk.org, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com,
 thomas@monjalon.net, konstantin.ananyev@intel.com,
 kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
 nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
 jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com
Message-ID: <20180319173053.khzj5xvmkwqgrozn@platinum>
Subject: Re: [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK
List-Id: DPDK patches and discussions

Hi Anatoly,

On Sat, Mar 03, 2018 at 01:45:48PM +0000, Anatoly Burakov wrote:
> This patchset introduces dynamic memory allocation for DPDK (aka memory
> hotplug). Based upon RFC submitted in December [1].
>
> Dependencies (to be applied in specified order):
> - IPC bugfixes patchset [2]
> - IPC improvements patchset [3]
> - IPC asynchronous request API patch [4]
> - Function to return number of sockets [5]
>
> Deprecation notices relevant to this patchset:
> - General outline of memory hotplug changes [6]
> - EAL NUMA node count changes [7]
>
> The vast majority of changes are in the EAL and malloc, the external API
> disruption is minimal: a new set of API's are added for contiguous memory
> allocation for rte_memzone, and a few API additions in rte_memory due to
> switch to memseg_lists as opposed to memsegs. Every other API change is
> internal to EAL, and all of the memory allocation/freeing is handled
> through rte_malloc, with no externally visible API changes.
>
> Quick outline of all changes done as part of this patchset:
>
> * Malloc heap adjusted to handle holes in address space
> * Single memseg list replaced by multiple memseg lists
> * VA space for hugepages is preallocated in advance
> * Added alloc/free for pages happening as needed on rte_malloc/rte_free
> * Added contiguous memory allocation API's for rte_memzone
> * Integrated Pawel Wodkowski's patch for registering/unregistering memory
>   with VFIO [8]
> * Callbacks for registering memory allocations
> * Multiprocess support done via DPDK IPC introduced in 18.02
>
> The biggest difference is a "memseg" now represents a single page (as opposed to
> being a big contiguous block of pages). As a consequence, both memzones and
> malloc elements are no longer guaranteed to be physically contiguous, unless
> the user asks for it at reserve time. To preserve whatever functionality that
> was dependent on previous behavior, a legacy memory option is also provided,
> however it is expected (or perhaps vainly hoped) to be temporary solution.
>
> Why multiple memseg lists instead of one?
> Since memseg is a single page now,
> the list of memsegs will get quite big, and we need to locate pages somehow
> when we allocate and free them. We could of course just walk the list and
> allocate one contiguous chunk of VA space for memsegs, but this
> implementation uses separate lists instead in order to speed up many
> operations with memseg lists.
>
> For v1, the following limitations are present:
> - FreeBSD does not even compile, let alone run
> - No 32-bit support
> - There are some minor quality-of-life improvements planned that aren't
>   ready yet and will be part of v2
> - VFIO support is only smoke-tested (but is expected to work), VFIO support
>   with secondary processes is not tested; work is ongoing to validate VFIO
>   for all use cases
> - Dynamic mapping/unmapping memory with VFIO is not supported in sPAPR
>   IOMMU mode - help from sPAPR maintainers requested
>
> Nevertheless, this patchset should be testable under 64-bit Linux, and
> should work for all use cases bar those mentioned above.
>
> [1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
> [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/
> [3] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/
> [4] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
> [5] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
> [6] http://dpdk.org/dev/patchwork/patch/34002/
> [7] http://dpdk.org/dev/patchwork/patch/33853/
> [8] http://dpdk.org/dev/patchwork/patch/24484/

I did a quick pass on your patches (unfortunately, I don't have
the time to really dive into them).

I have a few questions/comments:

- This is really a big patchset. Thank you for working on this topic.
  I'll try to test our application with it as soon as possible.

- I see from patch 17 that it is possible that rte_malloc() expands
  the heap by requesting more memory from the OS? Did I understand
  correctly?
  Today, a good property of rte_malloc() compared to malloc() is that
  it won't interrupt the process (the worst case is a spinlock). This
  is valuable on a dataplane core. Will that change?

- It's not a big issue, but I have the feeling that the "const"
  qualifier is often forgotten in the patchset. I think it is helpful
  for optimization, for documentation, and for detecting bugs that
  modify or free something they should not.

I'm sending some other minor comments as replies to patches.

Thanks,
Olivier
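
To illustrate the "const" remark above, here is a minimal sketch in C.
It is not code from the patchset: struct page_desc and
total_mapped_len() are hypothetical names, and the struct is only a
stand-in for a per-page descriptor, not the real DPDK memseg. The point
is that a function which only reads a list can say so in its prototype,
and the compiler then rejects an accidental write.

```c
#include <stddef.h>

/* Hypothetical per-page descriptor, for illustration only. */
struct page_desc {
	void *addr;  /* virtual address of the page, NULL if unused */
	size_t len;  /* page size in bytes */
};

/*
 * Read-only walk over a page list. The "const" on the parameter
 * documents that the list is not modified, and any write through
 * "pages" inside this function is a compile-time error.
 */
static size_t
total_mapped_len(const struct page_desc *pages, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].addr != NULL)
			total += pages[i].len;
		/* pages[i].len = 0;  would not compile: pages is const */
	}
	return total;
}
```

A caller can pass either a const or a non-const array, so adding the
qualifier to a read-only parameter does not constrain existing callers.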