DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
	dev@dpdk.org, andras.kovacs@ericsson.com,
	laszlo.vadkeri@ericsson.com, keith.wiles@intel.com,
	benjamin.walker@intel.com, bruce.richardson@intel.com,
	Yongseok Koh <yskoh@mellanox.com>,
	nelio.laranjeiro@6wind.com, olivier.matz@6wind.com,
	rahul.lakkireddy@chelsio.com, jerin.jacob@cavium.com,
	hemant.agrawal@nxp.com, alejandro.lucero@netronome.com,
	arybchenko@solarflare.com, ferruh.yigit@intel.com,
	Srinath Mannam <srinath.mannam@broadcom.com>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Wed, 25 Apr 2018 09:12:34 -0700
Message-ID: <20180425091234.5565aafb@xeon-e3>
In-Reply-To: <53e192ed-15d8-5fa1-3048-964d92b917b1@intel.com>

On Wed, 25 Apr 2018 17:02:48 +0100
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> > On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:  
> >> Hi Anatoly,
> >>
> >> 19/12/2017 12:14, Anatoly Burakov:  
> >>>   * Memory tagging. This is related to the previous item. Right now,
> >>>     we can only ask malloc to allocate memory by page size, but one
> >>>     could potentially have different memory regions backed by pages of
> >>>     similar sizes (for example, locked 1G pages, to completely avoid
> >>>     TLB misses, alongside regular 1G pages), and it would be good to
> >>>     have a mechanism to distinguish between the different memory types
> >>>     available to a DPDK application. One could, for example, tag memory
> >>>     by "purpose" (e.g. "fast", "slow"), or in other ways.  
> >>
> >> How do you imagine memory tagging?
> >> Should it be a parameter when requesting some memory from rte_malloc
> >> or rte_mempool?  
> > 
> > We can't make it a parameter for mempool without making it a parameter
> > for rte_malloc, as every memory allocation in DPDK goes through
> > rte_malloc. So at the very least, rte_malloc will have it. And as long
> > as rte_malloc has it, there's no reason why memzones and mempools
> > couldn't have it too - it's not much code to add.
> >   
> >> Could it be a bit-field allowing to combine some properties?
> >> Does it make sense to have "DMA" as one of the purpose?  
> > 
> > Something like a bitfield would be my preference, yes. That way we could
> > classify memory in certain ways and allocate based on that. Which
> > "certain ways" these are, I'm not sure. For example, in addition to
> > tagging memory as "DMA-capable" (which I think is a given), one might
> > tag certain memory as "non-default" - as in, never allocate from this
> > chunk of memory unless explicitly asked to do so. This could be useful
> > for types of memory that are a precious resource.
> > 
> > Then again, it is likely that we won't have many types of memory in
> > DPDK, and any other type would be implementation-specific, so maybe just
> > stringly-typing it is OK (maybe we can finally make use of the "type"
> > parameter in rte_malloc!).
> >   
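> > To make the idea concrete, a rough sketch of what such a bitfield could
> > look like - all of the flag names and the allocation call below are
> > purely illustrative, not existing DPDK APIs:
> > 
> >   /* hypothetical memory property flags */
> >   #define RTE_MEMTYPE_F_DMA_CAPABLE (1u << 0) /* usable for device DMA */
> >   #define RTE_MEMTYPE_F_NON_DEFAULT (1u << 1) /* never picked implicitly */
> >   #define RTE_MEMTYPE_F_LOCKED      (1u << 2) /* pinned, avoids TLB misses */
> > 
> >   /* an allocation request could then combine several properties */
> >   void *buf = rte_malloc_with_flags("fast", 4096, 64,
> >                                     RTE_MEMTYPE_F_DMA_CAPABLE |
> >                                     RTE_MEMTYPE_F_NON_DEFAULT);
> > 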
> >>
> >> How to transparently allocate the best memory for the NIC?
> >> You take care of the NUMA socket property, but there can be more
> >> requirements, like getting memory from the NIC itself.  
> > 
> > I would think that we can't make it generic enough to cover all cases,
> > so it's best to expose some APIs and let PMDs handle this themselves.
> >   
> >>
> >> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome, NXP, 
> >> Solarflare)
> >> in order to trigger a discussion about the ideal requirements.
> >>  
> >   
> 
> Hi all,
> 
> I would like to restart this discussion once again :) and would like to
> hear some feedback on my thoughts below.
> 
> I've done some more thinking about it, and while I have lots of use cases
> in mind, I suspect covering them all while keeping a sane API is
> unrealistic.
> 
> So, first things first.
> 
> The main issue we have is the 1:1 correspondence between malloc heaps and
> socket IDs. This has led to various attempts to hijack socket IDs to do
> something else - I've seen this approach a few times before, most
> recently in a patch from Srinath/Broadcom [1]. We need to break this
> dependency somehow, and have a unique heap identifier.
> 
> Also, since memory allocators are expected to behave roughly like
> drivers (e.g. have a driver API and provide hooks for init/alloc/free
> functions, etc.), a request to allocate memory may not just go to the
> heap itself (which is handled internally by rte_malloc), but also to
> its respective allocator. This is roughly similar to what happens
> currently, except that which allocator functions get called will then
> depend on which driver allocated that heap.
> 
> So we arrive at a dependency: heap => allocator. Each heap must know
> which allocator it belongs to - so we also need some way to identify
> not just the heap, but the allocator as well.
> 
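> To make the driver analogy concrete, here is a very rough sketch of what
> an allocator's hook table could look like - the structure and its members
> are hypothetical, not part of any existing DPDK API:
> 
>   /* hypothetical per-allocator ops, registered much like PMD ops */
>   struct rte_malloc_heap_ops {
>       const char *name;                            /* e.g. "hugepage" */
>       int   (*init)(void *priv);                   /* set up backing memory */
>       void *(*alloc)(void *priv, size_t size,
>                      size_t align, int socket_id); /* grow/serve the heap */
>       void  (*free)(void *priv, void *addr);       /* return memory */
>   };
> 
> Each heap would then carry a reference to the ops that created it, so that
> rte_malloc can route requests to the right allocator.
> 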
> In the above quotes from previous mails I suggested categorizing memory
> by "types", but now that I think of it, the API would have been too
> complex, as we would ideally have had to cover use cases such as
> "allocate memory of this type, no matter which allocator it comes from",
> "allocate memory from this particular heap", "allocate memory from this
> particular allocator"... It gets complicated pretty fast.
> 
> What I propose instead is this. 99% of the time, the user wants our
> hugepage allocator, so by default all allocations will come through
> that. In the event that the user needs memory from a specific heap, we
> provide a new set of APIs to request memory from that heap.
> 
> Do we expect situations where the user might *not* want the default
> allocator, but also *not* know which exact heap they want? If the answer
> is no (which I'm counting on :) ), then allocating from a specific
> malloc driver becomes as simple as something like this:
> 
> mem = rte_malloc_from_heap("my_very_special_heap");
> 
> (stringly-typed heap ID is just an example)
> 
> So, the old APIs remain intact and always go to the default allocator,
> while the new APIs grant access to other allocators.
> 
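> To illustrate, the new API surface could look something like the sketch
> below - function names and signatures are purely illustrative, expanding
> on the example above:
> 
>   /* hypothetical "named heap" API, not an existing DPDK interface */
>   void *rte_malloc_from_heap(const char *heap_name, const char *type,
>                              size_t size, unsigned int align);
>   void  rte_free_from_heap(const char *heap_name, void *ptr);
> 
>   /* existing rte_malloc()/rte_free() keep going to the default
>    * hugepage allocator, so current applications are unaffected */
>   void *mem = rte_malloc_from_heap("my_very_special_heap", NULL, 4096, 64);
> 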
> Heap ID alone, however, may not provide enough flexibility. For example,
> if a malloc driver allocates a specific kind of memory that is
> NUMA-aware, it would perhaps be awkward to use different heap IDs when
> the memory being allocated is arguably the same, just subdivided into
> several blocks. Moreover, figuring out situations like this would likely
> require some cooperation from the allocator itself (possibly some
> allocator-specific APIs), but if we add malloc heap arguments, they
> would have to be generic. I'm not sure we want to go that far, though.
> 
> Does that sound reasonable?
> 
> Another tangentially related issue, raised by Olivier [2], is that of
> allocating memory in blocks rather than through rte_malloc. The current
> implementation has rte_malloc storing its metadata right in the allocated
> memory - this leads to unnecessary memory fragmentation in certain cases,
> such as allocating memory page by page, and in general pollutes memory
> we might not want to pollute with malloc metadata.
> 
> To fix this, the memory allocator would have to store malloc metadata
> externally, which comes with a few caveats (reverse mapping of pointers
> to malloc elements; storing, looking up and accounting for said
> elements; etc.). Work on this isn't currently planned, but it's
> certainly something to think about :)
> 
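> To give a feel for what "external" metadata would involve, a purely
> hypothetical sketch of the bookkeeping (not part of this RFC):
> 
>   /* hypothetical out-of-band element record, kept outside user memory */
>   struct malloc_elem_ext {
>       void   *addr;    /* start of the user-visible allocation */
>       size_t  size;    /* length of the allocation */
>       int     heap_id; /* owning heap */
>   };
> 
>   /* rte_free() would then need a reverse lookup from pointer to record
>    * (e.g. a hash table keyed on addr) instead of reading a header that
>    * sits immediately before the returned pointer. */
> 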
> [1] http://dpdk.org/dev/patchwork/patch/36596/
> [2] http://dpdk.org/ml/archives/dev/2018-March/093212.html

Maybe the existing rte_malloc, which tries to always work like malloc, is not
the best API for applications? I always thought the Samba talloc API was less
error prone, since it supports reference counting and hierarchical allocation.
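
For reference, a minimal sketch of the talloc style in question - a hierarchy
of contexts where freeing the parent releases everything allocated under it
(reference counting via talloc_reference() is the other half of it):

  #include <stdio.h>
  #include <talloc.h>

  struct conn { int fd; char *peer; };

  int main(void)
  {
      /* top-level context: freeing it releases everything below it */
      void *ctx = talloc_new(NULL);

      struct conn *c = talloc(ctx, struct conn);  /* child of ctx */
      c->fd = -1;
      c->peer = talloc_strdup(c, "192.0.2.1");    /* child of c */

      printf("peer=%s\n", c->peer);

      /* one call tears down ctx, c and c->peer - no per-object frees */
      talloc_free(ctx);
      return 0;
  }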
