DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org, andras.kovacs@ericsson.com,
	laszlo.vadkeri@ericsson.com, keith.wiles@intel.com,
	benjamin.walker@intel.com, bruce.richardson@intel.com,
	Yongseok Koh <yskoh@mellanox.com>,
	nelio.laranjeiro@6wind.com, olivier.matz@6wind.com,
	rahul.lakkireddy@chelsio.com, jerin.jacob@cavium.com,
	hemant.agrawal@nxp.com, alejandro.lucero@netronome.com,
	arybchenko@solarflare.com, ferruh.yigit@intel.com,
	Srinath Mannam <srinath.mannam@broadcom.com>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Wed, 25 Apr 2018 17:02:48 +0100	[thread overview]
Message-ID: <53e192ed-15d8-5fa1-3048-964d92b917b1@intel.com> (raw)
In-Reply-To: <38b1d748-f815-5cf6-acea-e58d291be40d@intel.com>

On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:
>> Hi Anatoly,
>>
>> 19/12/2017 12:14, Anatoly Burakov:
>>>   * Memory tagging. This is related to the previous item. Right now, we
>>>     can only ask malloc to allocate memory by page size, but one could
>>>     potentially have different memory regions backed by pages of similar
>>>     sizes (for example, locked 1G pages, to completely avoid TLB misses,
>>>     alongside regular 1G pages), and it would be good to have that kind
>>>     of mechanism to distinguish between different memory types available
>>>     to a DPDK application. One could, for example, tag memory by
>>>     "purpose" (i.e. "fast", "slow"), or in other ways.
>>
>> How do you imagine memory tagging?
>> Should it be a parameter when requesting some memory from rte_malloc
>> or rte_mempool?
> 
> We can't make it a parameter for mempool without making it a parameter 
> for rte_malloc, as every memory allocation in DPDK works through 
> rte_malloc. So at the very least, rte_malloc will have it. And as long 
> as rte_malloc has it, there's no reason why memzones and mempools 
> couldn't - not much code to add.
> 
>> Could it be a bit-field allowing to combine some properties?
>> Does it make sense to have "DMA" as one of the purpose?
> 
> Something like a bitfield would be my preference, yes. That way we could 
> classify memory in certain ways and allocate based on that. Which 
> "certain ways" these are, i'm not sure. For example, in addition to 
> tagging memory as "DMA-capable" (which i think is a given), one might 
> tag certain memory as "non-default", as in, never allocate from this 
> chunk of memory unless explicitly asked to do so - this could be useful 
> for types of memory that are a precious resource.
> 
> Then again, it is likely that we won't have many types of memory in 
> DPDK, and any other type would be implementation-specific, so maybe just 
> stringly-typing it is OK (maybe we can finally make use of "type" 
> parameter in rte_malloc!).
> 
>>
>> How to transparently allocate the best memory for the NIC?
>> You take care of the NUMA socket property, but there can be more
>> requirements, like getting memory from the NIC itself.
> 
> I would think that we can't make it generic enough to cover all cases, 
> so it's best to expose some API's and let PMD's handle this themselves.
> 
>>
>> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome, NXP, 
>> Solarflare)
>> in order to trigger a discussion about the ideal requirements.
>>
> 

Hi all,

I would like to restart this discussion once again :) and hear some 
feedback on my thoughts below.

I've had some more time to think about it, and while I have lots of 
use cases in mind, I suspect covering them all while keeping a sane API 
is unrealistic.

So, first things first.

The main issue we have is the 1:1 correspondence between a malloc heap 
and a socket ID. This has led to various attempts to hijack socket IDs 
to mean something else - I've seen this approach a few times before, 
most recently in a patch by Srinath/Broadcom [1]. We need to break this 
dependency somehow, and have a unique heap identifier instead.

Also, since memory allocators are expected to behave roughly like 
drivers (i.e. have a driver API and provide hooks for init/alloc/free 
functions, etc.), a request to allocate memory may not just go to the 
heap itself (which is handled internally by rte_malloc), but also to 
its respective allocator. This is roughly similar to what happens 
currently, except that which allocator functions to call will then 
depend on which driver allocated that heap.

So we arrive at a dependency: heap => allocator. Each heap must know 
which allocator it belongs to - so we also need some way to identify 
not just the heap, but the allocator as well.
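To make that dependency concrete, here is a minimal sketch of what a 
heap descriptor with a unique identifier and attached allocator hooks 
could look like. All struct and function names below are hypothetical 
illustrations for this mail, not existing or proposed DPDK symbols:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: each heap carries a unique name plus the ops
 * of the allocator ("driver") that created it, so a request can be
 * routed to the right alloc/free hooks instead of being keyed off
 * the socket ID alone. */
struct malloc_heap_ops {
	void *(*alloc)(size_t size, size_t align);
	void (*free)(void *ptr);
};

struct malloc_heap {
	char name[32];			/* unique heap identifier */
	int socket_id;			/* NUMA node, no longer the key */
	const struct malloc_heap_ops *ops; /* allocator backing this heap */
};

/* Look up a heap by its unique name in a fixed registry. */
static struct malloc_heap *
heap_lookup(struct malloc_heap *heaps, int n, const char *name)
{
	for (int i = 0; i < n; i++)
		if (strcmp(heaps[i].name, name) == 0)
			return &heaps[i];
	return NULL;
}
```

The point of the sketch is only that the heap, not the socket ID, is 
the unit of identification, and that the allocator is reachable from it.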

In the quotes from previous mails above, I suggested categorizing 
memory by "types", but now that I think of it, the API would have been 
too complex, as we would ideally have had to cover use cases such as 
"allocate memory of this type, no matter which allocator it comes 
from", "allocate memory from this particular heap", "allocate memory 
from this particular allocator"... It gets complicated pretty fast.

What I propose instead is this. 99% of the time, the user wants our 
hugepage allocator, so by default, all allocations will come through 
that. In the event that the user needs memory from a specific heap, we 
provide a new set of APIs to request memory from that heap.

Do we expect situations where the user might *not* want the default 
allocator, but also *not* know which exact heap they want? If the 
answer is no (which I'm counting on :) ), then allocating from a 
specific malloc driver becomes as simple as something like this:

mem = rte_malloc_from_heap("my_very_special_heap");

(stringly-typed heap ID is just an example)

So, old APIs remain intact and are always passed through to the default 
allocator, while new APIs grant access to other allocators.
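That split - existing calls always default-routed, a new call taking an 
explicit heap ID - can be sketched as below. Every name here 
(my_rte_malloc, heap_alloc, etc.) is a hypothetical stand-in, not an 
actual or proposed DPDK symbol, and the backing "allocator" is just 
libc malloc for illustration:

```c
#include <stdlib.h>

/* Hypothetical stand-in for per-heap allocator hooks; a real
 * implementation would dispatch to the ops of the named heap. */
static void *
heap_alloc(const char *heap_name, size_t size)
{
	(void)heap_name;	/* which heap to use; unused in this toy */
	return malloc(size);
}

/* Old-style API: always goes to the default (hugepage) heap. */
static void *
my_rte_malloc(size_t size)
{
	return heap_alloc("default", size);
}

/* New-style API: caller names the heap explicitly, falling back to
 * the default heap when none is given. */
static void *
my_rte_malloc_from_heap(const char *heap, size_t size)
{
	return heap_alloc(heap != NULL ? heap : "default", size);
}
```

The design choice being illustrated is that the legacy entry point 
never needs to change; only callers with special requirements opt in to 
the heap-aware call.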

A heap ID alone, however, may not provide enough flexibility. For 
example, if a malloc driver allocates a specific kind of memory that is 
NUMA-aware, it would perhaps be awkward to use different heap IDs when 
the memory being allocated is arguably the same, just subdivided into 
several blocks. Moreover, figuring out situations like this would 
likely require some cooperation from the allocator itself (possibly 
some allocator-specific APIs), but should we add malloc heap arguments, 
those would have to be generic. I'm not sure if we want to go that far, 
though.

Does that sound reasonable?

Another, tangentially related issue raised by Olivier [2] is that of 
allocating memory in blocks, rather than through rte_malloc. The 
current implementation has rte_malloc store its metadata right in the 
memory - this leads to unnecessary memory fragmentation in certain 
cases, such as allocating memory page by page, and in general pollutes 
memory we might not want to pollute with malloc metadata.

To fix this, the memory allocator would have to store malloc metadata 
externally, which comes with a few caveats (reverse mapping of pointers 
to malloc elements; storing, looking up and accounting for said 
elements; etc.). There are currently no plans to work on it, but it's 
certainly something to think about :)
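As an illustration of the reverse-mapping caveat, here is a toy sketch 
of externally stored metadata: a fixed-size open-addressing table 
mapping pointers back to their malloc elements, instead of a header 
placed right before the data. Everything here is hypothetical; a real 
implementation would also need resizing, locking, deletion with 
tombstones, and per-heap accounting:

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* External malloc element: metadata lives in this table, not in the
 * allocated memory itself, so page-granular allocations stay clean. */
struct malloc_elem {
	void *ptr;	/* user pointer, NULL if slot is empty */
	size_t size;	/* size of the allocation it describes */
};

static struct malloc_elem table[TABLE_SIZE];

/* Crude hash of a pointer to a table slot. */
static size_t
slot_of(const void *p)
{
	return ((uintptr_t)p >> 6) % TABLE_SIZE;
}

/* Record a pointer -> element mapping; linear probing on collision. */
static int
elem_insert(void *p, size_t size)
{
	size_t s = slot_of(p);
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		size_t idx = (s + i) % TABLE_SIZE;
		if (table[idx].ptr == NULL) {
			table[idx].ptr = p;
			table[idx].size = size;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Reverse-map a pointer to its element, or NULL if unknown. */
static struct malloc_elem *
elem_lookup(const void *p)
{
	size_t s = slot_of(p);
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		size_t idx = (s + i) % TABLE_SIZE;
		if (table[idx].ptr == p)
			return &table[idx];
		if (table[idx].ptr == NULL)	/* probe hit empty: not present */
			return NULL;
	}
	return NULL;
}
```

Even this toy shows where the cost goes: every free() now pays a lookup 
instead of reading a header at a fixed offset from the pointer.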

[1] http://dpdk.org/dev/patchwork/patch/36596/
[2] http://dpdk.org/ml/archives/dev/2018-March/093212.html

-- 
Thanks,
Anatoly


Thread overview: 46+ messages
2017-12-19 11:14 Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 01/23] eal: move get_virtual_area out of linuxapp eal_memory.c Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 02/23] eal: add function to report number of detected sockets Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 03/23] eal: add rte_fbarray Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 04/23] eal: move all locking to heap Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 05/23] eal: protect malloc heap stats with a lock Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 06/23] eal: make malloc a doubly-linked list Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 07/23] eal: make malloc_elem_join_adjacent_free public Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 08/23] eal: add "single file segments" command-line option Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 09/23] eal: add "legacy memory" option Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 10/23] eal: read hugepage counts from node-specific sysfs path Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 11/23] eal: replace memseg with memseg lists Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 12/23] eal: add support for dynamic memory allocation Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 13/23] eal: make use of dynamic memory allocation for init Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 14/23] eal: add support for dynamic unmapping of pages Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 15/23] eal: add API to check if memory is physically contiguous Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 16/23] eal: enable dynamic memory allocation/free on malloc/free Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 17/23] eal: add backend support for contiguous memory allocation Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 18/23] eal: add rte_malloc support for allocating contiguous memory Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 19/23] eal: enable reserving physically contiguous memzones Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 20/23] eal: make memzones use rte_fbarray Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 21/23] mempool: add support for the new memory allocation methods Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 22/23] vfio: allow to map other memory regions Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 23/23] eal: map/unmap memory with VFIO when alloc/free pages Anatoly Burakov
2017-12-19 15:46 ` [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK Stephen Hemminger
2017-12-19 16:02   ` Burakov, Anatoly
2017-12-19 16:06     ` Stephen Hemminger
2017-12-19 16:09       ` Burakov, Anatoly
2017-12-21 21:38 ` Walker, Benjamin
2017-12-22  9:13   ` Burakov, Anatoly
2017-12-26 17:19     ` Walker, Benjamin
2018-02-02 19:28       ` Yongseok Koh
2018-02-05 10:03         ` Burakov, Anatoly
2018-02-05 10:18           ` Nélio Laranjeiro
2018-02-05 10:36             ` Burakov, Anatoly
2018-02-06  9:10               ` Nélio Laranjeiro
2018-02-14  2:01           ` Yongseok Koh
2018-02-14  9:32             ` Burakov, Anatoly
2018-02-14 18:13               ` Yongseok Koh
2018-01-13 14:13 ` Burakov, Anatoly
2018-01-23 22:33 ` Yongseok Koh
2018-01-25 16:18   ` Burakov, Anatoly
2018-02-14  8:04 ` Thomas Monjalon
2018-02-14 10:07   ` Burakov, Anatoly
2018-04-25 16:02     ` Burakov, Anatoly [this message]
2018-04-25 16:12       ` Stephen Hemminger
