From: Stephen Hemminger <stephen@networkplumber.org>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
dev@dpdk.org, andras.kovacs@ericsson.com,
laszlo.vadkeri@ericsson.com, keith.wiles@intel.com,
benjamin.walker@intel.com, bruce.richardson@intel.com,
Yongseok Koh <yskoh@mellanox.com>,
nelio.laranjeiro@6wind.com, olivier.matz@6wind.com,
rahul.lakkireddy@chelsio.com, jerin.jacob@cavium.com,
hemant.agrawal@nxp.com, alejandro.lucero@netronome.com,
arybchenko@solarflare.com, ferruh.yigit@intel.com,
Srinath Mannam <srinath.mannam@broadcom.com>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Wed, 25 Apr 2018 09:12:34 -0700
Message-ID: <20180425091234.5565aafb@xeon-e3>
In-Reply-To: <53e192ed-15d8-5fa1-3048-964d92b917b1@intel.com>
On Wed, 25 Apr 2018 17:02:48 +0100
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:
> On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> > On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:
> >> Hi Anatoly,
> >>
> >> 19/12/2017 12:14, Anatoly Burakov:
> >>> * Memory tagging. This is related to the previous item. Right now, we
> >>> can only ask malloc to allocate memory by page size, but one could
> >>> potentially have different memory regions backed by pages of similar
> >>> sizes (for example, locked 1G pages, to completely avoid TLB misses,
> >>> alongside regular 1G pages), and it would be good to have that kind of
> >>> mechanism to distinguish between different memory types available to
> >>> a DPDK application. One could, for example, tag memory by "purpose"
> >>> (e.g. "fast", "slow"), or in other ways.
> >>
> >> How do you imagine memory tagging?
> >> Should it be a parameter when requesting some memory from rte_malloc
> >> or rte_mempool?
> >
> > We can't make it a parameter for mempool without making it a parameter
> > for rte_malloc, as every memory allocation in DPDK works through
> > rte_malloc. So at the very least, rte_malloc will have it. And as long
> > as rte_malloc has it, there's no reason why memzones and mempools
> > couldn't - not much code to add.
> >
> >> Could it be a bit-field allowing some properties to be combined?
> >> Does it make sense to have "DMA" as one of the purposes?
> >
> > Something like a bitfield would be my preference, yes. That way we could
> > classify memory in certain ways and allocate based on that. Which
> > "certain ways" these are, I'm not sure. For example, in addition to
> > tagging memory as "DMA-capable" (which I think is a given), one might
> > tag certain memory as "non-default", as in, never allocate from this
> > chunk of memory unless explicitly asked to do so - this could be useful
> > for types of memory that are a precious resource.
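A minimal C sketch of such a bitfield, assuming hypothetical flag names (MEM_TAG_DMA, MEM_TAG_LOCKED, MEM_TAG_NON_DEFAULT) and a hypothetical select_heap() helper, none of which exist in DPDK. It illustrates the two rules described above: a heap must carry every requested tag, and a "non-default" heap is skipped unless the request names that tag explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory tags; not part of the DPDK API. */
enum mem_tag {
	MEM_TAG_DMA         = 1u << 0, /* usable for device DMA */
	MEM_TAG_LOCKED      = 1u << 1, /* e.g. locked 1G pages */
	MEM_TAG_NON_DEFAULT = 1u << 2, /* only used when explicitly requested */
};

struct tagged_heap {
	const char *name;
	uint32_t tags;
};

/* Nonzero if a heap with 'heap_tags' may serve a request for 'wanted':
 * every requested tag must be present, and a non-default heap is only
 * eligible when the request itself includes MEM_TAG_NON_DEFAULT. */
static int
heap_matches(uint32_t heap_tags, uint32_t wanted)
{
	if ((heap_tags & wanted) != wanted)
		return 0;
	if ((heap_tags & MEM_TAG_NON_DEFAULT) && !(wanted & MEM_TAG_NON_DEFAULT))
		return 0;
	return 1;
}

/* Return the first eligible heap, or NULL if none matches. */
static const struct tagged_heap *
select_heap(const struct tagged_heap *heaps, size_t n, uint32_t wanted)
{
	assert(heaps != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (heap_matches(heaps[i].tags, wanted))
			return &heaps[i];
	return NULL;
}
```

With this rule, a precious pool tagged MEM_TAG_NON_DEFAULT never satisfies an ordinary "give me DMA memory" request, which matches the "never allocate from this chunk unless explicitly asked" idea.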
> >
> > Then again, it is likely that we won't have many types of memory in
> > DPDK, and any other type would be implementation-specific, so maybe just
> > stringly-typing it is OK (maybe we can finally make use of "type"
> > parameter in rte_malloc!).
> >
> >>
> >> How can we transparently allocate the best memory for the NIC?
> >> You take care of the NUMA socket property, but there can be more
> >> requirements, like getting memory from the NIC itself.
> >
> > I would think that we can't make it generic enough to cover all cases,
> > so it's best to expose some APIs and let PMDs handle this themselves.
> >
> >>
> >> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome, NXP,
> >> Solarflare)
> >> in order to trigger a discussion about the ideal requirements.
> >>
> >
>
> Hi all,
>
> I would like to restart this discussion, again :) I would like to hear
> some feedback on my thoughts below.
>
> I've given it some more thought, and while I have lots of use cases
> in mind, I suspect covering them all while keeping a sane API is
> unrealistic.
>
> So, first things first.
>
> The main issue we have is the 1:1 correspondence between malloc heaps
> and socket IDs. This has led to various attempts to hijack socket IDs
> to do something else - I've seen this approach a few times before, most
> recently in a patch by Srinath/Broadcom [1]. We need to break this
> dependency somehow, and have a unique heap identifier.
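As an illustration of breaking that dependency, here is a sketch of a heap registry in which the heap ID is simply an index into a table and the socket ID is only an attribute of each heap (possibly "none"). heap_register(), heap_find() and HEAP_NO_SOCKET are hypothetical names, not DPDK API.

```c
#include <assert.h>
#include <string.h>

#define MAX_HEAPS 32
#define HEAP_NAME_LEN 32
#define HEAP_NO_SOCKET (-1) /* hypothetical: heap not tied to a NUMA node */

/* Hypothetical registry entry: the heap ID is the index into heap_table,
 * so a special-purpose heap no longer has to borrow a fake socket ID. */
struct heap_entry {
	char name[HEAP_NAME_LEN];
	int socket_id; /* NUMA node, or HEAP_NO_SOCKET */
};

static struct heap_entry heap_table[MAX_HEAPS];
static int n_heaps;

/* Register a heap and return its ID, or -1 if the table is full, the
 * name is too long, or the name is already in use. */
static int
heap_register(const char *name, int socket_id)
{
	assert(name != NULL);
	if (n_heaps == MAX_HEAPS || strlen(name) >= HEAP_NAME_LEN)
		return -1;
	for (int i = 0; i < n_heaps; i++)
		if (strcmp(heap_table[i].name, name) == 0)
			return -1;
	strcpy(heap_table[n_heaps].name, name);
	heap_table[n_heaps].socket_id = socket_id;
	return n_heaps++;
}

/* Look up a heap ID by name; returns -1 if no such heap exists. */
static int
heap_find(const char *name)
{
	for (int i = 0; i < n_heaps; i++)
		if (strcmp(heap_table[i].name, name) == 0)
			return i;
	return -1;
}
```

The per-socket hugepage heaps simply become ordinary entries that happen to carry a socket ID, while an externally provided heap can be registered without one.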
>
> Also, since memory allocators are expected to behave roughly like
> drivers (e.g. have a driver API and provide hooks for init/alloc/free
> functions, etc.), a request to allocate memory may not just go to the
> heap itself (which is handled internally by rte_malloc), but also go to
> its respective allocator. This is roughly similar to what is happening
> currently, except that which allocator functions to call will then
> depend on which driver allocated that heap.
>
> So, we arrive at a dependency - heap => allocator. Each heap must know
> to which allocator it belongs - so, we also need some kind of way to
> identify not just the heap, but the allocator as well.
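The heap => allocator dependency could be expressed with an ops table, roughly as sketched below. struct malloc_driver, struct heap and the "toy" driver are all hypothetical; a real backend would manage hugepage or device memory rather than forward to libc malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "malloc driver" interface (not a DPDK API): each
 * allocator supplies hooks, and every heap records its driver. */
struct malloc_driver {
	const char *name;
	int (*init)(void *priv);
	void *(*alloc)(void *priv, size_t size);
	void (*dealloc)(void *priv, void *ptr);
};

struct heap {
	const char *name;
	const struct malloc_driver *drv; /* heap => allocator */
	void *drv_priv;                  /* driver-private state */
};

/* Generic front end: the heap decides which driver hooks are called. */
static void *
heap_alloc(struct heap *h, size_t size)
{
	assert(h != NULL && h->drv != NULL);
	return h->drv->alloc(h->drv_priv, size);
}

static void
heap_free(struct heap *h, void *ptr)
{
	assert(h != NULL && h->drv != NULL);
	h->drv->dealloc(h->drv_priv, ptr);
}

/* Toy driver for illustration only: forwards to libc and keeps a count
 * of live allocations in its private state (an int). */
static int
toy_init(void *priv)
{
	*(int *)priv = 0;
	return 0;
}

static void *
toy_alloc(void *priv, size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		(*(int *)priv)++;
	return p;
}

static void
toy_dealloc(void *priv, void *ptr)
{
	if (ptr != NULL)
		(*(int *)priv)--;
	free(ptr);
}

static const struct malloc_driver toy_driver = {
	.name = "toy",
	.init = toy_init,
	.alloc = toy_alloc,
	.dealloc = toy_dealloc,
};
```

The generic layer (heap_alloc/heap_free) never needs to know which backend is in use; that knowledge lives entirely in the heap's drv pointer.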
>
> In the above quotes from previous mails I suggested categorizing memory
> by "types", but now that I think of it, the API would've been too
> complex, as we would've ideally had to cover use cases such as "allocate
> memory of this type, no matter which allocator it comes from",
> "allocate memory from this particular heap", "allocate memory from this
> particular allocator"... It gets complicated pretty fast.
>
> What I propose instead is this. 99% of the time, the user wants our
> hugepage allocator. So, by default, all allocations will come through
> that. In the event that the user needs memory from a specific heap, we
> need to provide a new set of APIs to request memory from a specific heap.
>
> Do we expect situations where the user might *not* want the default
> allocator, but also *not* know which exact heap he wants? If the answer
> is no (which I'm counting on :) ), then allocating from a specific
> malloc driver becomes as simple as something like this:
>
> mem = rte_malloc_from_heap("my_very_special_heap");
>
> (stringly-typed heap ID is just an example)
>
> So, old APIs remain intact, and are always passed through to a default
> allocator, while new APIs will grant access to other allocators.
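A sketch of the proposed split, assuming a hypothetical my_malloc() standing in for the unchanged rte_malloc()-style API and a hypothetical my_malloc_from_heap() that takes the heap name plus a size (the one-liner above omits the size for brevity). To keep the example self-contained, each heap is just a fixed bump arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical named heap, modelled as a simple bump arena. */
struct named_heap {
	const char *name;
	unsigned char *base;
	size_t size;
	size_t used;
};

static _Alignas(16) unsigned char default_mem[4096];
static _Alignas(16) unsigned char special_mem[1024];

static struct named_heap heaps[] = {
	{ "default", default_mem, sizeof(default_mem), 0 },
	{ "my_very_special_heap", special_mem, sizeof(special_mem), 0 },
};
#define N_HEAPS (sizeof(heaps) / sizeof(heaps[0]))

static void *
arena_alloc(struct named_heap *h, size_t size)
{
	void *p;

	if (size > SIZE_MAX - 15)
		return NULL;
	size = (size + 15) & ~(size_t)15; /* keep 16-byte granularity */
	if (size > h->size - h->used)
		return NULL;
	p = h->base + h->used;
	h->used += size;
	return p;
}

/* Old-style API: signature unchanged, always uses the default heap. */
static void *
my_malloc(size_t size)
{
	return arena_alloc(&heaps[0], size);
}

/* New-style API: the caller names the heap explicitly. Returns NULL if
 * the heap does not exist or is exhausted; deliberately no fallback. */
static void *
my_malloc_from_heap(const char *heap_name, size_t size)
{
	assert(heap_name != NULL);
	for (size_t i = 0; i < N_HEAPS; i++)
		if (strcmp(heaps[i].name, heap_name) == 0)
			return arena_alloc(&heaps[i], size);
	return NULL;
}
```

Not falling back to the default heap is a design choice in this sketch: a caller who explicitly asked for special memory probably should not silently receive ordinary memory.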
>
> Heap ID alone, however, may not provide enough flexibility. For example,
> if a malloc driver allocates a specific kind of memory that is
> NUMA-aware, it would perhaps be awkward to use different heap IDs when
> the memory being allocated is arguably the same, just subdivided into
> several blocks. Moreover, figuring out situations like this would likely
> require some cooperation from the allocator itself (possibly some
> allocator-specific APIs), but should we add malloc heap arguments,
> those would have to be generic. I'm not sure if we want to go that far,
> though.
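One way to keep a single heap ID while still honouring NUMA placement is a generic argument structure accepted by every heap, as sketched below. struct heap_alloc_args, struct numa_heap, numa_heap_reserve() and ANY_SOCKET are hypothetical, and the sketch only does per-socket capacity bookkeeping rather than real allocation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKETS 4
#define ANY_SOCKET (-1) /* hypothetical: no NUMA preference */

/* Hypothetical generic allocation arguments any heap could accept. */
struct heap_alloc_args {
	size_t size;
	int socket_id; /* preferred NUMA node, or ANY_SOCKET */
};

/* One logical heap whose memory is subdivided per NUMA node. */
struct numa_heap {
	size_t free_bytes[MAX_SOCKETS]; /* capacity left in each sub-block */
	int n_sockets;
};

/* Reserve args->size bytes; returns the socket used, or -1 on failure.
 * A specific socket is honoured strictly; ANY_SOCKET takes the first
 * sub-block with enough room. */
static int
numa_heap_reserve(struct numa_heap *h, const struct heap_alloc_args *args)
{
	assert(h->n_sockets > 0 && h->n_sockets <= MAX_SOCKETS);
	if (args->socket_id != ANY_SOCKET) {
		int s = args->socket_id;

		if (s < 0 || s >= h->n_sockets || h->free_bytes[s] < args->size)
			return -1;
		h->free_bytes[s] -= args->size;
		return s;
	}
	for (int s = 0; s < h->n_sockets; s++) {
		if (h->free_bytes[s] >= args->size) {
			h->free_bytes[s] -= args->size;
			return s;
		}
	}
	return -1;
}
```

The caller still addresses one heap; the socket becomes a generic argument rather than part of the heap's identity.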
>
> Does that sound reasonable?
>
> Another tangentially related issue, raised by Olivier [2], is that of
> allocating memory in blocks, rather than using rte_malloc. The current
> implementation has rte_malloc storing its metadata right in the memory -
> this leads to unnecessary memory fragmentation in certain cases, such as
> allocating memory page-by-page, and, in general, to polluting memory we
> might not want to pollute with malloc metadata.
>
> To fix this, the memory allocator would have to store malloc metadata
> externally, which comes with a few caveats (reverse mapping of pointers
> to malloc elements, storing, looking up and accounting for said
> elements, etc.). There are currently no plans to work on it, but it's
> certainly something to think about :)
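The reverse mapping mentioned above (pointer => malloc element) could, for example, be a table of elements sorted by start address and searched with a binary search. This is only a sketch under stated assumptions: elements never overlap, the table has a fixed hypothetical capacity, and ext_insert(), ext_lookup() and ext_remove() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELEMS 64

/* Hypothetical out-of-band malloc element: metadata lives in this table
 * instead of a header in front of the user data, so the managed memory
 * itself is never written to by the allocator. */
struct ext_elem {
	uintptr_t start;
	size_t len;
};

static struct ext_elem elems[MAX_ELEMS]; /* sorted by start address */
static size_t n_elems;

/* Record a new (non-overlapping) element, keeping the table sorted. */
static int
ext_insert(void *ptr, size_t len)
{
	uintptr_t start = (uintptr_t)ptr;
	size_t i = n_elems;

	assert(len > 0);
	if (n_elems == MAX_ELEMS)
		return -1;
	while (i > 0 && elems[i - 1].start > start) {
		elems[i] = elems[i - 1];
		i--;
	}
	elems[i].start = start;
	elems[i].len = len;
	n_elems++;
	return 0;
}

/* Reverse mapping: find the element containing ptr, or NULL. */
static struct ext_elem *
ext_lookup(const void *ptr)
{
	uintptr_t p = (uintptr_t)ptr;
	size_t lo = 0, hi = n_elems;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (p < elems[mid].start)
			hi = mid;
		else if (p >= elems[mid].start + elems[mid].len)
			lo = mid + 1;
		else
			return &elems[mid];
	}
	return NULL;
}

/* Remove the element starting exactly at ptr; 0 on success, -1 if the
 * pointer is unknown or points into the middle of an element. */
static int
ext_remove(const void *ptr)
{
	struct ext_elem *e = ext_lookup(ptr);

	if (e == NULL || e->start != (uintptr_t)ptr)
		return -1;
	size_t idx = (size_t)(e - elems);
	memmove(&elems[idx], &elems[idx + 1],
		(n_elems - idx - 1) * sizeof(elems[0]));
	n_elems--;
	return 0;
}
```

Lookup is O(log n), but insertion and removal shift entries; a production allocator would more likely use a balanced tree or radix structure, which is part of the complexity the caveats above refer to.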
>
> [1] http://dpdk.org/dev/patchwork/patch/36596/
> [2] http://dpdk.org/ml/archives/dev/2018-March/093212.html

Maybe the existing rte_malloc, which tries to always work like malloc, is
not the best API for applications? I always thought the Samba talloc API
was less error-prone, since it supports reference counting and
hierarchical allocation.
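For comparison, a minimal hierarchical, reference-counted allocator in the spirit of talloc might look like the sketch below. It is NOT the talloc API (talloc's real interface, with talloc_free(), talloc_reference() and friends, differs); h_alloc(), h_ref() and h_free() are hypothetical names used only to show the two properties mentioned: freeing a parent frees its children, and an extra reference keeps a block alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical talloc-style node header (NOT the talloc API). */
struct hnode {
	struct hnode *parent;
	struct hnode *child;  /* first child */
	struct hnode *next;   /* next sibling */
	unsigned refcnt;      /* parent's ownership plus extra references */
	max_align_t data[];   /* user data, suitably aligned */
};

static size_t live_blocks; /* blocks allocated and not yet freed */

static struct hnode *
node_of(void *ptr)
{
	return (struct hnode *)((char *)ptr - offsetof(struct hnode, data));
}

/* Allocate size bytes as a child of parent (NULL for a root block). */
static void *
h_alloc(void *parent, size_t size)
{
	struct hnode *n;

	if (size > SIZE_MAX - sizeof(*n))
		return NULL;
	n = malloc(sizeof(*n) + size);
	if (n == NULL)
		return NULL;
	n->parent = parent != NULL ? node_of(parent) : NULL;
	n->child = NULL;
	n->next = NULL;
	n->refcnt = 1;
	if (n->parent != NULL) {
		n->next = n->parent->child;
		n->parent->child = n;
	}
	live_blocks++;
	return n->data;
}

/* Take an extra reference so the block can outlive its parent. */
static void *
h_ref(void *ptr)
{
	node_of(ptr)->refcnt++;
	return ptr;
}

/* Unlink n from its parent's child list, making it a root. */
static void
detach(struct hnode *n)
{
	struct hnode **pp;

	if (n->parent == NULL)
		return;
	for (pp = &n->parent->child; *pp != n; pp = &(*pp)->next)
		;
	*pp = n->next;
	n->parent = NULL;
	n->next = NULL;
}

/* Drop one reference; when none remain, release the whole subtree.
 * Children still holding extra references are detached and survive as
 * independent roots. Returns 1 if this block was actually freed. */
static int
h_free(void *ptr)
{
	struct hnode *n = node_of(ptr);

	assert(n->refcnt > 0);
	if (--n->refcnt > 0)
		return 0;
	while (n->child != NULL) {
		struct hnode *c = n->child;

		detach(c);
		h_free(c->data);
	}
	detach(n);
	free(n);
	live_blocks--;
	return 1;
}
```

Freeing a request context, for instance, would release every buffer allocated under it in one call, except buffers that someone explicitly kept with h_ref().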