DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
	keith.wiles@intel.com, benjamin.walker@intel.com,
	bruce.richardson@intel.com, thomas@monjalon.net,
	techboard@dpdk.org, jerin.jacob@caviumnetworks.com,
	rosenbaumalex@gmail.com, "Ananyev,
	Konstantin" <konstantin.ananyev@intel.com>,
	ferruh.yigit@intel.com
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Sat, 13 Jan 2018 14:13:53 +0000
Message-ID: <5ea96aa4-1cbb-a509-b582-418e8bd71552@intel.com>
In-Reply-To: <cover.1513681966.git.anatoly.burakov@intel.com>

On 19-Dec-17 11:14 AM, Anatoly Burakov wrote:
> This patchset introduces a prototype implementation of dynamic memory allocation
> for DPDK. It is intended to start a conversation and build consensus on the best
> way to implement this functionality. The patchset works well enough to pass all
> unit tests, and to work with traffic forwarding, provided the device drivers are
> adjusted to ensure contiguous memory allocation where it matters.
> 
> The vast majority of changes are in the EAL and malloc, and external API
> disruption is minimal: a new set of APIs is added for contiguous memory
> allocation (for rte_malloc and rte_memzone), along with a few API additions in
> rte_memory. Every other API change is internal to EAL, and all of the memory
> allocation/freeing is handled through rte_malloc, with no externally visible
> API changes, aside from a call to get physmem layout, which no longer makes
> sense given that there are multiple memseg lists.
> 
> Quick outline of all changes done as part of this patchset:
> 
>   * Malloc heap adjusted to handle holes in address space
>   * Single memseg list replaced by multiple expandable memseg lists
>   * VA space for hugepages is preallocated in advance
>   * Added dynamic alloc/free for pages, happening as needed on malloc/free
>   * Added contiguous memory allocation APIs for rte_malloc and rte_memzone
>   * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
>     with VFIO
> 
> The biggest difference is that a "memseg" now represents a single page (as
> opposed to being a big contiguous block of pages). As a consequence, both
> memzones and malloc elements are no longer guaranteed to be physically
> contiguous unless the user asks for it. To preserve any functionality that
> depended on the previous behavior, a legacy memory option is also provided;
> however, it is expected to be a temporary solution. The drivers weren't
> adjusted in this patchset, and it is expected that whoever tests the drivers
> with this patchset will modify the relevant drivers to support the new set of
> APIs. Basic testing with forwarding traffic was performed, both with UIO and
> VFIO, and no performance degradation was observed.
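
To make the consequence above concrete: once a memseg is a single page, "physically contiguous" becomes a property that has to be checked page by page. The following is only a minimal sketch of such a check, not code from the patchset. The struct page_desc type and the pages_phys_contiguous() helper are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page record; NOT the patchset's memseg structure. */
struct page_desc {
	uint64_t phys_addr; /* physical address of the start of the page */
};

/*
 * Return true if the n_pages entries in pages[] form one physically
 * contiguous run, i.e. every page starts exactly page_sz bytes after
 * the previous one. Zero or one page is trivially contiguous.
 */
bool
pages_phys_contiguous(const struct page_desc *pages, size_t n_pages,
		uint64_t page_sz)
{
	size_t i;

	assert(page_sz > 0);
	for (i = 1; i < n_pages; i++) {
		if (pages[i].phys_addr != pages[i - 1].phys_addr + page_sz)
			return false;
	}
	return true;
}
```

A caller that needs contiguous memory (for example, a driver setting up a ring) would have to either request it explicitly or run a check along these lines over the pages backing its buffer.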
> 
> Why multiple memseg lists instead of one? It makes things easier on a number of
> fronts. Since a memseg is a single page now, the list will get quite big, and we
> need to locate pages somehow when we allocate and free them. We could of course
> just walk the list and allocate one contiguous chunk of VA space for memsegs,
> but I chose to use separate lists instead, to speed up many operations with the
> list.
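
As a rough illustration of why separate lists can speed things up, here is a sketch under the assumption that each list covers one preallocated VA window holding pages of a single size. Under that assumption, locating a page means scanning a handful of lists and then doing one division, rather than walking thousands of page entries. The struct seg_list type and find_page() are hypothetical and do not reflect the actual data structures in the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memseg list (illustration only). */
struct seg_list {
	uint64_t base_va;  /* start of the preallocated VA window */
	uint64_t page_sz;  /* size of every page in this list */
	uint64_t n_pages;  /* number of page slots in the window */
};

/*
 * Find the list and the page slot that contain addr.
 * Returns 0 and fills list_idx/page_idx on success, -1 if addr does
 * not fall inside any list's VA window.
 */
int
find_page(const struct seg_list *lists, size_t n_lists, uint64_t addr,
		size_t *list_idx, uint64_t *page_idx)
{
	size_t i;

	for (i = 0; i < n_lists; i++) {
		const struct seg_list *l = &lists[i];
		uint64_t slot;

		assert(l->page_sz > 0);
		if (addr < l->base_va)
			continue;
		/* Slot index follows directly from the offset into the window. */
		slot = (addr - l->base_va) / l->page_sz;
		if (slot >= l->n_pages)
			continue;
		*list_idx = i;
		*page_idx = slot;
		return 0;
	}
	return -1;
}
```

The cost here scales with the number of lists, not with the number of pages.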
> 
> It would be great to see the following discussions within the community regarding
> both current implementation and future work:
> 
>   * Any suggestions to improve current implementation. The whole system with
>     multiple memseg lists is kind of unwieldy, so maybe there are better ways to
>     do the same thing. Maybe use a single list after all? We're not expecting
>     malloc/free on hot path, so maybe it doesn't matter that we have to walk
>     the list of potentially thousands of pages?
>   * Pluggable memory allocators. Right now, allocators are hardcoded, but down
>     the line it would be great to have custom allocators (e.g. for externally
>     allocated memory). I've tried to keep the memalloc API minimal and generic
>     enough to be able to easily change it down the line, but suggestions are
>     welcome. Memory drivers, with ops for alloc/free etc.?
>   * Memory tagging. This is related to previous item. Right now, we can only ask
>     malloc to allocate memory by page size, but one could potentially have
>     different memory regions backed by pages of the same size (for example,
>     locked 1G pages, to completely avoid TLB misses, alongside regular 1G pages),
>     and it would be good to have that kind of mechanism to distinguish between
>     different memory types available to a DPDK application. One could, for example,
>     tag memory by "purpose" (i.e. "fast", "slow"), or in other ways.
>   * Secondary process implementation, in particular when it comes to allocating/
>     freeing new memory. The current plan is to make use of the RPC mechanism
>     proposed by Jianfeng [2] to communicate between primary and secondary
>     processes; however, other suggestions are welcome.
>   * Support for non-hugepage memory. This work is planned down the line. Aside
>     from obvious concerns about physical addresses, 4K pages are small and will
>     eat up enormous amounts of memseg list space, so my proposal would be to
>     allocate 4K pages in bigger blocks (say, 2MB).
>   * 32-bit support. The current implementation lacks it, and I don't see a
>     trivial way to make it work if we are to preallocate huge chunks of VA space
>     in advance. We could limit it to 1G per page size, but even that, on multiple
>     sockets, won't work that well, and we can't know in advance what kind of
>     memory the user will try to allocate. Drop it? Leave it in legacy mode only?
>   * Preallocation. Right now, malloc will free any and all memory that it can,
>     which could lead to a (perhaps counterintuitive) situation where a user
>     calls DPDK with --socket-mem=1024,1024, does a single "rte_free" and loses
>     all of the preallocated memory in the process. Would preallocating memory
>     *and keeping it no matter what* be a valid use case? E.g. if DPDK was run
>     without any memory requirements specified, we grow and shrink as needed, but
>     if DPDK was asked to preallocate memory, we can grow but we can't shrink
>     past the preallocated amount?
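
The preallocation question in the last item can be phrased as a simple floor rule. The sketch below is one possible reading of "grow but don't shrink past the preallocated amount" and is not behavior implemented by this patchset. The struct socket_mem type and releasable_bytes() are hypothetical names, and rounding to page boundaries is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-socket accounting (illustration only). */
struct socket_mem {
	size_t prealloc_bytes; /* amount requested up front, e.g. via --socket-mem */
	size_t mapped_bytes;   /* amount currently mapped on this socket */
};

/*
 * Given that `freeable` bytes have just become unused, return how many
 * of them may be handed back to the system without dropping the socket
 * below its preallocated amount.
 */
size_t
releasable_bytes(const struct socket_mem *s, size_t freeable)
{
	size_t above_floor;

	assert(freeable <= s->mapped_bytes);
	if (s->mapped_bytes <= s->prealloc_bytes)
		return 0; /* at or below the floor: keep everything */
	above_floor = s->mapped_bytes - s->prealloc_bytes;
	return freeable < above_floor ? freeable : above_floor;
}
```

With prealloc_bytes set to zero this degenerates to the current "free anything that can be freed" behavior, so both modes described above fit the same rule.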
> 
> Any other feedback about things I didn't think of or missed is greatly
> appreciated.
> 
> [1] http://dpdk.org/dev/patchwork/patch/24484/
> [2] http://dpdk.org/dev/patchwork/patch/31838/
> 
Hi all,

Could this proposal be discussed at the next tech board meeting?

-- 
Thanks,
Anatoly
