DPDK patches and discussions
From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
	keith.wiles@intel.com, benjamin.walker@intel.com,
	bruce.richardson@intel.com, thomas@monjalon.net
Subject: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Tue, 19 Dec 2017 11:14:27 +0000
Message-ID: <cover.1513681966.git.anatoly.burakov@intel.com>

This patchset introduces a prototype implementation of dynamic memory allocation
for DPDK. It is intended to start a conversation and build consensus on the best
way to implement this functionality. The patchset works well enough to pass all
unit tests, and to work with traffic forwarding, provided the device drivers are
adjusted to ensure contiguous memory allocation where it matters.

The vast majority of changes are in the EAL and malloc, and the external API
disruption is minimal: a new set of APIs is added for contiguous memory
allocation (for rte_malloc and rte_memzone), along with a few API additions in
rte_memory. Every other API change is internal to EAL, and all of the memory
allocation/freeing is handled through rte_malloc, with no externally visible
API changes, aside from a call to get physmem layout, which no longer makes
sense given that there are multiple memseg lists.

Quick outline of all changes done as part of this patchset:

 * Malloc heap adjusted to handle holes in address space
 * Single memseg list replaced by multiple expandable memseg lists
 * VA space for hugepages is preallocated in advance
 * Added dynamic alloc/free for pages, happening as needed on malloc/free
 * Added contiguous memory allocation APIs for rte_malloc and rte_memzone
 * Integrated Pawel Wodkowski's patch [1] for registering/unregistering memory
   with VFIO

The biggest difference is that a "memseg" now represents a single page (as
opposed to being a big contiguous block of pages). As a consequence, both
memzones and malloc elements are no longer guaranteed to be physically
contiguous, unless the user asks for it. To preserve any functionality that
depended on the previous behavior, a legacy memory option is also provided;
however, it is expected to be a temporary solution. The drivers weren't adjusted
in this patchset, and it is expected that whoever tests the drivers with this
patchset will modify the relevant drivers to support the new set of APIs. Basic
testing with forwarding traffic was performed, both with UIO and VFIO, and no
performance degradation was observed.
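
To illustrate why contiguity now has to be requested explicitly: with one
memseg per page, a buffer that is contiguous in virtual address space can span
pages whose physical addresses are unrelated. The following is a minimal
sketch and not code from the patchset; struct page_desc and
range_is_phys_contig() are hypothetical names invented for illustration, and
the table is assumed to describe VA-contiguous pages of equal size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page descriptor: one entry per page. */
struct page_desc {
	uint64_t phys_addr; /* physical (IO) address of the page */
};

/*
 * Return true if pages [first, first + n_pages) of a VA-contiguous table
 * are also physically contiguous. Every page is assumed to be page_sz
 * bytes long. A range of zero or one page is trivially contiguous.
 */
static bool
range_is_phys_contig(const struct page_desc *pages, size_t first,
		size_t n_pages, uint64_t page_sz)
{
	size_t i;

	assert(pages != NULL && page_sz != 0);

	for (i = 1; i < n_pages; i++) {
		/* Each page must start exactly where the previous one ends. */
		uint64_t expected = pages[first + i - 1].phys_addr + page_sz;

		if (pages[first + i].phys_addr != expected)
			return false;
	}
	return true;
}
```

A driver that needs a physically contiguous region (for example, for a
descriptor ring) would either ask for contiguous memory up front or run a
check of this kind on what it was given.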

Why multiple memseg lists instead of one? It makes things easier on a number of
fronts. Since a memseg is a single page now, the list will get quite big, and we
need to locate pages somehow when we allocate and free them. We could of course
just walk the list and allocate one contiguous chunk of VA space for memsegs,
but I chose to use separate lists instead, to speed up many operations on the
list.
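
As a rough illustration of the speed-up, assume each list owns its own
preallocated VA range and holds pages of a single size. Locating the page
behind an address then takes one pass over the (few) lists plus a division,
rather than a walk over every page. This is a sketch under those assumptions
only; struct seg_list and find_page() are hypothetical and do not reflect the
actual data structures in the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one memseg list (names are illustrative). */
struct seg_list {
	uintptr_t base_va; /* start of the preallocated VA range */
	uint64_t page_sz;  /* size of every page in this list */
	size_t n_pages;    /* capacity of the list, in pages */
};

/*
 * Find the list whose VA range covers addr and compute the index of the
 * page within that list. Returns the list index, or -1 if no list covers
 * the address. Cost is proportional to the number of lists, not pages.
 */
static int
find_page(const struct seg_list *lists, int n_lists, uintptr_t addr,
		size_t *page_idx)
{
	int i;

	assert(lists != NULL && page_idx != NULL);

	for (i = 0; i < n_lists; i++) {
		const struct seg_list *l = &lists[i];
		uint64_t len = l->page_sz * l->n_pages;

		if (addr >= l->base_va && addr - l->base_va < len) {
			*page_idx = (size_t)((addr - l->base_va) / l->page_sz);
			return i;
		}
	}
	return -1;
}
```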

It would be great to see the following discussions within the community regarding
both current implementation and future work:

 * Any suggestions to improve the current implementation. The whole system with
   multiple memseg lists is kind of unwieldy, so maybe there are better ways to
   do the same thing. Maybe use a single list after all? We're not expecting
   malloc/free on the hot path, so maybe it doesn't matter that we have to walk
   the list of potentially thousands of pages?
 * Pluggable memory allocators. Right now, allocators are hardcoded, but down
   the line it would be great to have custom allocators (e.g. for externally
   allocated memory). I've tried to keep the memalloc API minimal and generic
   enough to be able to easily change it down the line, but suggestions are
   welcome. Memory drivers, with ops for alloc/free etc.?
 * Memory tagging. This is related to the previous item. Right now, we can only
   ask malloc to allocate memory by page size, but one could potentially have
   different memory regions backed by pages of the same size (for example,
   locked 1G pages, to completely avoid TLB misses, alongside regular 1G pages),
   and it would be good to have some mechanism to distinguish between the
   different memory types available to a DPDK application. One could, for
   example, tag memory by "purpose" (e.g. "fast", "slow"), or in other ways.
 * Secondary process implementation, in particular when it comes to allocating/
   freeing new memory. The current plan is to make use of the RPC mechanism
   proposed by Jianfeng [2] to communicate between primary and secondary
   processes, however other suggestions are welcome.
 * Support for non-hugepage memory. This work is planned down the line. Aside
   from obvious concerns about physical addresses, 4K pages are small and will
   eat up enormous amounts of memseg list space, so my proposal would be to
   allocate 4K pages in bigger blocks (say, 2MB).
 * 32-bit support. The current implementation lacks it, and I don't see a
   trivial way to make it work if we are to preallocate huge chunks of VA space
   in advance. We could limit it to 1G per page size, but even that, on multiple
   sockets, won't work that well, and we can't know in advance what kind of
   memory the user will try to allocate. Drop it? Leave it in legacy mode only?
 * Preallocation. Right now, malloc will free any and all memory that it can,
   which could lead to a (perhaps counterintuitive) situation where a user
   calls DPDK with --socket-mem=1024,1024, does a single "rte_free" and loses
   all of the preallocated memory in the process. Would preallocating memory
   *and keeping it no matter what* be a valid use case? E.g. if DPDK was run
   without any memory requirements specified, we grow and shrink as needed, but
   if DPDK was asked to preallocate memory, we can grow but we can't shrink
   past the preallocated amount?
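
The "grow but never shrink past the preallocated amount" idea from the last
item can be sketched as a small policy function. This is only an illustration
of the proposal, not behavior implemented in this patchset; the function name
and its page-count parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical policy helper. Given the number of pages currently mapped,
 * the number preallocated at startup (0 if no memory was requested), and
 * the number that have just become free, return how many pages may be
 * handed back to the system without dropping below the preallocated floor.
 */
static size_t
pages_to_release(size_t mapped, size_t prealloc, size_t freeable)
{
	size_t above_floor;

	assert(freeable <= mapped);

	/* Nothing above the preallocated floor: keep everything. */
	if (mapped <= prealloc)
		return 0;

	above_floor = mapped - prealloc;
	return freeable < above_floor ? freeable : above_floor;
}
```

For example, --socket-mem=1024 with 2MB pages corresponds to a floor of 512
pages on that socket. With 512 pages mapped, freeing everything would release
nothing; with 600 pages mapped and 200 freeable, only the 88 pages above the
floor would be returned.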

Any other feedback about things I didn't think of or missed is greatly
appreciated.

[1] http://dpdk.org/dev/patchwork/patch/24484/
[2] http://dpdk.org/dev/patchwork/patch/31838/

Anatoly Burakov (23):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  eal: add function to report number of detected sockets
  eal: add rte_fbarray
  eal: move all locking to heap
  eal: protect malloc heap stats with a lock
  eal: make malloc a doubly-linked list
  eal: make malloc_elem_join_adjacent_free public
  eal: add "single file segments" command-line option
  eal: add "legacy memory" option
  eal: read hugepage counts from node-specific sysfs path
  eal: replace memseg with memseg lists
  eal: add support for dynamic memory allocation
  eal: make use of dynamic memory allocation for init
  eal: add support for dynamic unmapping of pages
  eal: add API to check if memory is physically contiguous
  eal: enable dynamic memory allocation/free on malloc/free
  eal: add backend support for contiguous memory allocation
  eal: add rte_malloc support for allocating contiguous memory
  eal: enable reserving physically contiguous memzones
  eal: make memzones use rte_fbarray
  mempool: add support for the new memory allocation methods
  vfio: allow to map other memory regions
  eal: map/unmap memory with VFIO when alloc/free pages

 config/common_base                                |   5 +-
 drivers/bus/pci/linux/pci.c                       |  29 +-
 drivers/net/ena/ena_ethdev.c                      |  10 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c     | 106 ++--
 lib/librte_eal/common/Makefile                    |   2 +-
 lib/librte_eal/common/eal_common_fbarray.c        | 585 ++++++++++++++++++++++
 lib/librte_eal/common/eal_common_lcore.c          |  11 +
 lib/librte_eal/common/eal_common_memalloc.c       |  79 +++
 lib/librte_eal/common/eal_common_memory.c         | 315 +++++++++++-
 lib/librte_eal/common/eal_common_memzone.c        | 250 ++++++---
 lib/librte_eal/common/eal_common_options.c        |   8 +
 lib/librte_eal/common/eal_filesystem.h            |  13 +
 lib/librte_eal/common/eal_hugepages.h             |   1 +
 lib/librte_eal/common/eal_internal_cfg.h          |   6 +
 lib/librte_eal/common/eal_memalloc.h              |  55 ++
 lib/librte_eal/common/eal_options.h               |   4 +
 lib/librte_eal/common/eal_private.h               |  29 ++
 lib/librte_eal/common/include/rte_eal.h           |   1 +
 lib/librte_eal/common/include/rte_eal_memconfig.h |  26 +-
 lib/librte_eal/common/include/rte_fbarray.h       |  98 ++++
 lib/librte_eal/common/include/rte_lcore.h         |   8 +
 lib/librte_eal/common/include/rte_malloc.h        | 181 +++++++
 lib/librte_eal/common/include/rte_malloc_heap.h   |   6 +
 lib/librte_eal/common/include/rte_memory.h        |  16 +
 lib/librte_eal/common/include/rte_memzone.h       | 158 ++++++
 lib/librte_eal/common/malloc_elem.c               | 411 ++++++++++++---
 lib/librte_eal/common/malloc_elem.h               |  30 +-
 lib/librte_eal/common/malloc_heap.c               | 433 ++++++++++++++--
 lib/librte_eal/common/malloc_heap.h               |  14 +-
 lib/librte_eal/common/rte_malloc.c                | 139 +++--
 lib/librte_eal/linuxapp/eal/Makefile              |   4 +
 lib/librte_eal/linuxapp/eal/eal.c                 |  23 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  73 ++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 556 ++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 452 ++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_vfio.c            | 280 ++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.h            |  11 +
 lib/librte_mempool/rte_mempool.c                  |  84 +++-
 test/test/test_malloc.c                           |  29 +-
 test/test/test_memory.c                           |  44 +-
 test/test/test_memzone.c                          |  17 +-
 41 files changed, 3999 insertions(+), 603 deletions(-)
 create mode 100755 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100755 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100755 lib/librte_eal/common/eal_memalloc.h
 create mode 100755 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100755 lib/librte_eal/linuxapp/eal/eal_memalloc.c

-- 
2.7.4


Thread overview: 46+ messages
2017-12-19 11:14 Anatoly Burakov [this message]
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 01/23] eal: move get_virtual_area out of linuxapp eal_memory.c Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 02/23] eal: add function to report number of detected sockets Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 03/23] eal: add rte_fbarray Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 04/23] eal: move all locking to heap Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 05/23] eal: protect malloc heap stats with a lock Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 06/23] eal: make malloc a doubly-linked list Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 07/23] eal: make malloc_elem_join_adjacent_free public Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 08/23] eal: add "single file segments" command-line option Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 09/23] eal: add "legacy memory" option Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 10/23] eal: read hugepage counts from node-specific sysfs path Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 11/23] eal: replace memseg with memseg lists Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 12/23] eal: add support for dynamic memory allocation Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 13/23] eal: make use of dynamic memory allocation for init Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 14/23] eal: add support for dynamic unmapping of pages Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 15/23] eal: add API to check if memory is physically contiguous Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 16/23] eal: enable dynamic memory allocation/free on malloc/free Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 17/23] eal: add backend support for contiguous memory allocation Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 18/23] eal: add rte_malloc support for allocating contiguous memory Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 19/23] eal: enable reserving physically contiguous memzones Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 20/23] eal: make memzones use rte_fbarray Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 21/23] mempool: add support for the new memory allocation methods Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 22/23] vfio: allow to map other memory regions Anatoly Burakov
2017-12-19 11:14 ` [dpdk-dev] [RFC v2 23/23] eal: map/unmap memory with VFIO when alloc/free pages Anatoly Burakov
2017-12-19 15:46 ` [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK Stephen Hemminger
2017-12-19 16:02   ` Burakov, Anatoly
2017-12-19 16:06     ` Stephen Hemminger
2017-12-19 16:09       ` Burakov, Anatoly
2017-12-21 21:38 ` Walker, Benjamin
2017-12-22  9:13   ` Burakov, Anatoly
2017-12-26 17:19     ` Walker, Benjamin
2018-02-02 19:28       ` Yongseok Koh
2018-02-05 10:03         ` Burakov, Anatoly
2018-02-05 10:18           ` Nélio Laranjeiro
2018-02-05 10:36             ` Burakov, Anatoly
2018-02-06  9:10               ` Nélio Laranjeiro
2018-02-14  2:01           ` Yongseok Koh
2018-02-14  9:32             ` Burakov, Anatoly
2018-02-14 18:13               ` Yongseok Koh
2018-01-13 14:13 ` Burakov, Anatoly
2018-01-23 22:33 ` Yongseok Koh
2018-01-25 16:18   ` Burakov, Anatoly
2018-02-14  8:04 ` Thomas Monjalon
2018-02-14 10:07   ` Burakov, Anatoly
2018-04-25 16:02     ` Burakov, Anatoly
2018-04-25 16:12       ` Stephen Hemminger
