DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Wiles, Keith" <keith.wiles@intel.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"srinath.mannam@broadcom.com" <srinath.mannam@broadcom.com>,
	"scott.branden@broadcom.com" <scott.branden@broadcom.com>,
	"ajit.khaparde@broadcom.com" <ajit.khaparde@broadcom.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	Shreyansh Jain <shreyansh.jain@nxp.com>,
	"jerin.jacob@caviumnetworks.com" <jerin.jacob@caviumnetworks.com>
Subject: Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
Date: Fri, 13 Jul 2018 17:56:40 +0000	[thread overview]
Message-ID: <08D0040C-78AA-401A-863F-57B38533648F@intel.com> (raw)
In-Reply-To: <cca8de29-b578-eacc-14c3-b01253885d6c@intel.com>



> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> 
> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
>> This is a proposal to enable using externally allocated memory
>> in DPDK.
>> In a nutshell, here is what is being done here:
>> - Index malloc heaps by NUMA node index, rather than NUMA node itself
>> - Add identifier string to malloc heap, to uniquely identify it
>> - Allow creating named heaps and add/remove memory to/from those heaps
>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>   of externally allocated memory
>>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>> - Allow malloc and memzones to allocate from named heaps
>> The responsibility to ensure memory is accessible before using it is
>> on the shoulders of the user - there is no checking done with regards
>> to validity of the memory (nor could there be...).
>> The following limitations are present:
>> - No multiprocess support
>> - No thread safety
>> There is currently no way to allocate memory during initialization
>> stage, so even if multiprocess support is added, it is not guaranteed
>> to work because of underlying issues with mapping fbarrays in
>> secondary processes. This is not an issue in single process scenario,
>> but it may be an issue in a multiprocess scenario in case where
>> primary doesn't intend to share the externally allocated memory, yet
>> adding such memory could fail because some other process failed to
>> attach to this shared memory when it wasn't needed.
>> Anatoly Burakov (11):
>>   mem: allow memseg lists to be marked as external
>>   eal: add function to rerieve socket index by socket ID
>>   malloc: index heaps using heap ID rather than NUMA node
>>   malloc: add name to malloc heaps
>>   malloc: enable retrieving statistics from named heaps
>>   malloc: enable allocating from named heaps
>>   malloc: enable creating new malloc heaps
>>   malloc: allow adding memory to named heaps
>>   malloc: allow removing memory from named heaps
>>   malloc: allow destroying heaps
>>   memzone: enable reserving memory from named heaps
>>  config/common_base                            |   1 +
>>  lib/librte_eal/common/eal_common_lcore.c      |  15 +
>>  lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>>  lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>>  .../common/include/rte_eal_memconfig.h        |   5 +-
>>  lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>>  lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>>  .../common/include/rte_malloc_heap.h          |   2 +
>>  lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>>  lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>>  lib/librte_eal/common/malloc_heap.h           |  26 ++
>>  lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>>  lib/librte_eal/rte_eal_version.map            |  10 +
>>  13 files changed, 1118 insertions(+), 109 deletions(-)
> 
> So, now that the RFC is out, i would like to ask a general question.
> 
> One other thing that this patchset is missing, is the ability for data structures (e.g. hash, mempool, etc.) to be allocated from external heaps. Currently, we can kinda sorta do that with various _init() API's (initializing a data structure over already allocated memzone), but this is not ideal and is a hassle for anyone using external memory in DPDK.
> 
> There are basically four ways to approach this problem (that i can see).
> 
> First way is to change "socket ID" to mean "heap ID" everywhere. This has an upside of having a consistent API to allocate from internal and external heaps, with little to no API additions, only internal changes to account for the fact that "socket ID" is now "heap ID".
> 
> However, there is a massive downside to this approach: it is a *giant* API change, and it's also a giant *ABI-compatible* API change. Meaning, replacing socket ID with heap ID will not cause compile failures for old code, which would result in many subtle bugs in already existing codebases. So, while in the perfect world this would've been my preferred approach, realistically i think this is a very, very bad idea.
> 
> Second one is to add a separate "heap name" API's to everything. This has an upside of clean separation between allocation from internal and external heaps. (well, whether it's an upside is debatable...) This is the approach i expected to take when i was creating this patchset.
> 
> The downside is that we have to add new API's to every library and every DPDK data structure, to allow explicit allocation from external heaps. We will have to maintain both, and things like hardware drivers will need to have a way to indicate the need to allocate things from a particular external heap.
> 
> The third way is to expose the "heap ID" externally, and allow a single, unified API to reserve memory. That is, create an API that would map either a NUMA node ID or a heap name to an ID, and allow reserving memory through that ID regardless of whether it's internal or external memory. This would also allow to gradually phase out socket-based ID's in favor of heap ID API, should we choose to do so.
> 
> The downside for this is, it adds a layer of indirection between socket ID and reserving memory on a particular NUMA node, and it makes it hard to produce a single value of "heap ID" in such a way as to replicate current functionality of allocating with SOCKET_ID_ANY. Most likely user will have to explicitly try to allocate on all sockets, unless we keep old API's around in parallel.
> 
> Finally, a fourth way would be to abuse the socket ID to also mean something else, which is an approach i've seen numerous times already, and one that i don't like. We could register new heaps as a new, fake socket ID, and use that to address external heaps (each heap would get its own socket). So, keep current socket ID behavior, but for non-existent sockets it would be possible to be registered as a fake socket pointing to an external heap.
> 
> The upside for this approach would be that no API changes are required whatsoever to existing libraries - this scheme is compatible with both internal and external heaps without adding a separate API.
> 
> The downside is bad semantics - "special" sockets, handling of SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that happens to correspond to an existing external heap", and many other things that can be confusing. I don't like this option, but it's an option :)
> 
> Thoughts? Comments?

#1 is super clean, but very disruptive to everyone. Very Bad IMO
#2 is also clean, but adds a lot of new APIs that everyone needs to use or at least in the external heap cases.
#3 not sure I fully understand it, but reproducing heap IDs for testing is a problem and requires new/old APIs

#4 Very easy to add, IMO it is clean and very small disruption to developers. It does require the special handling, but I feel it is OK and can be explained in the docs. Having a socket id as an ‘int’ gives us a lot room e.g. id < 64K is normal socket and > 64K is external id.

My vote would be #4, as it seems the least risk and work. :-)

> 
> I myself still favor the second way, however there are good arguments to be made for each of these options.
> 
> -- 
> Thanks,
> Anatoly

Regards,
Keith


  reply	other threads:[~2018-07-13 17:56 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-06 13:17 Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 01/11] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-07-10 11:18   ` Alejandro Lucero
2018-07-10 11:31     ` Burakov, Anatoly
2018-07-06 13:17 ` [dpdk-dev] [RFC 02/11] eal: add function to rerieve socket index by socket ID Anatoly Burakov
2018-07-10 13:03   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 03/11] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-07-13 16:05   ` Alejandro Lucero
2018-07-13 16:08     ` Burakov, Anatoly
2018-07-06 13:17 ` [dpdk-dev] [RFC 04/11] malloc: add name to malloc heaps Anatoly Burakov
2018-07-13 16:09   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 05/11] malloc: enable retrieving statistics from named heaps Anatoly Burakov
2018-07-13 16:25   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 06/11] malloc: enable allocating " Anatoly Burakov
2018-07-13 16:31   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 07/11] malloc: enable creating new malloc heaps Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 08/11] malloc: allow adding memory to named heaps Anatoly Burakov
2018-07-13 17:04   ` Alejandro Lucero
2018-07-06 13:17 ` [dpdk-dev] [RFC 09/11] malloc: allow removing memory from " Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 10/11] malloc: allow destroying heaps Anatoly Burakov
2018-07-06 13:17 ` [dpdk-dev] [RFC 11/11] memzone: enable reserving memory from named heaps Anatoly Burakov
2018-07-13 17:10 ` [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Burakov, Anatoly
2018-07-13 17:56   ` Wiles, Keith [this message]
2018-07-19 10:58     ` László Vadkerti
2018-07-26 13:48       ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08D0040C-78AA-401A-863F-57B38533648F@intel.com \
    --to=keith.wiles@intel.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerin.jacob@caviumnetworks.com \
    --cc=scott.branden@broadcom.com \
    --cc=shreyansh.jain@nxp.com \
    --cc=srinath.mannam@broadcom.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).