From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anatoly Burakov
To: dev@dpdk.org
Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com,
 andras.kovacs@ericsson.com, winnie.tian@ericsson.com,
 daniel.andrasi@ericsson.com, janos.kobor@ericsson.com,
 geza.koblo@ericsson.com, srinath.mannam@broadcom.com,
 scott.branden@broadcom.com, ajit.khaparde@broadcom.com,
 keith.wiles@intel.com, bruce.richardson@intel.com,
 thomas@monjalon.net, shreyansh.jain@nxp.com, shahafs@mellanox.com,
 arybchenko@solarflare.com
Date: Thu, 20 Sep 2018 12:36:13 +0100
X-Mailer: git-send-email 1.7.0.7
Subject: [dpdk-dev] [PATCH v3 00/20] Support externally allocated memory in DPDK
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Thu, 20 Sep 2018
 11:36:41 -0000

This is a proposal to enable using externally allocated memory in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA node
  itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
  allocator to decide from which heap (internal or external) to
  allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses of
  externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is on
the shoulders of the user - there is no checking done with regards to
validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first be
attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't want
to fail to create an external heap in the primary because something in
the secondary has failed, when in fact we may not even have wanted this
memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, it also needs to be attached in
each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures.
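To make the socket ID scheme from the list at the top of this letter a
bit more concrete, here is a small standalone toy model in C. It is not
code from this patchset and it calls no DPDK APIs: every name in it
(model_heap_table, model_heap_create, MODEL_EXT_SOCKET_BASE and so on),
the "socket_%d" naming of internal heaps, the table size and the
starting value 256 are assumptions made up purely for illustration. It
only shows the idea that internal heaps are indexed by NUMA node, while
each named external heap receives a fresh socket ID in order of
creation, which an allocator could then map back to a heap.

```c
/* Toy model only: none of these names are DPDK APIs. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_NUMA_NODES      2   /* assumed number of NUMA nodes */
#define MODEL_MAX_HEAPS       8   /* assumed size of the heap table */
#define MODEL_EXT_SOCKET_BASE 256 /* assumed first external socket ID */
#define MODEL_NAME_LEN        32

struct model_heap {
	char name[MODEL_NAME_LEN]; /* empty string marks a free slot */
	int socket_id;
};

struct model_heap_table {
	struct model_heap heaps[MODEL_MAX_HEAPS];
	int next_ext_socket; /* only ever grows in this model */
};

/* Internal heaps take slots 0..N-1 and use the NUMA node as socket ID. */
static void model_init(struct model_heap_table *t)
{
	memset(t, 0, sizeof(*t));
	for (int i = 0; i < MODEL_NUMA_NODES; i++) {
		snprintf(t->heaps[i].name, MODEL_NAME_LEN, "socket_%d", i);
		t->heaps[i].socket_id = i;
	}
	t->next_ext_socket = MODEL_EXT_SOCKET_BASE;
}

/* Create a named external heap; returns its socket ID, or -1 on error. */
static int model_heap_create(struct model_heap_table *t, const char *name)
{
	int free_slot = -1;

	if (name == NULL || name[0] == '\0' || strlen(name) >= MODEL_NAME_LEN)
		return -1;
	for (int i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] == '\0') {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(t->heaps[i].name, name) == 0) {
			return -1; /* heap names must be unique */
		}
	}
	if (free_slot < 0)
		return -1; /* table is full */
	strcpy(t->heaps[free_slot].name, name);
	t->heaps[free_slot].socket_id = t->next_ext_socket++;
	return t->heaps[free_slot].socket_id;
}

/* Map a socket ID (internal or external) to a heap table index, or -1. */
static int model_socket_to_heap_idx(const struct model_heap_table *t,
		int socket_id)
{
	assert(t != NULL);
	for (int i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] != '\0' &&
				t->heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

In this model, a caller would run model_init() once, create a heap by
name with model_heap_create(), and pass the returned socket ID to
whatever allocation routine takes a socket argument; the lookup helper
is what such a routine would use to pick the heap. Keeping the counter
in a per-process structure, as done here, is also why two processes
creating heaps independently could disagree about socket IDs.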
The possibility of attach failure is treated as a "known issue" for
this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket IDs we've ever
used to prevent their reuse. Different processes would keep separate
socket ID counters, and the counter did not seem important enough to
put into shared memory. This means that secondary processes will not be
able to create new heaps. If this use case is important enough, we can
put the max socket ID into shared memory, or allow socket ID reuse
(which I do not think is a good idea because it has the potential to
make things harder to debug).

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  examples: add external memory example app
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide
  doc: add external memory sample application guide

 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  38 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  24 +-
 doc/guides/sample_app_ug/external_mem.rst     | 115 +++++
 doc/guides/sample_app_ug/index.rst            |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 examples/external_mem/Makefile                |  62 +++
 examples/external_mem/extmem.c                | 461 ++++++++++++++++++
 examples/external_mem/meson.build             |  12 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 183 +++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_heap.c           | 300 ++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 393 ++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   7 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  35 +-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 +++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 2099 insertions(+), 105 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/external_mem.rst
 create mode 100644 examples/external_mem/Makefile
 create mode 100644 examples/external_mem/extmem.c
 create mode 100644 examples/external_mem/meson.build
 create mode 100644 test/test/test_external_mem.c

--
2.17.1