From: Anatoly Burakov
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com, andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com, benjamin.walker@intel.com, bruce.richardson@intel.com, thomas@monjalon.net, konstantin.ananyev@intel.com, kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com, olivier.matz@6wind.com
Date: Sat, 3 Mar 2018 13:45:48 +0000
Subject: [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK

This patchset introduces dynamic memory allocation for
DPDK (aka memory hotplug). It is based on the RFC submitted in December [1].

Dependencies (to be applied in the specified order):
- IPC bugfixes patchset [2]
- IPC improvements patchset [3]
- IPC asynchronous request API patch [4]
- Function to return number of sockets [5]

Deprecation notices relevant to this patchset:
- General outline of memory hotplug changes [6]
- EAL NUMA node count changes [7]

The vast majority of changes are in the EAL and malloc, and the external API disruption is minimal: a new set of APIs is added for contiguous memory allocation for rte_memzone, and there are a few API additions in rte_memory due to the switch to memseg lists as opposed to memsegs. Every other API change is internal to EAL, and all memory allocation/freeing is handled through rte_malloc, with no externally visible API changes.

Quick outline of all changes done as part of this patchset:

* Malloc heap adjusted to handle holes in address space
* Single memseg list replaced by multiple memseg lists
* VA space for hugepages is preallocated in advance
* Pages are allocated/freed as needed on rte_malloc/rte_free
* Added contiguous memory allocation APIs for rte_memzone
* Integrated Pawel Wodkowski's patch for registering/unregistering memory with VFIO [8]
* Callbacks for registering memory allocations
* Multiprocess support done via the DPDK IPC introduced in 18.02

The biggest difference is that a "memseg" now represents a single page (as opposed to being a big contiguous block of pages). As a consequence, neither memzones nor malloc elements are guaranteed to be physically contiguous any more, unless the user asks for it at reserve time. To preserve whatever functionality depended on the previous behavior, a legacy memory option is also provided; however, it is expected (or perhaps vainly hoped) to be a temporary solution.

Why multiple memseg lists instead of one?
Since a memseg is now a single page, the list of memsegs will get quite big, and we need some way to locate pages when we allocate and free them. We could of course just walk the list and allocate one contiguous chunk of VA space for memsegs, but this implementation uses separate lists instead, in order to speed up many operations with memseg lists.

For v1, the following limitations are present:
- FreeBSD does not even compile, let alone run
- No 32-bit support
- There are some minor quality-of-life improvements planned that aren't ready yet and will be part of v2
- VFIO support is only smoke-tested (but is expected to work); VFIO support with secondary processes is not tested, and work is ongoing to validate VFIO for all use cases
- Dynamic mapping/unmapping of memory with VFIO is not supported in sPAPR IOMMU mode; help from sPAPR maintainers is requested

Nevertheless, this patchset should be testable under 64-bit Linux, and should work for all use cases bar those mentioned above.

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
[2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Fixes/
[3] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Improvements/
[4] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
[5] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
[6] http://dpdk.org/dev/patchwork/patch/34002/
[7] http://dpdk.org/dev/patchwork/patch/33853/
[8] http://dpdk.org/dev/patchwork/patch/24484/

Anatoly Burakov (41):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  eal: move all locking to heap
  eal: make malloc heap a doubly-linked list
  eal: add function to dump malloc heap contents
  test: add command to dump malloc heap contents
  eal: make malloc_elem_join_adjacent_free public
  eal: make malloc free list remove public
  eal: make malloc free return resulting malloc element
  eal: add rte_fbarray
  eal: add "single file segments" command-line option
  eal: add "legacy memory" option
  eal: read hugepage counts from node-specific sysfs path
  eal: replace memseg with memseg lists
  eal: add support for mapping hugepages at runtime
  eal: add support for unmapping pages at runtime
  eal: make use of memory hotplug for init
  eal: enable memory hotplug support in rte_malloc
  test: fix malloc autotest to support memory hotplug
  eal: add API to check if memory is contiguous
  eal: add backend support for contiguous allocation
  eal: enable reserving physically contiguous memzones
  eal: replace memzone array with fbarray
  mempool: add support for the new allocation methods
  vfio: allow to map other memory regions
  eal: map/unmap memory with VFIO when alloc/free pages
  eal: prepare memseg lists for multiprocess sync
  eal: add multiprocess init with memory hotplug
  eal: add support for multiprocess memory hotplug
  eal: add support for callbacks on memory hotplug
  eal: enable callbacks on malloc/free and mp sync
  ethdev: use contiguous allocation for DMA memory
  crypto/qat: use contiguous allocation for DMA memory
  net/avf: use contiguous allocation for DMA memory
  net/bnx2x: use contiguous allocation for DMA memory
  net/cxgbe: use contiguous allocation for DMA memory
  net/ena: use contiguous allocation for DMA memory
  net/enic: use contiguous allocation for DMA memory
  net/i40e: use contiguous allocation for DMA memory
  net/qede: use contiguous allocation for DMA memory
  net/virtio: use contiguous allocation for DMA memory
  net/vmxnet3: use contiguous allocation for DMA memory

 config/common_base | 15 +-
 drivers/bus/pci/linux/pci.c | 29 +-
 drivers/crypto/qat/qat_qp.c | 4 +-
 drivers/net/avf/avf_ethdev.c | 2 +-
 drivers/net/bnx2x/bnx2x.c | 2 +-
 drivers/net/bnx2x/bnx2x_rxtx.c | 3 +-
 drivers/net/cxgbe/sge.c | 3 +-
 drivers/net/ena/base/ena_plat_dpdk.h | 7 +-
 drivers/net/ena/ena_ethdev.c | 10 +-
 drivers/net/enic/enic_main.c | 4 +-
 drivers/net/i40e/i40e_ethdev.c | 2 +-
 drivers/net/i40e/i40e_rxtx.c | 2 +-
 drivers/net/qede/base/bcm_osal.c | 5 +-
 drivers/net/virtio/virtio_ethdev.c | 8 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c | 108 ++-
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 7 +-
 lib/librte_eal/bsdapp/eal/Makefile | 4 +
 lib/librte_eal/bsdapp/eal/eal.c | 25 +
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 7 +
 lib/librte_eal/bsdapp/eal/eal_memalloc.c | 33 +
 lib/librte_eal/bsdapp/eal/meson.build | 1 +
 lib/librte_eal/common/Makefile | 2 +-
 lib/librte_eal/common/eal_common_fbarray.c | 859 +++++++++++++++++
 lib/librte_eal/common/eal_common_memalloc.c | 181 ++++
 lib/librte_eal/common/eal_common_memory.c | 512 +++++++++-
 lib/librte_eal/common/eal_common_memzone.c | 275 ++++--
 lib/librte_eal/common/eal_common_options.c | 8 +
 lib/librte_eal/common/eal_filesystem.h | 13 +
 lib/librte_eal/common/eal_hugepages.h | 7 +
 lib/librte_eal/common/eal_internal_cfg.h | 10 +-
 lib/librte_eal/common/eal_memalloc.h | 41 +
 lib/librte_eal/common/eal_options.h | 4 +
 lib/librte_eal/common/eal_private.h | 33 +
 lib/librte_eal/common/include/rte_eal_memconfig.h | 29 +-
 lib/librte_eal/common/include/rte_fbarray.h | 352 +++++++
 lib/librte_eal/common/include/rte_malloc.h | 9 +
 lib/librte_eal/common/include/rte_malloc_heap.h | 6 +
 lib/librte_eal/common/include/rte_memory.h | 79 +-
 lib/librte_eal/common/include/rte_memzone.h | 155 ++-
 lib/librte_eal/common/include/rte_vfio.h | 39 +
 lib/librte_eal/common/malloc_elem.c | 436 +++++++--
 lib/librte_eal/common/malloc_elem.h | 41 +-
 lib/librte_eal/common/malloc_heap.c | 694 +++++++++++++-
 lib/librte_eal/common/malloc_heap.h | 15 +-
 lib/librte_eal/common/malloc_mp.c | 723 ++++++++++++++
 lib/librte_eal/common/malloc_mp.h | 86 ++
 lib/librte_eal/common/meson.build | 4 +
 lib/librte_eal/common/rte_malloc.c | 75 +-
 lib/librte_eal/linuxapp/eal/Makefile | 5 +
 lib/librte_eal/linuxapp/eal/eal.c | 102 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 155 ++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 1049 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c | 516 ++++++----
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 318 +++++--
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 11 +
 lib/librte_eal/linuxapp/eal/meson.build | 1 +
 lib/librte_eal/rte_eal_version.map | 23 +-
 lib/librte_ether/rte_ethdev.c | 3 +-
 lib/librte_mempool/rte_mempool.c | 87 +-
 test/test/commands.c | 3 +
 test/test/test_malloc.c | 71 +-
 test/test/test_memory.c | 43 +-
 test/test/test_memzone.c | 26 +-
 63 files changed, 6631 insertions(+), 751 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_memalloc.h
 create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100644 lib/librte_eal/common/malloc_mp.c
 create mode 100644 lib/librte_eal/common/malloc_mp.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c

-- 
2.7.4
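The cover letter's central change, one memseg per page with pages grouped into memseg lists, can be illustrated with a small plain-C model. This is a minimal sketch, assuming each list covers one preallocated VA range with a single page size. It shows why locating a page needs no walk over all pages (the slot index is simply (va - base) / page_size), and how a "is this memory contiguous?" check reduces to comparing the IOVAs of neighbouring pages. All names here (struct page_seg, struct seg_list, seg_list_find(), seg_range_iova_contig()) are hypothetical and are not the DPDK rte_memseg/rte_memseg_list API; the sketch also assumes, as a simplification, that a checked range never spans two lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one page-sized memseg (not the DPDK struct). */
struct page_seg {
	uint64_t iova; /* IO/physical address of this page */
	bool used;     /* true if a page is currently mapped at this slot */
};

/* Hypothetical memseg list: one preallocated VA range, one page size. */
struct seg_list {
	uintptr_t base_va;     /* start of the reserved VA range */
	size_t page_sz;        /* page size shared by every slot */
	size_t n_segs;         /* number of page slots in the range */
	struct page_seg *segs; /* one entry per page slot */
};

/* Return the list whose VA range contains va, or NULL if none does. */
struct seg_list *
seg_list_find(struct seg_list *lists, size_t n_lists, uintptr_t va)
{
	for (size_t i = 0; i < n_lists; i++) {
		struct seg_list *l = &lists[i];

		assert(l->page_sz > 0);
		if (va >= l->base_va &&
				va - l->base_va < l->n_segs * l->page_sz)
			return l;
	}
	return NULL;
}

/*
 * Return true if every page backing [va, va + len) is mapped and each
 * page's IOVA directly follows the previous page's IOVA.
 */
bool
seg_range_iova_contig(struct seg_list *lists, size_t n_lists,
		uintptr_t va, size_t len)
{
	struct seg_list *l;
	size_t first, last, i;

	if (len == 0)
		return true;
	l = seg_list_find(lists, n_lists, va);
	if (l == NULL)
		return false;
	/* Simplification: the range must fit inside this one list. */
	if (len > l->n_segs * l->page_sz - (va - l->base_va))
		return false;

	/* Direct index arithmetic instead of searching for the pages. */
	first = (va - l->base_va) / l->page_sz;
	last = (va + len - 1 - l->base_va) / l->page_sz;
	for (i = first; i <= last; i++) {
		if (!l->segs[i].used)
			return false;
		if (i > first &&
				l->segs[i].iova != l->segs[i - 1].iova + l->page_sz)
			return false;
	}
	return true;
}
```

For example, with 4 KiB slots, an 8 KiB range whose two pages have IOVAs 0x10000 and 0x11000 is reported as contiguous, while a range whose second page sits at 0x20000 is not, even though both ranges are contiguous in VA space.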
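Because VA space for hugepages is reserved up front and pages are only mapped as rte_malloc/rte_free need them, finding room for an N-page allocation inside one memseg list reduces to searching a used/free mask for N consecutive free slots. The following is a minimal sketch, assuming a plain in-memory bitmap of 64-bit words; struct slot_mask, mask_find_free_run() and mask_set_run() are hypothetical names. It does not reproduce the rte_fbarray API added by this patchset, which, as its name suggests, is file-backed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical used/free mask for the page slots of one memseg list. */
struct slot_mask {
	size_t len;      /* number of slots */
	uint64_t *words; /* bit i set => slot i is in use */
};

static bool
slot_used(const struct slot_mask *m, size_t i)
{
	return (m->words[i / 64] >> (i % 64)) & 1;
}

/*
 * Return the index of the first run of n consecutive free slots at or
 * after start, or -1 if there is none. A linear scan keeps it simple.
 */
long
mask_find_free_run(const struct slot_mask *m, size_t start, size_t n)
{
	size_t run = 0;

	assert(n > 0);
	for (size_t i = start; i < m->len; i++) {
		if (slot_used(m, i)) {
			run = 0; /* a used slot breaks the current run */
			continue;
		}
		if (++run == n)
			return (long)(i + 1 - n);
	}
	return -1;
}

/* Mark slots [idx, idx + n) as used (allocation) or free (release). */
void
mask_set_run(struct slot_mask *m, size_t idx, size_t n, bool used)
{
	assert(idx + n <= m->len);
	for (size_t i = idx; i < idx + n; i++) {
		uint64_t bit = UINT64_C(1) << (i % 64);

		if (used)
			m->words[i / 64] |= bit;
		else
			m->words[i / 64] &= ~bit;
	}
}
```

A free run found this way is only contiguous in VA space; whether the pages mapped into it are also physically contiguous still has to be checked separately. A faster search could skip fully used 64-bit words at a time instead of testing each bit.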