From: Anatoly Burakov
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com,
    andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
    benjamin.walker@intel.com, bruce.richardson@intel.com,
    thomas@monjalon.net, konstantin.ananyev@intel.com,
    kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
    nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
    jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
    olivier.matz@6wind.com, shreyansh.jain@nxp.com,
    gowrishankar.m@linux.vnet.ibm.com
Date: Wed, 4 Apr 2018 00:21:12 +0100
Subject: [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK
List-Id: DPDK patches and discussions

This patchset introduces dynamic memory allocation for DPDK (aka memory
hotplug). It is based on the RFC submitted in December [1].

Dependencies (to be applied in the specified order):
- IPC asynchronous request API patch [2]
- Function to return number of sockets [3]
- EAL IOVA fix [4]

Deprecation notices relevant to this patchset:
- General outline of memory hotplug changes [5]
- EAL NUMA node count changes [6]

The vast majority of changes are in the EAL and malloc; the external API
disruption is minimal: a new set of APIs is added for contiguous memory
allocation for rte_memzone, and there are a few API additions in rte_memory
due to the switch to memseg_lists as opposed to memsegs. Every other API
change is internal to EAL, and all of the memory allocation/freeing is
handled through rte_malloc, with no externally visible API changes.

Quick outline of all changes done as part of this patchset:

 * Malloc heap adjusted to handle holes in address space
 * Single memseg list replaced by multiple memseg lists
 * VA space for hugepages is preallocated in advance
 * Added page alloc/free that happens as needed on rte_malloc/rte_free
 * Added contiguous memory allocation APIs for rte_memzone
 * Added convenience API calls to walk over memsegs
 * Integrated Pawel Wodkowski's patch for registering/unregistering memory
   with VFIO [7]
 * Callbacks for registering memory allocations
 * Callbacks for allowing/disallowing allocations above specified limit
 * Multiprocess support done via DPDK IPC introduced in 18.02

The biggest difference is that a "memseg" now represents a single page (as
opposed to being a big contiguous block of pages). As a consequence, both
memzones and malloc elements are no longer guaranteed to be physically
contiguous, unless the user asks for it at reserve time.
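
To illustrate the contiguity point, here is a minimal, self-contained
sketch in C. It is not code from this patchset: struct toy_memseg,
toy_range_is_iova_contig() and toy_pages are hypothetical names made up for
this example, and the addresses are arbitrary. It only models why a run of
page-sized memsegs that is contiguous in VA space need not be contiguous in
IOVA (physical) space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a page-sized memseg (not struct rte_memseg). */
struct toy_memseg {
	uint64_t iova; /* IO address at which this page starts */
	size_t len;    /* page size in bytes */
};

/*
 * Return true if pages [first, first + n) of a VA-contiguous array are also
 * IOVA-contiguous, i.e. every page starts where the previous one ends.
 * Empty or out-of-bounds ranges are reported as not contiguous.
 */
static bool
toy_range_is_iova_contig(const struct toy_memseg *segs, size_t nsegs,
		size_t first, size_t n)
{
	size_t i;

	assert(segs != NULL);
	if (n == 0 || first >= nsegs || n > nsegs - first)
		return false;
	for (i = first + 1; i < first + n; i++) {
		if (segs[i].iova != segs[i - 1].iova + segs[i - 1].len)
			return false;
	}
	return true;
}

/* Example layout: three 2 MB pages; the third one lives elsewhere in IOVA. */
static const struct toy_memseg toy_pages[] = {
	{ 0x100000000ULL, 2u << 20 },
	{ 0x100200000ULL, 2u << 20 },
	{ 0x180000000ULL, 2u << 20 },
};
```

With this layout, pages 0-1 form an IOVA-contiguous run, while any range
that spans pages 1-2 does not. The second case is the one where a caller
that needs physically contiguous memory would have to request it
explicitly.
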
To preserve whatever functionality depended on the previous behavior, a
legacy memory option is also provided; however, it is expected (or perhaps
vainly hoped) to be a temporary solution.

Why multiple memseg lists instead of one? Since a memseg is a single page
now, the list of memsegs will get quite big, and we need to locate pages
somehow when we allocate and free them. We could of course just walk the
list and allocate one contiguous chunk of VA space for memsegs, but this
implementation uses separate lists instead in order to speed up many
operations with memseg lists.

For v3, the following limitations are present:
- VFIO support is only smoke-tested (but is expected to work); VFIO support
  with secondary processes is not tested; work is ongoing to validate VFIO
  for all use cases
- FSLMC bus VFIO code is not yet integrated; work is in progress

For testing, it is recommended to use the GitHub repository [8], as it will
have all of the dependencies already integrated.

v3:
- Lots of compile fixes
- Fixes for multiprocess synchronization
- Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
- Fixes for mempool size calculation
- Added convenience memseg walk() APIs
- Added alloc validation callback

v2:
- fixed deadlock at init
- reverted rte_panic changes at init, this is now handled inside IPC

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
[2] http://dpdk.org/dev/patchwork/bundle/aburakov/IPC_Async_Request/
[3] http://dpdk.org/dev/patchwork/bundle/aburakov/Num_Sockets/
[4] http://dpdk.org/dev/patchwork/bundle/aburakov/IOVA_mode_fixes/
[5] http://dpdk.org/dev/patchwork/patch/34002/
[6] http://dpdk.org/dev/patchwork/patch/33853/
[7] http://dpdk.org/dev/patchwork/patch/24484/
[8] https://github.com/anatolyburakov/dpdk

Anatoly Burakov (68):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  eal: move all locking to heap
  eal: make malloc heap a doubly-linked list
  eal: add function to dump malloc heap contents
  test: add command to dump malloc heap contents
  eal: make malloc_elem_join_adjacent_free public
  eal: make malloc free list remove public
  eal: make malloc free return resulting malloc element
  eal: replace panics with error messages in malloc
  eal: add backend support for contiguous allocation
  eal: enable reserving physically contiguous memzones
  ethdev: use contiguous allocation for DMA memory
  crypto/qat: use contiguous allocation for DMA memory
  net/avf: use contiguous allocation for DMA memory
  net/bnx2x: use contiguous allocation for DMA memory
  net/cxgbe: use contiguous allocation for DMA memory
  net/ena: use contiguous allocation for DMA memory
  net/enic: use contiguous allocation for DMA memory
  net/i40e: use contiguous allocation for DMA memory
  net/qede: use contiguous allocation for DMA memory
  net/virtio: use contiguous allocation for DMA memory
  net/vmxnet3: use contiguous allocation for DMA memory
  net/bnxt: use contiguous allocation for DMA memory
  mempool: add support for the new allocation methods
  eal: add function to walk all memsegs
  bus/fslmc: use memseg walk instead of iteration
  bus/pci: use memseg walk instead of iteration
  net/mlx5: use memseg walk instead of iteration
  eal: use memseg walk instead of iteration
  mempool: use memseg walk instead of iteration
  test: use memseg walk instead of iteration
  vfio/type1: use memseg walk instead of iteration
  vfio/spapr: use memseg walk instead of iteration
  eal: add contig walk function
  virtio: use memseg contig walk instead of iteration
  eal: add iova2virt function
  bus/dpaa: use iova2virt instead of memseg iteration
  bus/fslmc: use iova2virt instead of memseg iteration
  crypto/dpaa_sec: use iova2virt instead of memseg iteration
  eal: add virt2memseg function
  bus/fslmc: use virt2memseg instead of iteration
  net/mlx4: use virt2memseg instead of iteration
  net/mlx5: use virt2memseg instead of iteration
  crypto/dpaa_sec: use virt2memseg instead of iteration
  eal: use memzone walk instead of iteration
  vfio: allow to map other memory regions
  eal: add "legacy memory" option
  eal: add rte_fbarray
  eal: replace memseg with memseg lists
  eal: replace memzone array with fbarray
  eal: add support for mapping hugepages at runtime
  eal: add support for unmapping pages at runtime
  eal: add "single file segments" command-line option
  eal: add API to check if memory is contiguous
  eal: prepare memseg lists for multiprocess sync
  eal: read hugepage counts from node-specific sysfs path
  eal: make use of memory hotplug for init
  eal: share hugepage info primary and secondary
  eal: add secondary process init with memory hotplug
  eal: enable memory hotplug support in rte_malloc
  eal: add support for multiprocess memory hotplug
  eal: add support for callbacks on memory hotplug
  eal: enable callbacks on malloc/free and mp sync
  vfio: enable support for mem event callbacks
  eal: enable non-legacy memory mode
  eal: add memory validator callback
  eal: enable validation before new page allocation
  eal: prevent preallocated pages from being freed

 config/common_base | 15 +-
 config/defconfig_i686-native-linuxapp-gcc | 3 +
 config/defconfig_i686-native-linuxapp-icc | 3 +
 config/defconfig_x86_x32-native-linuxapp-gcc | 3 +
 config/rte_config.h | 7 +-
 doc/guides/rel_notes/deprecation.rst | 9 -
 drivers/bus/dpaa/rte_dpaa_bus.h | 12 +-
 drivers/bus/fslmc/fslmc_vfio.c | 80 +-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 27 +-
 drivers/bus/pci/Makefile | 3 +
 drivers/bus/pci/linux/pci.c | 28 +-
 drivers/bus/pci/meson.build | 3 +
 drivers/crypto/dpaa_sec/dpaa_sec.c | 30 +-
 drivers/crypto/qat/Makefile | 3 +
 drivers/crypto/qat/meson.build | 3 +
 drivers/crypto/qat/qat_qp.c | 23 +-
 drivers/event/dpaa/Makefile | 3 +
 drivers/event/dpaa2/Makefile | 3 +
 drivers/mempool/dpaa/Makefile | 3 +
 drivers/mempool/dpaa2/Makefile | 3 +
 drivers/net/avf/Makefile | 3 +
 drivers/net/avf/avf_ethdev.c | 2 +-
 drivers/net/bnx2x/Makefile | 3 +
 drivers/net/bnx2x/bnx2x.c | 2 +-
 drivers/net/bnx2x/bnx2x_rxtx.c | 3 +-
 drivers/net/bnxt/Makefile | 3 +
 drivers/net/bnxt/bnxt_ethdev.c | 6 +-
 drivers/net/bnxt/bnxt_ring.c | 3 +-
 drivers/net/bnxt/bnxt_vnic.c | 2 +-
 drivers/net/cxgbe/Makefile | 3 +
 drivers/net/cxgbe/sge.c | 3 +-
 drivers/net/dpaa/Makefile | 3 +
 drivers/net/dpaa2/Makefile | 3 +
 drivers/net/dpaa2/meson.build | 3 +
 drivers/net/ena/Makefile | 3 +
 drivers/net/ena/base/ena_plat_dpdk.h | 7 +-
 drivers/net/ena/ena_ethdev.c | 10 +-
 drivers/net/enic/Makefile | 3 +
 drivers/net/enic/enic_main.c | 4 +-
 drivers/net/i40e/Makefile | 3 +
 drivers/net/i40e/i40e_ethdev.c | 2 +-
 drivers/net/i40e/i40e_rxtx.c | 2 +-
 drivers/net/i40e/meson.build | 3 +
 drivers/net/mlx4/mlx4_mr.c | 17 +-
 drivers/net/mlx5/Makefile | 3 +
 drivers/net/mlx5/mlx5.c | 25 +-
 drivers/net/mlx5/mlx5_mr.c | 18 +-
 drivers/net/octeontx/Makefile | 3 +
 drivers/net/qede/Makefile | 3 +
 drivers/net/qede/base/bcm_osal.c | 5 +-
 drivers/net/virtio/virtio_ethdev.c | 8 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c | 83 +-
 drivers/net/vmxnet3/Makefile | 3 +
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 7 +-
 lib/librte_eal/bsdapp/eal/Makefile | 4 +
 lib/librte_eal/bsdapp/eal/eal.c | 83 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 65 +-
 lib/librte_eal/bsdapp/eal/eal_memalloc.c | 48 +
 lib/librte_eal/bsdapp/eal/eal_memory.c | 222 +++-
 lib/librte_eal/bsdapp/eal/meson.build | 1 +
 lib/librte_eal/common/Makefile | 2 +-
 lib/librte_eal/common/eal_common_fbarray.c | 859 ++++++++++++++++
 lib/librte_eal/common/eal_common_memalloc.c | 359 +++++++
 lib/librte_eal/common/eal_common_memory.c | 804 ++++++++++++++-
 lib/librte_eal/common/eal_common_memzone.c | 274 +++--
 lib/librte_eal/common/eal_common_options.c | 13 +-
 lib/librte_eal/common/eal_filesystem.h | 30 +
 lib/librte_eal/common/eal_hugepages.h | 11 +-
 lib/librte_eal/common/eal_internal_cfg.h | 12 +-
 lib/librte_eal/common/eal_memalloc.h | 80 ++
 lib/librte_eal/common/eal_options.h | 4 +
 lib/librte_eal/common/eal_private.h | 33 +
 lib/librte_eal/common/include/rte_eal_memconfig.h | 28 +-
 lib/librte_eal/common/include/rte_fbarray.h | 353 +++++++
 lib/librte_eal/common/include/rte_malloc.h | 10 +
 lib/librte_eal/common/include/rte_malloc_heap.h | 6 +
 lib/librte_eal/common/include/rte_memory.h | 232 ++++-
 lib/librte_eal/common/include/rte_memzone.h | 159 ++-
 lib/librte_eal/common/include/rte_vfio.h | 39 +
 lib/librte_eal/common/malloc_elem.c | 433 ++++++--
 lib/librte_eal/common/malloc_elem.h | 43 +-
 lib/librte_eal/common/malloc_heap.c | 704 ++++++++++++-
 lib/librte_eal/common/malloc_heap.h | 15 +-
 lib/librte_eal/common/malloc_mp.c | 744 ++++++++++++++
 lib/librte_eal/common/malloc_mp.h | 86 ++
 lib/librte_eal/common/meson.build | 4 +
 lib/librte_eal/common/rte_malloc.c | 85 +-
 lib/librte_eal/linuxapp/eal/Makefile | 5 +
 lib/librte_eal/linuxapp/eal/eal.c | 62 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 218 +++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 1124 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c | 1119 ++++++++++++--------
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 491 +++++++--
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 12 +
 lib/librte_eal/linuxapp/eal/meson.build | 1 +
 lib/librte_eal/rte_eal_version.map | 33 +-
 lib/librte_ether/rte_ethdev.c | 3 +-
 lib/librte_mempool/Makefile | 3 +
 lib/librte_mempool/meson.build | 3 +
 lib/librte_mempool/rte_mempool.c | 138 ++-
 test/test/commands.c | 3 +
 test/test/test_malloc.c | 30 +-
 test/test/test_memory.c | 27 +-
 test/test/test_memzone.c | 62 +-
 104 files changed, 8434 insertions(+), 1263 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_memalloc.h
 create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100644 lib/librte_eal/common/malloc_mp.c
 create mode 100644 lib/librte_eal/common/malloc_mp.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c

--
2.7.4