From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 178FD1B7D8
 for ; Mon, 9 Apr 2018 20:01:20 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga007.jf.intel.com ([10.7.209.58])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 09 Apr 2018 11:01:19 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.48,427,1517904000";
 d="scan'208";a="32015135"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by orsmga007.jf.intel.com with ESMTP; 09 Apr 2018 11:01:15 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w39I1FNN030932;
 Mon, 9 Apr 2018 19:01:15 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w39I1EOl027525;
 Mon, 9 Apr 2018 19:01:14 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w39I1Ehk027508;
 Mon, 9 Apr 2018 19:01:14 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com,
 thomas@monjalon.net, konstantin.ananyev@intel.com,
 kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
 nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
 jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 olivier.matz@6wind.com, shreyansh.jain@nxp.com,
 gowrishankar.m@linux.vnet.ibm.com
Date: Mon, 9 Apr 2018 19:00:03 +0100
Message-Id:
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v5 00/70] Memory Hotplug for DPDK
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Apr 2018 18:01:21 -0000

This patchset introduces dynamic memory allocation for DPDK (aka memory
hotplug). Based upon the RFC submitted in December [1].

Dependencies (to be applied in specified order):
- EAL IOVA fix [2]

Deprecation notices relevant to this patchset:
- General outline of memory hotplug changes [3]

The vast majority of changes are in the EAL and malloc; the external API
disruption is minimal: a new flag is added to the memzone API for
contiguous memory allocation, a few API additions are made in rte_memory
due to the switch to memseg_lists as opposed to memsegs, and a few new
convenience APIs are added. Every other API change is internal to EAL,
and all of the memory allocation/freeing is handled through rte_malloc,
with no externally visible API changes.

Quick outline of all changes done as part of this patchset:

 * Malloc heap adjusted to handle holes in address space
 * Single memseg list replaced by multiple memseg lists
 * VA space for hugepages is preallocated in advance
 * Added alloc/free for pages happening as needed on rte_malloc/rte_free
 * Added contiguous memory allocation APIs for rte_memzone
 * Added convenience API calls to walk over memsegs
 * Integrated Pawel Wodkowski's patch for registering/unregistering
   memory with VFIO [4]
 * Callbacks for registering memory allocations
 * Callbacks for allowing/disallowing allocations above specified limit
 * Multiprocess support done via DPDK IPC introduced in 18.02

The biggest difference is that a "memseg" now represents a single page
(as opposed to being a big contiguous block of pages). As a consequence,
both memzones and malloc elements are no longer guaranteed to be
physically contiguous, unless the user asks for it at reserve time. To
preserve any functionality that depended on the previous behavior, a
legacy memory option is also provided; however, it is expected (or
perhaps vainly hoped) to be a temporary solution.

Why multiple memseg lists instead of one?
Since memseg is a single page now, the list of memsegs will get quite
big, and we need to locate pages somehow when we allocate and free them.
We could of course just walk the list and allocate one contiguous chunk
of VA space for memsegs, but this implementation uses separate lists
instead in order to speed up many operations with memseg lists.

For v5, the following limitations are present:
- VFIO support for multiple processes is not well-tested; work is ongoing
  to validate VFIO for all use cases
- There are known problems with PPC64 VFIO code
- For DPAA and FSLMC platforms, performance will be heavily degraded for
  IOVA as PA cases; separate patches are expected to address the issue

For testing, it is recommended to use the GitHub repository [5], as it
will have all of the dependencies already integrated.

Tested-by: Hemant Agrawal
Tested-by: Santosh Shukla

v5:
- Fixed missing DMA window creation on PPC64 for VFIO
- fslmc VFIO fixes
- Added new user DMA map code to keep track of user DMA maps
  when hotplug is in use (also used on PPC64 on remap)
- A few checkpatch and commit message fixes here and there

v4:
- Fixed bug in memzone lookup
- Added draft fslmc VFIO code
- Rebased on latest master + dependent patchset
- Documented limitations for *_walk() functions

v3:
- Lots of compile fixes
- Fixes for multiprocess synchronization
- Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
- Fixes for mempool size calculation
- Added convenience memseg walk() API's
- Added alloc validation callback

v2:
- fixed deadlock at init
- reverted rte_panic changes at init, this is now handled inside IPC

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
[2] http://dpdk.org/dev/patchwork/bundle/aburakov/IOVA_mode_fixes/
[3] http://dpdk.org/dev/patchwork/patch/34002/
[4] http://dpdk.org/dev/patchwork/patch/24484/
[5] https://github.com/anatolyburakov/dpdk

Anatoly Burakov (70):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  eal: move all locking to heap
  eal: make malloc heap a doubly-linked list
  eal: add function to dump malloc heap contents
  test: add command to dump malloc heap contents
  eal: make malloc_elem_join_adjacent_free public
  eal: make malloc free list remove public
  eal: make malloc free return resulting malloc element
  eal: replace panics with error messages in malloc
  eal: add backend support for contiguous allocation
  eal: enable reserving physically contiguous memzones
  ethdev: use contiguous allocation for DMA memory
  crypto/qat: use contiguous allocation for DMA memory
  net/avf: use contiguous allocation for DMA memory
  net/bnx2x: use contiguous allocation for DMA memory
  net/bnxt: use contiguous allocation for DMA memory
  net/cxgbe: use contiguous allocation for DMA memory
  net/ena: use contiguous allocation for DMA memory
  net/enic: use contiguous allocation for DMA memory
  net/i40e: use contiguous allocation for DMA memory
  net/qede: use contiguous allocation for DMA memory
  net/virtio: use contiguous allocation for DMA memory
  net/vmxnet3: use contiguous allocation for DMA memory
  mempool: add support for the new allocation methods
  eal: add function to walk all memsegs
  bus/fslmc: use memseg walk instead of iteration
  bus/pci: use memseg walk instead of iteration
  net/mlx5: use memseg walk instead of iteration
  eal: use memseg walk instead of iteration
  mempool: use memseg walk instead of iteration
  test: use memseg walk instead of iteration
  vfio/type1: use memseg walk instead of iteration
  vfio/spapr: use memseg walk instead of iteration
  eal: add contig walk function
  virtio: use memseg contig walk instead of iteration
  eal: add iova2virt function
  bus/dpaa: use iova2virt instead of memseg iteration
  bus/fslmc: use iova2virt instead of memseg iteration
  crypto/dpaa_sec: use iova2virt instead of memseg iteration
  eal: add virt2memseg function
  bus/fslmc: use virt2memseg instead of iteration
  crypto/dpaa_sec: use virt2memseg instead of iteration
  net/mlx4: use virt2memseg instead of iteration
  net/mlx5: use virt2memseg instead of iteration
  eal: use memzone walk instead of iteration
  vfio: allow to map other memory regions
  eal: add "legacy memory" option
  eal: add rte_fbarray
  eal: replace memseg with memseg lists
  eal: replace memzone array with fbarray
  eal: add support for mapping hugepages at runtime
  eal: add support for unmapping pages at runtime
  eal: add "single file segments" command-line option
  eal: add API to check if memory is contiguous
  eal: prepare memseg lists for multiprocess sync
  eal: read hugepage counts from node-specific sysfs path
  eal: make use of memory hotplug for init
  eal: share hugepage info primary and secondary
  eal: add secondary process init with memory hotplug
  eal: enable memory hotplug support in rte_malloc
  eal: add support for multiprocess memory hotplug
  eal: add support for callbacks on memory hotplug
  eal: enable callbacks on malloc/free and mp sync
  vfio: enable support for mem event callbacks
  bus/fslmc: move vfio DMA map into bus probe
  bus/fslmc: enable support for mem event callbacks for vfio
  eal: enable non-legacy memory mode
  eal: add memory validator callback
  eal: enable validation before new page allocation
  eal: prevent preallocated pages from being freed

 config/common_base | 15 +-
 config/defconfig_i686-native-linuxapp-gcc | 3 +
 config/defconfig_i686-native-linuxapp-icc | 3 +
 config/defconfig_x86_x32-native-linuxapp-gcc | 3 +
 config/rte_config.h | 7 +-
 doc/guides/rel_notes/deprecation.rst | 9 -
 drivers/bus/dpaa/rte_dpaa_bus.h | 12 +-
 drivers/bus/fslmc/fslmc_bus.c | 11 +
 drivers/bus/fslmc/fslmc_vfio.c | 195 +++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 27 +-
 drivers/bus/pci/Makefile | 3 +
 drivers/bus/pci/linux/pci.c | 28 +-
 drivers/bus/pci/meson.build | 3 +
 drivers/crypto/dpaa_sec/dpaa_sec.c | 30 +-
 drivers/crypto/qat/qat_qp.c | 23 +-
 drivers/event/dpaa2/Makefile | 3 +
 drivers/mempool/dpaa/Makefile | 3 +
 drivers/mempool/dpaa/meson.build | 3 +
 drivers/mempool/dpaa2/Makefile | 3 +
 drivers/mempool/dpaa2/meson.build | 3 +
 drivers/net/avf/avf_ethdev.c | 4 +-
 drivers/net/bnx2x/bnx2x.c | 2 +-
 drivers/net/bnx2x/bnx2x_rxtx.c | 3 +-
 drivers/net/bnxt/bnxt_ethdev.c | 17 +-
 drivers/net/bnxt/bnxt_ring.c | 9 +-
 drivers/net/bnxt/bnxt_vnic.c | 8 +-
 drivers/net/cxgbe/sge.c | 3 +-
 drivers/net/dpaa/Makefile | 3 +
 drivers/net/dpaa2/Makefile | 3 +
 drivers/net/dpaa2/dpaa2_ethdev.c | 1 -
 drivers/net/dpaa2/meson.build | 3 +
 drivers/net/ena/Makefile | 3 +
 drivers/net/ena/base/ena_plat_dpdk.h | 9 +-
 drivers/net/ena/ena_ethdev.c | 10 +-
 drivers/net/enic/enic_main.c | 9 +-
 drivers/net/i40e/i40e_ethdev.c | 4 +-
 drivers/net/i40e/i40e_rxtx.c | 4 +-
 drivers/net/mlx4/mlx4_mr.c | 18 +-
 drivers/net/mlx5/Makefile | 3 +
 drivers/net/mlx5/mlx5.c | 25 +-
 drivers/net/mlx5/mlx5_mr.c | 19 +-
 drivers/net/qede/base/bcm_osal.c | 7 +-
 drivers/net/virtio/virtio_ethdev.c | 8 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c | 83 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 5 +-
 lib/librte_eal/bsdapp/eal/Makefile | 4 +
 lib/librte_eal/bsdapp/eal/eal.c | 83 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 65 +-
 lib/librte_eal/bsdapp/eal/eal_memalloc.c | 48 +
 lib/librte_eal/bsdapp/eal/eal_memory.c | 224 +++-
 lib/librte_eal/bsdapp/eal/meson.build | 1 +
 lib/librte_eal/common/Makefile | 2 +-
 lib/librte_eal/common/eal_common_fbarray.c | 859 ++++++++++++++++
 lib/librte_eal/common/eal_common_memalloc.c | 359 +++++++
 lib/librte_eal/common/eal_common_memory.c | 824 ++++++++++++++-
 lib/librte_eal/common/eal_common_memzone.c | 235 +++--
 lib/librte_eal/common/eal_common_options.c | 13 +-
 lib/librte_eal/common/eal_filesystem.h | 30 +
 lib/librte_eal/common/eal_hugepages.h | 11 +-
 lib/librte_eal/common/eal_internal_cfg.h | 12 +-
 lib/librte_eal/common/eal_memalloc.h | 80 ++
 lib/librte_eal/common/eal_options.h | 4 +
 lib/librte_eal/common/eal_private.h | 33 +
 lib/librte_eal/common/include/rte_eal_memconfig.h | 28 +-
 lib/librte_eal/common/include/rte_fbarray.h | 353 +++++++
 lib/librte_eal/common/include/rte_malloc.h | 10 +
 lib/librte_eal/common/include/rte_malloc_heap.h | 6 +
 lib/librte_eal/common/include/rte_memory.h | 258 ++++-
 lib/librte_eal/common/include/rte_memzone.h | 12 +-
 lib/librte_eal/common/include/rte_vfio.h | 41 +
 lib/librte_eal/common/malloc_elem.c | 433 ++++++--
 lib/librte_eal/common/malloc_elem.h | 43 +-
 lib/librte_eal/common/malloc_heap.c | 704 ++++++++++++-
 lib/librte_eal/common/malloc_heap.h | 15 +-
 lib/librte_eal/common/malloc_mp.c | 744 ++++++++++++++
 lib/librte_eal/common/malloc_mp.h | 86 ++
 lib/librte_eal/common/meson.build | 4 +
 lib/librte_eal/common/rte_malloc.c | 85 +-
 lib/librte_eal/linuxapp/eal/Makefile | 5 +
 lib/librte_eal/linuxapp/eal/eal.c | 62 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 218 +++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 1123 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c | 1119 ++++++++++++--------
 lib/librte_eal/linuxapp/eal/eal_vfio.c | 870 ++++++++++++++--
 lib/librte_eal/linuxapp/eal/eal_vfio.h | 12 +
 lib/librte_eal/linuxapp/eal/meson.build | 1 +
 lib/librte_eal/rte_eal_version.map | 30 +-
 lib/librte_ether/rte_ethdev.c | 3 +-
 lib/librte_mempool/Makefile | 3 +
 lib/librte_mempool/meson.build | 3 +
 lib/librte_mempool/rte_mempool.c | 149 ++-
 test/test/commands.c | 3 +
 test/test/test_malloc.c | 30 +-
 test/test/test_memory.c | 27 +-
 test/test/test_memzone.c | 62 +-
 95 files changed, 8794 insertions(+), 1285 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_memalloc.h
 create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100644 lib/librte_eal/common/malloc_mp.c
 create mode 100644 lib/librte_eal/common/malloc_mp.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c

--
2.7.4
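
A note on the "why multiple memseg lists" rationale in the cover letter,
as a minimal sketch rather than the code from this patchset. Assuming
each list covers one preallocated, contiguous VA range with a fixed page
size, finding the page that backs an address is a short scan over the
(few) lists plus one division, instead of a walk over every page. All
names below (sketch_memseg, sketch_memseg_list, sketch_virt2memseg) are
hypothetical and simplified; they are not the EAL structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one page; NOT the real struct rte_memseg. */
struct sketch_memseg {
	void *addr;    /* VA of this page, or NULL if the slot is not allocated */
	uint64_t iova; /* IO address of the page (not used by the lookup) */
};

/* Hypothetical stand-in for one list: a reserved VA range split into
 * equally sized page slots. */
struct sketch_memseg_list {
	uintptr_t base_va;          /* start of the reserved VA range */
	size_t page_sz;             /* page size shared by all slots */
	size_t n_segs;              /* number of slots in the range */
	struct sketch_memseg *segs; /* array of n_segs slots */
};

/* Return the list whose reserved VA range contains addr, or NULL. */
static struct sketch_memseg_list *
sketch_find_list(struct sketch_memseg_list *lists, size_t n_lists,
		const void *addr)
{
	uintptr_t va = (uintptr_t)addr;
	size_t i;

	for (i = 0; i < n_lists; i++) {
		uintptr_t start = lists[i].base_va;
		uintptr_t end = start + lists[i].page_sz * lists[i].n_segs;

		if (va >= start && va < end)
			return &lists[i];
	}
	return NULL;
}

/* Map a virtual address to the page slot that backs it.
 * Returns NULL if the address is outside every list or the slot is
 * currently not allocated. */
static struct sketch_memseg *
sketch_virt2memseg(struct sketch_memseg_list *lists, size_t n_lists,
		const void *addr)
{
	struct sketch_memseg_list *msl;
	struct sketch_memseg *ms;
	size_t idx;

	msl = sketch_find_list(lists, n_lists, addr);
	if (msl == NULL)
		return NULL;

	assert(msl->page_sz > 0);
	/* slot index follows directly from the offset into the range */
	idx = ((uintptr_t)addr - msl->base_va) / msl->page_sz;
	ms = &msl->segs[idx];

	return ms->addr != NULL ? ms : NULL;
}
```

With 4K pages and a list based at 0x100000, address 0x102abc resolves to
slot 2 of that list; an address in a slot that has not been allocated
yet, or outside every reserved range, returns NULL.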