From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id 44C3F1BB90
 for ; Wed, 11 Apr 2018 14:30:59 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga006.jf.intel.com ([10.7.209.51])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 11 Apr 2018 05:30:51 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.48,436,1517904000"; d="scan'208";a="33464623"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by orsmga006.jf.intel.com with ESMTP; 11 Apr 2018 05:30:47 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w3BCUkUw012278;
 Wed, 11 Apr 2018 13:30:46 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id w3BCUk7C013496;
 Wed, 11 Apr 2018 13:30:46 +0100
Received: (from aburakov@localhost)
 by sivswdev01.ir.intel.com with LOCAL id w3BCUj0t013484;
 Wed, 11 Apr 2018 13:30:45 +0100
From: Anatoly Burakov
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com,
 thomas@monjalon.net, konstantin.ananyev@intel.com,
 kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
 nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
 jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 olivier.matz@6wind.com, shreyansh.jain@nxp.com,
 gowrishankar.m@linux.vnet.ibm.com
Date: Wed, 11 Apr 2018 13:29:35 +0100
Message-Id:
X-Mailer: git-send-email 1.7.0.7
In-Reply-To:
References:
Subject: [dpdk-dev] [PATCH v6 00/70] Memory Hotplug for DPDK
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 11 Apr 2018 12:31:00 -0000

This patchset introduces dynamic memory allocation for DPDK (aka memory
hotplug). Based upon RFC submitted in December [1].

Deprecation notices relevant to this patchset:
 - General outline of memory hotplug changes [2]

The vast majority of changes are in the EAL and malloc; the external API
disruption is minimal: a new flag is added to the memzone API for
contiguous memory allocation, a few APIs are added to rte_memory due to
the switch from memsegs to memseg_lists, and a few new convenience APIs
are introduced. Every other API change is internal to EAL, and all of
the memory allocation/freeing is handled through rte_malloc, with no
externally visible API changes.

Quick outline of all changes done as part of this patchset:

 * Malloc heap adjusted to handle holes in address space
 * Single memseg list replaced by multiple memseg lists
 * VA space for hugepages is preallocated in advance
 * Added on-demand alloc/free of pages on rte_malloc/rte_free
 * Added contiguous memory allocation APIs for rte_memzone
 * Added convenience API calls to walk over memsegs
 * Integrated Pawel Wodkowski's patch for registering/unregistering
   memory with VFIO [3]
 * Callbacks for registering memory allocations
 * Callbacks for allowing/disallowing allocations above specified limit
 * Multiprocess support done via DPDK IPC introduced in 18.02

The biggest difference is that a "memseg" now represents a single page
(as opposed to being a big contiguous block of pages). As a consequence,
both memzones and malloc elements are no longer guaranteed to be
physically contiguous, unless the user asks for it at reserve time. To
preserve functionality that depended on the previous behavior, a legacy
memory option is also provided; however, it is expected (or perhaps
vainly hoped) to be a temporary solution.

Why multiple memseg lists instead of one?
Since memseg is a single page now, the list of memsegs will get quite
big, and we need to locate pages somehow when we allocate and free them.
We could of course just walk the list and allocate one contiguous chunk
of VA space for memsegs, but this implementation uses separate lists
instead in order to speed up many operations with memseg lists.

For v6, the following limitations are present:
- VFIO support for multiple processes is not well-tested; work is
  ongoing to validate VFIO for all use cases
- There are known problems with PPC64 VFIO code, expected to be
  addressed in separate patches
- For DPAA and FSLMC platforms, performance will be heavily degraded for
  IOVA as PA cases; separate patches are expected to address the issue

Tested-by: Hemant Agrawal
Tested-by: Santosh Shukla
Tested-by: Gowrishankar Muthukrishnan

v6:
- Compile fix in bnx2x
- Added PPC64 DMA window creation to appropriate patch
- C++-style comment fixes
- Commit message renames to be more specific about affected areas

v5:
- Fixed missing DMA window creation on PPC64 for VFIO
- fslmc VFIO fixes
- Added new user DMA map code to keep track of user DMA maps
  when hotplug is in use (also used on PPC64 on remap)
- A few checkpatch and commit message fixes here and there

v4:
- Fixed bug in memzone lookup
- Added draft fslmc VFIO code
- Rebased on latest master + dependent patchset
- Documented limitations for *_walk() functions

v3:
- Lots of compile fixes
- Fixes for multiprocess synchronization
- Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
- Fixes for mempool size calculation
- Added convenience memseg walk() APIs
- Added alloc validation callback

v2:
- Fixed deadlock at init
- Reverted rte_panic changes at init, this is now handled inside IPC

[1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
[2] http://dpdk.org/dev/patchwork/patch/34002/
[3] http://dpdk.org/dev/patchwork/patch/24484/

Anatoly Burakov (70):
  eal: move get_virtual_area out of linuxapp eal_memory.c
  malloc: move all locking to heap
  malloc: make heap a doubly-linked list
  malloc: add function to dump heap contents
  test: add command to dump malloc heap contents
  malloc: make malloc_elem_join_adjacent_free public
  malloc: make elem_free_list_remove public
  malloc: make free return resulting element
  malloc: replace panics with error messages
  malloc: add support for contiguous allocation
  memzone: enable reserving IOVA-contiguous memzones
  ethdev: use contiguous allocation for DMA memory
  crypto/qat: use contiguous allocation for DMA memory
  net/avf: use contiguous allocation for DMA memory
  net/bnx2x: use contiguous allocation for DMA memory
  net/bnxt: use contiguous allocation for DMA memory
  net/cxgbe: use contiguous allocation for DMA memory
  net/ena: use contiguous allocation for DMA memory
  net/enic: use contiguous allocation for DMA memory
  net/i40e: use contiguous allocation for DMA memory
  net/qede: use contiguous allocation for DMA memory
  net/virtio: use contiguous allocation for DMA memory
  net/vmxnet3: use contiguous allocation for DMA memory
  mempool: add support for the new allocation methods
  eal: add function to walk all memsegs
  bus/fslmc: use memseg walk instead of iteration
  bus/pci: use memseg walk instead of iteration
  net/mlx5: use memseg walk instead of iteration
  eal: use memseg walk instead of iteration
  mempool: use memseg walk instead of iteration
  test: use memseg walk instead of iteration
  vfio/type1: use memseg walk instead of iteration
  vfio/spapr: use memseg walk instead of iteration
  eal: add contig walk function
  virtio: use memseg contig walk instead of iteration
  eal: add iova2virt function
  bus/dpaa: use iova2virt instead of memseg iteration
  bus/fslmc: use iova2virt instead of memseg iteration
  crypto/dpaa_sec: use iova2virt instead of memseg iteration
  eal: add virt2memseg function
  bus/fslmc: use virt2memseg instead of iteration
  crypto/dpaa_sec: use virt2memseg instead of iteration
  net/mlx4: use virt2memseg instead of iteration
  net/mlx5: use virt2memseg instead of iteration
  memzone: use walk instead of iteration for dumping
  vfio: allow to map other memory regions
  eal: add legacy memory option
  eal: add shared indexed file-backed array
  eal: replace memseg with memseg lists
  eal: replace memzone array with fbarray
  mem: add support for mapping hugepages at runtime
  mem: add support for unmapping pages at runtime
  eal: add single file segments command-line option
  mem: add internal API to check if memory is contiguous
  mem: prepare memseg lists for multiprocess sync
  eal: read hugepage counts from node-specific sysfs path
  eal: make use of memory hotplug for init
  eal: share hugepage info primary and secondary
  eal: add secondary process init with memory hotplug
  malloc: enable memory hotplug support
  malloc: add support for multiprocess memory hotplug
  malloc: add support for callbacks on memory events
  malloc: enable callbacks on alloc/free and mp sync
  vfio: enable support for mem event callbacks
  bus/fslmc: move vfio DMA map into bus probe
  bus/fslmc: enable support for mem event callbacks for vfio
  eal: enable non-legacy memory mode
  eal: add memory validator callback
  malloc: enable validation before new page allocation
  mem: prevent preallocated pages from being freed

 config/common_base                                |   15 +-
 config/defconfig_i686-native-linuxapp-gcc         |    3 +
 config/defconfig_i686-native-linuxapp-icc         |    3 +
 config/defconfig_x86_x32-native-linuxapp-gcc      |    3 +
 config/rte_config.h                               |    7 +-
 doc/guides/rel_notes/deprecation.rst              |    9 -
 drivers/bus/dpaa/rte_dpaa_bus.h                   |   12 +-
 drivers/bus/fslmc/fslmc_bus.c                     |   11 +
 drivers/bus/fslmc/fslmc_vfio.c                    |  195 +++-
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h           |   27 +-
 drivers/bus/pci/Makefile                          |    3 +
 drivers/bus/pci/linux/pci.c                       |   28 +-
 drivers/bus/pci/meson.build                       |    3 +
 drivers/crypto/dpaa_sec/dpaa_sec.c                |   30 +-
 drivers/crypto/qat/qat_qp.c                       |   23 +-
 drivers/event/dpaa2/Makefile                      |    3 +
 drivers/mempool/dpaa/Makefile                     |    3 +
 drivers/mempool/dpaa/meson.build                  |    3 +
 drivers/mempool/dpaa2/Makefile                    |    3 +
 drivers/mempool/dpaa2/meson.build                 |    3 +
 drivers/net/avf/avf_ethdev.c                      |    4 +-
 drivers/net/bnx2x/bnx2x.c                         |    4 +-
 drivers/net/bnx2x/bnx2x_rxtx.c                    |    3 +-
 drivers/net/bnxt/bnxt_ethdev.c                    |   17 +-
 drivers/net/bnxt/bnxt_ring.c                      |    9 +-
 drivers/net/bnxt/bnxt_vnic.c                      |    8 +-
 drivers/net/cxgbe/sge.c                           |    3 +-
 drivers/net/dpaa/Makefile                         |    3 +
 drivers/net/dpaa2/Makefile                        |    3 +
 drivers/net/dpaa2/dpaa2_ethdev.c                  |    1 -
 drivers/net/dpaa2/meson.build                     |    3 +
 drivers/net/ena/Makefile                          |    3 +
 drivers/net/ena/base/ena_plat_dpdk.h              |    9 +-
 drivers/net/ena/ena_ethdev.c                      |   10 +-
 drivers/net/enic/enic_main.c                      |    9 +-
 drivers/net/i40e/i40e_ethdev.c                    |    4 +-
 drivers/net/i40e/i40e_rxtx.c                      |    4 +-
 drivers/net/mlx4/mlx4_mr.c                        |   18 +-
 drivers/net/mlx5/Makefile                         |    3 +
 drivers/net/mlx5/mlx5.c                           |   25 +-
 drivers/net/mlx5/mlx5_mr.c                        |   19 +-
 drivers/net/qede/base/bcm_osal.c                  |    7 +-
 drivers/net/virtio/virtio_ethdev.c                |    8 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c     |   83 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c              |    5 +-
 lib/librte_eal/bsdapp/eal/Makefile                |    4 +
 lib/librte_eal/bsdapp/eal/eal.c                   |   83 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c     |   65 +-
 lib/librte_eal/bsdapp/eal/eal_memalloc.c          |   48 +
 lib/librte_eal/bsdapp/eal/eal_memory.c            |  224 +++-
 lib/librte_eal/bsdapp/eal/meson.build             |    1 +
 lib/librte_eal/common/Makefile                    |    2 +-
 lib/librte_eal/common/eal_common_fbarray.c        |  859 ++++++++++++++++
 lib/librte_eal/common/eal_common_memalloc.c       |  359 +++++++
 lib/librte_eal/common/eal_common_memory.c         |  824 ++++++++++++++-
 lib/librte_eal/common/eal_common_memzone.c        |  235 +++--
 lib/librte_eal/common/eal_common_options.c        |   15 +-
 lib/librte_eal/common/eal_filesystem.h            |   30 +
 lib/librte_eal/common/eal_hugepages.h             |   11 +-
 lib/librte_eal/common/eal_internal_cfg.h          |   12 +-
 lib/librte_eal/common/eal_memalloc.h              |   79 ++
 lib/librte_eal/common/eal_options.h               |    4 +
 lib/librte_eal/common/eal_private.h               |   33 +
 lib/librte_eal/common/include/rte_eal_memconfig.h |   28 +-
 lib/librte_eal/common/include/rte_fbarray.h       |  353 +++++++
 lib/librte_eal/common/include/rte_malloc.h        |   10 +
 lib/librte_eal/common/include/rte_malloc_heap.h   |    6 +
 lib/librte_eal/common/include/rte_memory.h        |  258 ++++-
 lib/librte_eal/common/include/rte_memzone.h       |   12 +-
 lib/librte_eal/common/include/rte_vfio.h          |   41 +
 lib/librte_eal/common/malloc_elem.c               |  433 ++++++--
 lib/librte_eal/common/malloc_elem.h               |   43 +-
 lib/librte_eal/common/malloc_heap.c               |  704 ++++++++++++-
 lib/librte_eal/common/malloc_heap.h               |   15 +-
 lib/librte_eal/common/malloc_mp.c                 |  744 ++++++++++++++
 lib/librte_eal/common/malloc_mp.h                 |   86 ++
 lib/librte_eal/common/meson.build                 |    4 +
 lib/librte_eal/common/rte_malloc.c                |   85 +-
 lib/librte_eal/linuxapp/eal/Makefile              |    5 +
 lib/librte_eal/linuxapp/eal/eal.c                 |   62 +-
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  218 +++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 1123 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 1120 ++++++++++++--------
 lib/librte_eal/linuxapp/eal/eal_vfio.c            |  870 ++++++++++++++--
 lib/librte_eal/linuxapp/eal/eal_vfio.h            |   12 +
 lib/librte_eal/linuxapp/eal/meson.build           |    1 +
 lib/librte_eal/rte_eal_version.map                |   30 +-
 lib/librte_ether/rte_ethdev.c                     |    3 +-
 lib/librte_mempool/Makefile                       |    3 +
 lib/librte_mempool/meson.build                    |    3 +
 lib/librte_mempool/rte_mempool.c                  |  149 ++-
 test/test/commands.c                              |    3 +
 test/test/test_malloc.c                           |   30 +-
 test/test/test_memory.c                           |   27 +-
 test/test/test_memzone.c                          |   62 +-
 95 files changed, 8797 insertions(+), 1286 deletions(-)
 create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
 create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
 create mode 100644 lib/librte_eal/common/eal_memalloc.h
 create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
 create mode 100644 lib/librte_eal/common/malloc_mp.c
 create mode 100644 lib/librte_eal/common/malloc_mp.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c

-- 
2.7.4
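
Illustration for the "Why multiple memseg lists instead of one?" rationale
in the cover letter. The sketch below is a rough, self-contained C model
and is not code from this patchset: the struct and function names
(page_seg, seg_list, find_seg) are made up for the example, and it assumes
that each list covers one preallocated VA range with a single page size.
Under that assumption, locating the page behind an address only needs a
scan over the few lists, followed by index arithmetic inside the matching
list, rather than a walk over every page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are NOT the DPDK structures. */
struct page_seg {
	int in_use;          /* nonzero once a page is mapped at this slot */
};

struct seg_list {
	uintptr_t base_va;   /* start of the preallocated VA range */
	size_t page_sz;      /* assumed: one page size per list */
	size_t n_segs;       /* number of page slots reserved in the range */
	struct page_seg *segs;
};

/*
 * Find the page slot backing addr. Only the array of lists is scanned;
 * the slot inside the matching list is computed arithmetically.
 */
static struct page_seg *
find_seg(struct seg_list *lists, size_t n_lists, uintptr_t addr)
{
	size_t i;

	for (i = 0; i < n_lists; i++) {
		struct seg_list *l = &lists[i];
		uintptr_t end;

		assert(l->page_sz != 0);
		end = l->base_va + l->page_sz * l->n_segs;
		if (addr < l->base_va || addr >= end)
			continue;
		return &l->segs[(addr - l->base_va) / l->page_sz];
	}
	return NULL; /* address is not covered by any list */
}
```

With one list of four 4K slots and one list of two 2M slots, a lookup
touches at most two list headers regardless of how many pages are mapped.
An address outside every range yields NULL, which is how holes in the
address space show up in this model.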