To: Anatoly Burakov, dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com,
 thomas@monjalon.net, konstantin.ananyev@intel.com,
 kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
 nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
 jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 olivier.matz@6wind.com, shreyansh.jain@nxp.com
From: gowrishankar muthukrishnan
Date: Tue, 10 Apr 2018 00:05:26 +0530
Message-Id: <8fb9466a-00e0-d4bb-05a2-96d916cfaeb2@linux.vnet.ibm.com>
Subject: Re: [dpdk-dev] [PATCH v5 00/70] Memory Hotplug for DPDK
List-Id: DPDK patches and discussions

On Monday 09 April 2018 11:30 PM, Anatoly Burakov wrote:
> This patchset introduces dynamic memory allocation for DPDK (aka memory
> hotplug). Based upon RFC submitted in December [1].
>
> Dependencies (to be applied in specified order):
> - EAL IOVA fix [2]
>
> Deprecation notices relevant to this patchset:
> - General outline of memory hotplug changes [3]
>
> The vast majority of changes are in the EAL and malloc; the external API
> disruption is minimal: a new flag is added to memzone API for contiguous
> memory allocation, a few API additions in rte_memory due to switch
> to memseg_lists as opposed to memsegs, and a few new convenience APIs.
> Every other API change is internal to EAL, and all of the memory
> allocation/freeing is handled through rte_malloc, with no externally
> visible API changes.
>
> Quick outline of all changes done as part of this patchset:
>
> * Malloc heap adjusted to handle holes in address space
> * Single memseg list replaced by multiple memseg lists
> * VA space for hugepages is preallocated in advance
> * Added alloc/free for pages happening as needed on rte_malloc/rte_free
> * Added contiguous memory allocation APIs for rte_memzone
> * Added convenience API calls to walk over memsegs
> * Integrated Pawel Wodkowski's patch for registering/unregistering memory
>   with VFIO [4]
> * Callbacks for registering memory allocations
> * Callbacks for allowing/disallowing allocations above specified limit
> * Multiprocess support done via DPDK IPC introduced in 18.02
>
> The biggest difference is that a "memseg" now represents a single page (as
> opposed to being a big contiguous block of pages). As a consequence, both
> memzones and malloc elements are no longer guaranteed to be physically
> contiguous, unless the user asks for it at reserve time. To preserve whatever
> functionality was dependent on previous behavior, a legacy memory option is
> also provided; however, it is expected (or perhaps vainly hoped) to be a
> temporary solution.
>
> Why multiple memseg lists instead of one?
> Since memseg is a single page now,
> the list of memsegs will get quite big, and we need to locate pages somehow
> when we allocate and free them. We could of course just walk the list and
> allocate one contiguous chunk of VA space for memsegs, but this
> implementation uses separate lists instead in order to speed up many
> operations with memseg lists.
>
> For v5, the following limitations are present:
> - VFIO support for multiple processes is not well-tested; work is ongoing
>   to validate VFIO for all use cases
> - There are known problems with PPC64 VFIO code

As below.

> - For DPAA and FSLMC platforms, performance will be heavily degraded for
>   IOVA as PA cases; separate patches are expected to address the issue
>
> For testing, it is recommended to use the GitHub repository [5], as it will
> have all of the dependencies already integrated.
>
> Tested-by: Hemant Agrawal
> Tested-by: Santosh Shukla

Tested-by: Gowrishankar Muthukrishnan

VFIO-related validation on powerpc is still in progress, so I'll post our
arch-specific changes as I test more. This should not block this patch set
from being merged, as the changes we expect are mostly on top of sPAPR IOMMU
(which is specific to powerpc) and do not affect other architectures.

Thanks,
Gowrishankar

>
> v5:
> - Fixed missing DMA window creation on PPC64 for VFIO
> - fslmc VFIO fixes
> - Added new user DMA map code to keep track of user DMA maps
>   when hotplug is in use (also used on PPC64 on remap)
> - A few checkpatch and commit message fixes here and there
>
> v4:
> - Fixed bug in memzone lookup
> - Added draft fslmc VFIO code
> - Rebased on latest master + dependent patchset
> - Documented limitations for *_walk() functions
>
> v3:
> - Lots of compile fixes
> - Fixes for multiprocess synchronization
> - Introduced support for sPAPR IOMMU, courtesy of Gowrishankar @ IBM
> - Fixes for mempool size calculation
> - Added convenience memseg walk() API's
> - Added alloc validation callback
>
> v2:
> - fixed deadlock at init
> - reverted rte_panic changes at init, this is now handled inside IPC
>
> [1] http://dpdk.org/dev/patchwork/bundle/aburakov/Memory_RFC/
> [2] http://dpdk.org/dev/patchwork/bundle/aburakov/IOVA_mode_fixes/
> [3] http://dpdk.org/dev/patchwork/patch/34002/
> [4] http://dpdk.org/dev/patchwork/patch/24484/
> [5] https://github.com/anatolyburakov/dpdk
>
> Anatoly Burakov (70):
>   eal: move get_virtual_area out of linuxapp eal_memory.c
>   eal: move all locking to heap
>   eal: make malloc heap a doubly-linked list
>   eal: add function to dump malloc heap contents
>   test: add command to dump malloc heap contents
>   eal: make malloc_elem_join_adjacent_free public
>   eal: make malloc free list remove public
>   eal: make malloc free return resulting malloc element
>   eal: replace panics with error messages in malloc
>   eal: add backend support for contiguous allocation
>   eal: enable reserving physically contiguous memzones
>   ethdev: use contiguous allocation for DMA memory
>   crypto/qat: use contiguous allocation for DMA memory
>   net/avf: use contiguous allocation for DMA memory
>   net/bnx2x: use contiguous allocation for DMA memory
>   net/bnxt: use contiguous allocation for DMA memory
>   net/cxgbe: use contiguous allocation for DMA memory
>   net/ena: use contiguous allocation for DMA memory
>   net/enic: use contiguous allocation for DMA memory
>   net/i40e: use contiguous allocation for DMA memory
>   net/qede: use contiguous allocation for DMA memory
>   net/virtio: use contiguous allocation for DMA memory
>   net/vmxnet3: use contiguous allocation for DMA memory
>   mempool: add support for the new allocation methods
>   eal: add function to walk all memsegs
>   bus/fslmc: use memseg walk instead of iteration
>   bus/pci: use memseg walk instead of iteration
>   net/mlx5: use memseg walk instead of iteration
>   eal: use memseg walk instead of iteration
>   mempool: use memseg walk instead of iteration
>   test: use memseg walk instead of iteration
>   vfio/type1: use memseg walk instead of iteration
>   vfio/spapr: use memseg walk instead of iteration
>   eal: add contig walk function
>   virtio: use memseg contig walk instead of iteration
>   eal: add iova2virt function
>   bus/dpaa: use iova2virt instead of memseg iteration
>   bus/fslmc: use iova2virt instead of memseg iteration
>   crypto/dpaa_sec: use iova2virt instead of memseg iteration
>   eal: add virt2memseg function
>   bus/fslmc: use virt2memseg instead of iteration
>   crypto/dpaa_sec: use virt2memseg instead of iteration
>   net/mlx4: use virt2memseg instead of iteration
>   net/mlx5: use virt2memseg instead of iteration
>   eal: use memzone walk instead of iteration
>   vfio: allow to map other memory regions
>   eal: add "legacy memory" option
>   eal: add rte_fbarray
>   eal: replace memseg with memseg lists
>   eal: replace memzone array with fbarray
>   eal: add support for mapping hugepages at runtime
>   eal: add support for unmapping pages at runtime
>   eal: add "single file segments" command-line option
>   eal: add API to check if memory is contiguous
>   eal: prepare memseg lists for multiprocess sync
>   eal: read hugepage counts from node-specific sysfs path
>   eal: make use of memory hotplug for init
>   eal: share hugepage info primary and secondary
>   eal: add secondary process init with memory hotplug
>   eal: enable memory hotplug support in rte_malloc
>   eal: add support for multiprocess memory hotplug
>   eal: add support for callbacks on memory hotplug
>   eal: enable callbacks on malloc/free and mp sync
>   vfio: enable support for mem event callbacks
>   bus/fslmc: move vfio DMA map into bus probe
>   bus/fslmc: enable support for mem event callbacks for vfio
>   eal: enable non-legacy memory mode
>   eal: add memory validator callback
>   eal: enable validation before new page allocation
>   eal: prevent preallocated pages from being freed
>
>  config/common_base | 15 +-
>  config/defconfig_i686-native-linuxapp-gcc | 3 +
>  config/defconfig_i686-native-linuxapp-icc | 3 +
>  config/defconfig_x86_x32-native-linuxapp-gcc | 3 +
>  config/rte_config.h | 7 +-
>  doc/guides/rel_notes/deprecation.rst | 9 -
>  drivers/bus/dpaa/rte_dpaa_bus.h | 12 +-
>  drivers/bus/fslmc/fslmc_bus.c | 11 +
>  drivers/bus/fslmc/fslmc_vfio.c | 195 +++-
>  drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 27 +-
>  drivers/bus/pci/Makefile | 3 +
>  drivers/bus/pci/linux/pci.c | 28 +-
>  drivers/bus/pci/meson.build | 3 +
>  drivers/crypto/dpaa_sec/dpaa_sec.c | 30 +-
>  drivers/crypto/qat/qat_qp.c | 23 +-
>  drivers/event/dpaa2/Makefile | 3 +
>  drivers/mempool/dpaa/Makefile | 3 +
>  drivers/mempool/dpaa/meson.build | 3 +
>  drivers/mempool/dpaa2/Makefile | 3 +
>  drivers/mempool/dpaa2/meson.build | 3 +
>  drivers/net/avf/avf_ethdev.c | 4 +-
>  drivers/net/bnx2x/bnx2x.c | 2 +-
>  drivers/net/bnx2x/bnx2x_rxtx.c | 3 +-
>  drivers/net/bnxt/bnxt_ethdev.c | 17 +-
>  drivers/net/bnxt/bnxt_ring.c | 9 +-
>  drivers/net/bnxt/bnxt_vnic.c | 8 +-
>  drivers/net/cxgbe/sge.c | 3 +-
>  drivers/net/dpaa/Makefile | 3 +
>  drivers/net/dpaa2/Makefile | 3 +
>  drivers/net/dpaa2/dpaa2_ethdev.c | 1 -
>  drivers/net/dpaa2/meson.build | 3 +
>  drivers/net/ena/Makefile | 3 +
>  drivers/net/ena/base/ena_plat_dpdk.h | 9 +-
>  drivers/net/ena/ena_ethdev.c | 10 +-
>  drivers/net/enic/enic_main.c | 9 +-
>  drivers/net/i40e/i40e_ethdev.c | 4 +-
>  drivers/net/i40e/i40e_rxtx.c | 4 +-
>  drivers/net/mlx4/mlx4_mr.c | 18 +-
>  drivers/net/mlx5/Makefile | 3 +
>  drivers/net/mlx5/mlx5.c | 25 +-
>  drivers/net/mlx5/mlx5_mr.c | 19 +-
>  drivers/net/qede/base/bcm_osal.c | 7 +-
>  drivers/net/virtio/virtio_ethdev.c | 8 +-
>  drivers/net/virtio/virtio_user/vhost_kernel.c | 83 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c | 5 +-
>  lib/librte_eal/bsdapp/eal/Makefile | 4 +
>  lib/librte_eal/bsdapp/eal/eal.c | 83 +-
>  lib/librte_eal/bsdapp/eal/eal_hugepage_info.c | 65 +-
>  lib/librte_eal/bsdapp/eal/eal_memalloc.c | 48 +
>  lib/librte_eal/bsdapp/eal/eal_memory.c | 224 +++-
>  lib/librte_eal/bsdapp/eal/meson.build | 1 +
>  lib/librte_eal/common/Makefile | 2 +-
>  lib/librte_eal/common/eal_common_fbarray.c | 859 ++++++++++++++++
>  lib/librte_eal/common/eal_common_memalloc.c | 359 +++++++
>  lib/librte_eal/common/eal_common_memory.c | 824 ++++++++++++++-
>  lib/librte_eal/common/eal_common_memzone.c | 235 +++--
>  lib/librte_eal/common/eal_common_options.c | 13 +-
>  lib/librte_eal/common/eal_filesystem.h | 30 +
>  lib/librte_eal/common/eal_hugepages.h | 11 +-
>  lib/librte_eal/common/eal_internal_cfg.h | 12 +-
>  lib/librte_eal/common/eal_memalloc.h | 80 ++
>  lib/librte_eal/common/eal_options.h | 4 +
>  lib/librte_eal/common/eal_private.h | 33 +
>  lib/librte_eal/common/include/rte_eal_memconfig.h | 28 +-
>  lib/librte_eal/common/include/rte_fbarray.h | 353 +++++++
>  lib/librte_eal/common/include/rte_malloc.h | 10 +
>  lib/librte_eal/common/include/rte_malloc_heap.h | 6 +
>  lib/librte_eal/common/include/rte_memory.h | 258 ++++-
>  lib/librte_eal/common/include/rte_memzone.h | 12 +-
>  lib/librte_eal/common/include/rte_vfio.h | 41 +
>  lib/librte_eal/common/malloc_elem.c | 433 ++++++--
>  lib/librte_eal/common/malloc_elem.h | 43 +-
>  lib/librte_eal/common/malloc_heap.c | 704 ++++++++++++-
>  lib/librte_eal/common/malloc_heap.h | 15 +-
>  lib/librte_eal/common/malloc_mp.c | 744 ++++++++++++++
>  lib/librte_eal/common/malloc_mp.h | 86 ++
>  lib/librte_eal/common/meson.build | 4 +
>  lib/librte_eal/common/rte_malloc.c | 85 +-
>  lib/librte_eal/linuxapp/eal/Makefile | 5 +
>  lib/librte_eal/linuxapp/eal/eal.c | 62 +-
>  lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 218 +++-
>  lib/librte_eal/linuxapp/eal/eal_memalloc.c | 1123 +++++++++++++++++++++
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 1119 ++++++++++++--------
>  lib/librte_eal/linuxapp/eal/eal_vfio.c | 870 ++++++++++++++--
>  lib/librte_eal/linuxapp/eal/eal_vfio.h | 12 +
>  lib/librte_eal/linuxapp/eal/meson.build | 1 +
>  lib/librte_eal/rte_eal_version.map | 30 +-
>  lib/librte_ether/rte_ethdev.c | 3 +-
>  lib/librte_mempool/Makefile | 3 +
>  lib/librte_mempool/meson.build | 3 +
>  lib/librte_mempool/rte_mempool.c | 149 ++-
>  test/test/commands.c | 3 +
>  test/test/test_malloc.c | 30 +-
>  test/test/test_memory.c | 27 +-
>  test/test/test_memzone.c | 62 +-
>  95 files changed, 8794 insertions(+), 1285 deletions(-)
>  create mode 100644 lib/librte_eal/bsdapp/eal/eal_memalloc.c
>  create mode 100644 lib/librte_eal/common/eal_common_fbarray.c
>  create mode 100644 lib/librte_eal/common/eal_common_memalloc.c
>  create mode 100644 lib/librte_eal/common/eal_memalloc.h
>  create mode 100644 lib/librte_eal/common/include/rte_fbarray.h
>  create mode 100644 lib/librte_eal/common/malloc_mp.c
>  create mode 100644 lib/librte_eal/common/malloc_mp.h
>  create mode 100644 lib/librte_eal/linuxapp/eal/eal_memalloc.c
>
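
For readers following the cover letter's description of the new layout (one
memseg per page, several memseg lists, a callback-style walk, and no
guarantee of physical contiguity unless requested), the idea can be sketched
roughly as below. This is only an illustration under stated assumptions:
every name prefixed with sketch_ is hypothetical and is not the DPDK API, and
the real walk and contiguity-check functions in the patchset may differ in
signature, locking and behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the DPDK structures. */
struct sketch_memseg {
	uint64_t iova;  /* IO address of this page */
	size_t len;     /* page size in bytes */
	int in_use;     /* non-zero if the slot holds an allocated page */
};

struct sketch_memseg_list {
	struct sketch_memseg *segs;
	size_t n_segs;
};

struct sketch_totals {
	size_t pages;
	size_t bytes;
};

typedef int (*sketch_walk_cb)(const struct sketch_memseg *ms, void *arg);

/* Visit every in-use page in every list; a non-zero callback return stops the walk. */
static int
sketch_memseg_walk(const struct sketch_memseg_list *lists, size_t n_lists,
		sketch_walk_cb cb, void *arg)
{
	for (size_t i = 0; i < n_lists; i++) {
		for (size_t j = 0; j < lists[i].n_segs; j++) {
			const struct sketch_memseg *ms = &lists[i].segs[j];
			int ret;

			if (!ms->in_use)
				continue; /* hole in the address space */
			ret = cb(ms, arg);
			if (ret != 0)
				return ret;
		}
	}
	return 0;
}

/* Walk callback: accumulate the number of pages and bytes seen. */
static int
sketch_count_cb(const struct sketch_memseg *ms, void *arg)
{
	struct sketch_totals *t = arg;

	t->pages++;
	t->bytes += ms->len;
	return 0;
}

/* Return 1 if slots [start, start + n) are all in use and IOVA-adjacent. */
static int
sketch_is_iova_contig(const struct sketch_memseg_list *msl, size_t start,
		size_t n)
{
	if (n == 0 || start > msl->n_segs || n > msl->n_segs - start)
		return 0;
	for (size_t k = start; k < start + n; k++) {
		const struct sketch_memseg *cur = &msl->segs[k];

		if (!cur->in_use)
			return 0;
		if (k > start) {
			const struct sketch_memseg *prev = &msl->segs[k - 1];

			if (cur->iova != prev->iova + prev->len)
				return 0;
		}
	}
	return 1;
}

/* Small self-check with made-up addresses; returns 0 when all checks pass. */
static int
sketch_demo(void)
{
	struct sketch_memseg a[] = {
		{ 0x100000, 4096, 1 },
		{ 0,        4096, 0 }, /* unused slot */
		{ 0x300000, 4096, 1 },
	};
	struct sketch_memseg b[] = {
		{ 0x800000, 4096, 1 },
		{ 0x801000, 4096, 1 },
	};
	struct sketch_memseg_list lists[] = { { a, 3 }, { b, 2 } };
	struct sketch_totals t = { 0, 0 };

	sketch_memseg_walk(lists, 2, sketch_count_cb, &t);
	assert(t.pages == 4);
	assert(t.bytes == 4 * 4096);
	assert(!sketch_is_iova_contig(&lists[0], 0, 3)); /* hole breaks it */
	assert(sketch_is_iova_contig(&lists[1], 0, 2));
	return 0;
}
```

The point of the sketch is only that a caller can no longer assume adjacent
pages are IOVA-adjacent, which is why drivers in this series switch to
explicitly requesting contiguous memory for DMA.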