From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: keith.wiles@intel.com, jianfeng.tan@intel.com, andras.kovacs@ericsson.com,
	laszlo.vadkeri@ericsson.com, benjamin.walker@intel.com,
	bruce.richardson@intel.com, thomas@monjalon.net,
	konstantin.ananyev@intel.com, kuralamudhan.ramakrishnan@intel.com,
	louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
	pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
	olivier.matz@6wind.com, shreyansh.jain@nxp.com,
	gowrishankar.m@linux.vnet.ibm.com
Date: Wed, 11 Apr 2018 13:30:38 +0100
Message-Id: <36e4616e58696ca1401d7894343ee508aab5ab9f.1523448978.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v6 63/70] malloc: enable callbacks on alloc/free and mp sync

Callbacks will be triggered just after allocation and just before
deallocation, to ensure that the memory address space referenced in the
callback is always valid by the time the callback is called.
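
To illustrate what these events look like from an application's point of
view, here is a minimal, hypothetical consumer. It assumes the
rte_mem_event_callback_register() API and the three-argument
rte_mem_event_callback_t introduced earlier in this series (later DPDK
releases changed the signature), so treat it as a sketch rather than a
reference:

	#include <stdio.h>

	#include <rte_eal.h>
	#include <rte_memory.h>
	#include <rte_malloc.h>

	/* Called just after pages are mapped (ALLOC) and just before they
	 * are unmapped (FREE), so addr is a valid mapping in both cases. */
	static void
	mem_event_cb(enum rte_mem_event event_type, const void *addr,
			size_t len)
	{
		printf("%s: %zu bytes at %p\n",
			event_type == RTE_MEM_EVENT_ALLOC ?
				"alloc" : "pre-free",
			len, addr);
	}

	int
	main(int argc, char **argv)
	{
		void *p;

		if (rte_eal_init(argc, argv) < 0)
			return -1;

		/* "demo" is an arbitrary name, used for unregistering */
		rte_mem_event_callback_register("demo", mem_event_cb);

		/* a large allocation may grow the heap -> ALLOC event */
		p = rte_malloc(NULL, 16 << 20, 0);
		/* freeing it back may shrink the heap -> FREE event */
		rte_free(p);

		return 0;
	}

Note that secondary processes receive the same ALLOC/FREE events during
multiprocess sync, which is what the sync_chunk() changes below implement.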
Signed-off-by: Anatoly Burakov
Tested-by: Santosh Shukla
Tested-by: Hemant Agrawal
Tested-by: Gowrishankar Muthukrishnan
---
 lib/librte_eal/common/malloc_heap.c        | 21 +++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 30 ++++++++++++++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c     | 15 +++++++++++++--
 3 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index be39250..18c7b69 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -241,6 +241,7 @@ try_expand_heap_primary(struct malloc_heap *heap, uint64_t pg_sz,
 	void *map_addr;
 	size_t alloc_sz;
 	int n_segs;
+	bool callback_triggered = false;
 
 	alloc_sz = RTE_ALIGN_CEIL(align + elt_size +
 			MALLOC_ELEM_TRAILER_LEN, pg_sz);
@@ -262,12 +263,22 @@ try_expand_heap_primary(struct malloc_heap *heap, uint64_t pg_sz,
 
 	map_addr = ms[0]->addr;
 
+	/* notify user about changes in memory map */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC, map_addr, alloc_sz);
+
 	/* notify other processes that this has happened */
 	if (request_sync()) {
 		/* we couldn't ensure all processes have mapped memory,
 		 * so free it back and notify everyone that it's been
 		 * freed back.
+		 *
+		 * technically, we could've avoided adding memory addresses to
+		 * the map, but that would've led to inconsistent behavior
+		 * between primary and secondary processes, as those get
+		 * callbacks during sync. therefore, force primary process to
+		 * do alloc-and-rollback syncs as well.
 		 */
+		callback_triggered = true;
 		goto free_elem;
 	}
 	heap->total_size += alloc_sz;
@@ -280,6 +291,10 @@ try_expand_heap_primary(struct malloc_heap *heap, uint64_t pg_sz,
 	return 0;
 
 free_elem:
+	if (callback_triggered)
+		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+				map_addr, alloc_sz);
+
 	rollback_expand_heap(ms, n_segs, elem, map_addr, alloc_sz);
 
 	request_sync();
@@ -642,6 +657,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	heap->total_size -= aligned_len;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* notify user about changes in memory map */
+		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+				aligned_start, aligned_len);
+
 		/* don't care if any of this fails */
 		malloc_heap_free_pages(aligned_start, aligned_len);
 
@@ -666,6 +685,8 @@ malloc_heap_free(struct malloc_elem *elem)
 	 * already removed from the heap, so it is, for all intents and
 	 * purposes, hidden from the rest of DPDK even if some other
 	 * process (including this one) may have these pages mapped.
+	 *
+	 * notifications about deallocated memory happen during sync.
 	 */
 	request_to_primary(&req);
 }

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 75f2b0c..93f80bb 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -876,6 +876,21 @@ sync_chunk(struct rte_memseg_list *primary_msl,
 		diff_len = RTE_MIN(chunk_len, diff_len);
 
+		/* if we are freeing memory, notify the application */
+		if (!used) {
+			struct rte_memseg *ms;
+			void *start_va;
+			size_t len, page_sz;
+
+			ms = rte_fbarray_get(l_arr, start);
+			start_va = ms->addr;
+			page_sz = (size_t)primary_msl->page_sz;
+			len = page_sz * diff_len;
+
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					start_va, len);
+		}
+
 		for (i = 0; i < diff_len; i++) {
 			struct rte_memseg *p_ms, *l_ms;
 			int seg_idx = start + i;
@@ -901,6 +916,21 @@ sync_chunk(struct rte_memseg_list *primary_msl,
 			}
 		}
 
+		/* if we just allocated memory, notify the application */
+		if (used) {
+			struct rte_memseg *ms;
+			void *start_va;
+			size_t len, page_sz;
+
+			ms = rte_fbarray_get(l_arr, start);
+			start_va = ms->addr;
+			page_sz = (size_t)primary_msl->page_sz;
+			len = page_sz * diff_len;
+
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					start_va, len);
+		}
+
 		/* calculate how much we can advance until next chunk */
 		diff_len = used ?
 				rte_fbarray_find_contig_used(l_arr, start) :
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 5101c04..2eea3b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1128,6 +1128,7 @@ vfio_spapr_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 	create.levels = 1;
 
 	if (do_map) {
+		void *addr;
 		/* re-create window and remap the entire memory */
 		if (iova > create.window_size) {
 			if (vfio_spapr_create_new_dma_window(vfio_container_fd,
@@ -1158,9 +1159,19 @@ vfio_spapr_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 
 		/* now that we've remapped all of the memory that was present
 		 * before, map the segment that we were requested to map.
+		 *
+		 * however, if we were called by the callback, the memory we
+		 * were called with was already in the memseg list, so previous
+		 * mapping should've mapped that segment already.
+		 *
+		 * virt2memseg_list is a relatively cheap check, so use that. if
+		 * memory is within any memseg list, it's a memseg, so it's
+		 * already mapped.
 		 */
-		if (vfio_spapr_dma_do_map(vfio_container_fd,
-				vaddr, iova, len, 1) < 0) {
+		addr = (void *)(uintptr_t)vaddr;
+		if (rte_mem_virt2memseg_list(addr) == NULL &&
+				vfio_spapr_dma_do_map(vfio_container_fd,
+				vaddr, iova, len, 1) < 0) {
 			RTE_LOG(ERR, EAL, "Could not map segment\n");
 			ret = -1;
 			goto out;
-- 
2.7.4
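
A note on the rte_mem_virt2memseg_list() guard in the eal_vfio.c hunk: when
vfio_spapr_dma_mem_map() is invoked from the new memory event callback, the
address is already tracked in a memseg list and has already been covered by
the remap-everything pass, so a second per-segment DMA map would be
redundant. The same idiom in isolation, a minimal sketch assuming only the
public rte_memory.h API:

	#include <stdbool.h>

	#include <rte_memory.h>

	/* An address that resolves to a memseg list is DPDK-managed memory
	 * and, on sPAPR, has already been mapped when the DMA window was
	 * re-created, so it needs no separate per-segment mapping. */
	static bool
	already_mapped(const void *addr)
	{
		return rte_mem_virt2memseg_list(addr) != NULL;
	}

This keeps the check cheap: it only consults the memseg list bounds rather
than walking individual segments.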