DPDK patches and discussions
 help / color / mirror / Atom feed
From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: sergio.gonzalez.monroy@intel.com
Subject: [dpdk-dev] [PATCH v6 3/3] memzone: improve zero-length memzone reserve
Date: Thu, 31 May 2018 10:51:01 +0100	[thread overview]
Message-ID: <fd9f3018da8cc84e2d56050653e10c427378502c.1527758967.git.anatoly.burakov@intel.com> (raw)
In-Reply-To: <cover.1527758967.git.anatoly.burakov@intel.com>
In-Reply-To: <cover.1527758967.git.anatoly.burakov@intel.com>

Currently, reserving zero-length memzones is done by looking at
malloc statistics, and reserving biggest sized element found in those
statistics. This has two issues.

First, there is a race condition. The heap is unlocked between the
time we check stats, and the time we reserve malloc element for memzone.
This may lead to inability to reserve the memzone we wanted to reserve,
because another allocation might have taken place and biggest sized
element may no longer be available.

Second, the size returned by malloc statistics does not include any
alignment information, which is worked around by being conservative and
subtracting alignment length from the final result. This leads to
fragmentation and reserving memzones that could have been bigger but
aren't.

Fix all of this by using earlier-introduced operation to reserve
biggest possible malloc element. This, however, comes with a trade-off,
because we can only lock one heap at a time. So, if we check the first
available heap and find *any* element at all, that element will be
considered "the biggest", even though other heaps might have bigger
elements. We cannot know what other heaps have before we try and
allocate it, and it is not a good idea to lock all of the heaps at
the same time, so, we will just document this limitation and
encourage users to reserve memzones with socket id properly set.

Also, fixup unit tests to account for the new behavior.

Fixes: fafcc11985a2 ("mem: rework memzone to be allocated by malloc")
Cc: sergio.gonzalez.monroy@intel.com

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v6:
    - Rebase on 18.05
    
    v5:
    - Use bound len when reserving bounded zero-length memzones
    
    v4:
    - Rebased on latest master
    - Improved documentation
    - Added accounting for element pad [1]
    - Fixed max len underflow in test
    - Checkpatch fixes
    
    [1] A patch has recently fixed a similar issue:
    
    https://dpdk.org/dev/patchwork/patch/39332/
    
    The accounting for padding is also needed because size of the element
    may include not only malloc header overhead, but also the padding if
    it has any.
    
    At first glance, it would seem like additional change is needed for
    pre-18.05 code as well. However, on closer inspection, the original code
    was incorrect because it was comparing requested_len to 0, which is never
    zero and is always a minimum of cache line size due to earlier RTE_MAX()
    call (or rather, it could be zero, but in that case it would fail earlier).
    This downgrades the above quoted bug from "potential memory corruption bug"
    to "this bug was never a bug due to another bug".
    
    A proper fix for pre-18.05 would be to remove the check altogether and
    always go by requested_len, which is what we use to reserve memzones
    in the first place. I will submit it separately.

 lib/librte_eal/common/eal_common_memzone.c  |  70 ++-------
 lib/librte_eal/common/include/rte_memzone.h |  24 ++-
 test/test/test_memzone.c                    | 165 +++++++++++---------
 3 files changed, 128 insertions(+), 131 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index faa3b0615..7300fe05d 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -52,38 +52,6 @@ memzone_lookup_thread_unsafe(const char *name)
 	return NULL;
 }
 
-
-/* This function will return the greatest free block if a heap has been
- * specified. If no heap has been specified, it will return the heap and
- * length of the greatest free block available in all heaps */
-static size_t
-find_heap_max_free_elem(int *s, unsigned align)
-{
-	struct rte_mem_config *mcfg;
-	struct rte_malloc_socket_stats stats;
-	int i, socket = *s;
-	size_t len = 0;
-
-	/* get pointer to global configuration */
-	mcfg = rte_eal_get_configuration()->mem_config;
-
-	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-		if ((socket != SOCKET_ID_ANY) && (socket != i))
-			continue;
-
-		malloc_heap_get_stats(&mcfg->malloc_heaps[i], &stats);
-		if (stats.greatest_free_size > len) {
-			len = stats.greatest_free_size;
-			*s = i;
-		}
-	}
-
-	if (len < MALLOC_ELEM_OVERHEAD + align)
-		return 0;
-
-	return len - MALLOC_ELEM_OVERHEAD - align;
-}
-
 static const struct rte_memzone *
 memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		int socket_id, unsigned int flags, unsigned int align,
@@ -92,6 +60,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	struct rte_memzone *mz;
 	struct rte_mem_config *mcfg;
 	struct rte_fbarray *arr;
+	void *mz_addr;
 	size_t requested_len;
 	int mz_idx;
 	bool contig;
@@ -140,8 +109,7 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	len += RTE_CACHE_LINE_MASK;
-	len &= ~((size_t) RTE_CACHE_LINE_MASK);
+	len = RTE_ALIGN_CEIL(len, RTE_CACHE_LINE_SIZE);
 
 	/* save minimal requested  length */
 	requested_len = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE,  len);
@@ -165,27 +133,18 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* malloc only cares about size flags, remove contig flag from flags */
 	flags &= ~RTE_MEMZONE_IOVA_CONTIG;
 
-	if (len == 0) {
-		/* len == 0 is only allowed for non-contiguous zones */
-		if (contig) {
-			RTE_LOG(DEBUG, EAL, "Reserving zero-length contiguous memzones is not supported\n");
-			rte_errno = EINVAL;
-			return NULL;
-		}
-		if (bound != 0)
+	if (len == 0 && bound == 0) {
+		/* no size constraints were placed, so use malloc elem len */
+		requested_len = 0;
+		mz_addr = malloc_heap_alloc_biggest(NULL, socket_id, flags,
+				align, contig);
+	} else {
+		if (len == 0)
 			requested_len = bound;
-		else {
-			requested_len = find_heap_max_free_elem(&socket_id, align);
-			if (requested_len == 0) {
-				rte_errno = ENOMEM;
-				return NULL;
-			}
-		}
+		/* allocate memory on heap */
+		mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id,
+				flags, align, bound, contig);
 	}
-
-	/* allocate memory on heap */
-	void *mz_addr = malloc_heap_alloc(NULL, requested_len, socket_id, flags,
-			align, bound, contig);
 	if (mz_addr == NULL) {
 		rte_errno = ENOMEM;
 		return NULL;
@@ -213,8 +172,9 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	snprintf(mz->name, sizeof(mz->name), "%s", name);
 	mz->iova = rte_malloc_virt2iova(mz_addr);
 	mz->addr = mz_addr;
-	mz->len = (requested_len == 0 ?
-			(elem->size - MALLOC_ELEM_OVERHEAD) : requested_len);
+	mz->len = requested_len == 0 ?
+			elem->size - elem->pad - MALLOC_ELEM_OVERHEAD :
+			requested_len;
 	mz->hugepage_sz = elem->msl->page_sz;
 	mz->socket_id = elem->msl->socket_id;
 	mz->flags = 0;
diff --git a/lib/librte_eal/common/include/rte_memzone.h b/lib/librte_eal/common/include/rte_memzone.h
index ef370fa6f..f478fa9e6 100644
--- a/lib/librte_eal/common/include/rte_memzone.h
+++ b/lib/librte_eal/common/include/rte_memzone.h
@@ -81,8 +81,12 @@ struct rte_memzone {
  *   memzones from memory that is already available. It will not trigger any
  *   new allocations.
  *
- * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
- *   supported.
+ * @note: When reserving memzones with len set to 0, it is preferable to also
+ *   set a valid socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but
+ *   will likely not yield expected results. Specifically, the resulting memzone
+ *   may not necessarily be the biggest memzone available, but rather biggest
+ *   memzone available on socket id corresponding to an lcore from which
+ *   reservation was called.
  *
  * @param name
  *   The name of the memzone. If it already exists, the function will
@@ -141,8 +145,12 @@ const struct rte_memzone *rte_memzone_reserve(const char *name,
  *   memzones from memory that is already available. It will not trigger any
  *   new allocations.
  *
- * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
- *   supported.
+ * @note: When reserving memzones with len set to 0, it is preferable to also
+ *   set a valid socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but
+ *   will likely not yield expected results. Specifically, the resulting memzone
+ *   may not necessarily be the biggest memzone available, but rather biggest
+ *   memzone available on socket id corresponding to an lcore from which
+ *   reservation was called.
  *
  * @param name
  *   The name of the memzone. If it already exists, the function will
@@ -206,8 +214,12 @@ const struct rte_memzone *rte_memzone_reserve_aligned(const char *name,
  *   memzones from memory that is already available. It will not trigger any
  *   new allocations.
  *
- * @note Reserving IOVA-contiguous memzones with len set to 0 is not currently
- *   supported.
+ * @note: When reserving memzones with len set to 0, it is preferable to also
+ *   set a valid socket_id. Setting socket_id to SOCKET_ID_ANY is supported, but
+ *   will likely not yield expected results. Specifically, the resulting memzone
+ *   may not necessarily be the biggest memzone available, but rather biggest
+ *   memzone available on socket id corresponding to an lcore from which
+ *   reservation was called.
  *
  * @param name
  *   The name of the memzone. If it already exists, the function will
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index efcf7327e..452d7cc5e 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -467,61 +467,69 @@ test_memzone_reserve_flags(void)
 
 /* Find the heap with the greatest free block size */
 static size_t
-find_max_block_free_size(const unsigned _align)
+find_max_block_free_size(unsigned int align, unsigned int socket_id)
 {
 	struct rte_malloc_socket_stats stats;
-	unsigned i, align = _align;
-	size_t len = 0;
+	size_t len, overhead;
 
-	for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
-		rte_malloc_get_socket_stats(i, &stats);
-		if (stats.greatest_free_size > len)
-			len = stats.greatest_free_size;
-	}
+	rte_malloc_get_socket_stats(socket_id, &stats);
+
+	len = stats.greatest_free_size;
+	overhead = MALLOC_ELEM_OVERHEAD;
+
+	if (len == 0)
+		return 0;
 
-	if (align < RTE_CACHE_LINE_SIZE)
-		align = RTE_CACHE_LINE_ROUNDUP(align+1);
+	align = RTE_CACHE_LINE_ROUNDUP(align);
+	overhead += align;
 
-	if (len <= MALLOC_ELEM_OVERHEAD + align)
+	if (len < overhead)
 		return 0;
 
-	return len - MALLOC_ELEM_OVERHEAD - align;
+	return len - overhead;
 }
 
 static int
 test_memzone_reserve_max(void)
 {
-	const struct rte_memzone *mz;
-	size_t maxlen;
+	unsigned int i;
 
-	maxlen = find_max_block_free_size(0);
+	for (i = 0; i < rte_socket_count(); i++) {
+		const struct rte_memzone *mz;
+		size_t maxlen;
+		int socket;
 
-	if (maxlen == 0) {
-		printf("There is no space left!\n");
-		return 0;
-	}
+		socket = rte_socket_id_by_idx(i);
+		maxlen = find_max_block_free_size(0, socket);
 
-	mz = rte_memzone_reserve(TEST_MEMZONE_NAME("max_zone"), 0,
-			SOCKET_ID_ANY, 0);
-	if (mz == NULL){
-		printf("Failed to reserve a big chunk of memory - %s\n",
-				rte_strerror(rte_errno));
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		if (maxlen == 0) {
+			printf("There is no space left!\n");
+			return 0;
+		}
 
-	if (mz->len != maxlen) {
-		printf("Memzone reserve with 0 size did not return bigest block\n");
-		printf("Expected size = %zu, actual size = %zu\n", maxlen, mz->len);
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		mz = rte_memzone_reserve(TEST_MEMZONE_NAME("max_zone"), 0,
+				socket, 0);
+		if (mz == NULL) {
+			printf("Failed to reserve a big chunk of memory - %s\n",
+					rte_strerror(rte_errno));
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
 
-	if (rte_memzone_free(mz)) {
-		printf("Fail memzone free\n");
-		return -1;
+		if (mz->len != maxlen) {
+			printf("Memzone reserve with 0 size did not return bigest block\n");
+			printf("Expected size = %zu, actual size = %zu\n",
+					maxlen, mz->len);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+
+		if (rte_memzone_free(mz)) {
+			printf("Fail memzone free\n");
+			return -1;
+		}
 	}
 
 	return 0;
@@ -530,45 +538,62 @@ test_memzone_reserve_max(void)
 static int
 test_memzone_reserve_max_aligned(void)
 {
-	const struct rte_memzone *mz;
-	size_t maxlen = 0;
+	unsigned int i;
 
-	/* random alignment */
-	rte_srand((unsigned)rte_rdtsc());
-	const unsigned align = 1 << ((rte_rand() % 8) + 5); /* from 128 up to 4k alignment */
+	for (i = 0; i < rte_socket_count(); i++) {
+		const struct rte_memzone *mz;
+		size_t maxlen, minlen = 0;
+		int socket;
 
-	maxlen = find_max_block_free_size(align);
+		socket = rte_socket_id_by_idx(i);
 
-	if (maxlen == 0) {
-		printf("There is no space left for biggest %u-aligned memzone!\n", align);
-		return 0;
-	}
+		/* random alignment */
+		rte_srand((unsigned int)rte_rdtsc());
+		const unsigned int align = 1 << ((rte_rand() % 8) + 5); /* from 128 up to 4k alignment */
 
-	mz = rte_memzone_reserve_aligned(TEST_MEMZONE_NAME("max_zone_aligned"),
-			0, SOCKET_ID_ANY, 0, align);
-	if (mz == NULL){
-		printf("Failed to reserve a big chunk of memory - %s\n",
-				rte_strerror(rte_errno));
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		/* memzone size may be between size and size - align */
+		minlen = find_max_block_free_size(align, socket);
+		maxlen = find_max_block_free_size(0, socket);
 
-	if (mz->len != maxlen) {
-		printf("Memzone reserve with 0 size and alignment %u did not return"
-				" bigest block\n", align);
-		printf("Expected size = %zu, actual size = %zu\n",
-				maxlen, mz->len);
-		rte_dump_physmem_layout(stdout);
-		rte_memzone_dump(stdout);
-		return -1;
-	}
+		if (minlen == 0 || maxlen == 0) {
+			printf("There is no space left for biggest %u-aligned memzone!\n",
+					align);
+			return 0;
+		}
 
-	if (rte_memzone_free(mz)) {
-		printf("Fail memzone free\n");
-		return -1;
-	}
+		mz = rte_memzone_reserve_aligned(
+				TEST_MEMZONE_NAME("max_zone_aligned"),
+				0, socket, 0, align);
+		if (mz == NULL) {
+			printf("Failed to reserve a big chunk of memory - %s\n",
+					rte_strerror(rte_errno));
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+		if (mz->addr != RTE_PTR_ALIGN(mz->addr, align)) {
+			printf("Memzone reserve with 0 size and alignment %u did not return aligned block\n",
+					align);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
 
+		if (mz->len < minlen || mz->len > maxlen) {
+			printf("Memzone reserve with 0 size and alignment %u did not return"
+					" bigest block\n", align);
+			printf("Expected size = %zu-%zu, actual size = %zu\n",
+					minlen, maxlen, mz->len);
+			rte_dump_physmem_layout(stdout);
+			rte_memzone_dump(stdout);
+			return -1;
+		}
+
+		if (rte_memzone_free(mz)) {
+			printf("Fail memzone free\n");
+			return -1;
+		}
+	}
 	return 0;
 }
 
-- 
2.17.0

  parent reply	other threads:[~2018-05-31  9:51 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25 14:10 [dpdk-dev] [PATCH 1/2] malloc: add biggest free IOVA-contiguous element to stats Anatoly Burakov
2018-04-25 14:10 ` [dpdk-dev] [PATCH 2/2] memzone: allow IOVA-contiguous memzones with zero size Anatoly Burakov
2018-04-25 14:40 ` [dpdk-dev] [PATCH 1/2] malloc: add biggest free IOVA-contiguous element to stats Burakov, Anatoly
2018-04-26  8:06 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2018-05-03 17:17   ` [dpdk-dev] [PATCH v3 0/3] Improve zero-length memzone allocation Anatoly Burakov
2018-05-14 11:19     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
2018-05-14 13:47       ` [dpdk-dev] [PATCH v5 " Anatoly Burakov
2018-05-31  9:50         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-07-13  9:24           ` Thomas Monjalon
2018-05-31  9:50         ` [dpdk-dev] [PATCH v6 1/3] malloc: add finding biggest free IOVA-contiguous element Anatoly Burakov
2018-05-31  9:51         ` [dpdk-dev] [PATCH v6 2/3] malloc: allow reserving biggest element Anatoly Burakov
2018-05-31  9:51         ` Anatoly Burakov [this message]
2018-05-14 13:47       ` [dpdk-dev] [PATCH v5 1/3] malloc: add biggest free IOVA-contiguous element to stats Anatoly Burakov
2018-05-14 13:58         ` Shreyansh Jain
2018-05-14 13:47       ` [dpdk-dev] [PATCH v5 2/3] malloc: allow reserving biggest element Anatoly Burakov
2018-05-14 13:47       ` [dpdk-dev] [PATCH v5 3/3] memzone: improve zero-length memzone reserve Anatoly Burakov
2018-05-14 11:19     ` [dpdk-dev] [PATCH v4 1/3] malloc: add biggest free IOVA-contiguous element to stats Anatoly Burakov
2018-05-14 11:19     ` [dpdk-dev] [PATCH v4 2/3] malloc: allow reserving biggest element Anatoly Burakov
2018-05-14 11:19     ` [dpdk-dev] [PATCH v4 3/3] memzone: improve zero-length memzone reserve Anatoly Burakov
2018-05-03 17:17   ` [dpdk-dev] [PATCH v3 1/3] malloc: add biggest free IOVA-contiguous element to stats Anatoly Burakov
2018-05-10 13:39     ` Remy Horton
2018-05-03 17:18   ` [dpdk-dev] [PATCH v3 2/3] malloc: allow reserving biggest element Anatoly Burakov
2018-05-10 13:57     ` Remy Horton
2018-05-14  8:22       ` Burakov, Anatoly
2018-05-03 17:18   ` [dpdk-dev] [PATCH v3 3/3] memzone: improve zero-length memzone reserve Anatoly Burakov
2018-05-11 10:25     ` Remy Horton
2018-05-14  8:21       ` Burakov, Anatoly
2018-05-14 11:29         ` Burakov, Anatoly
2018-05-14 12:23           ` Burakov, Anatoly
2018-05-15  6:24           ` Remy Horton
2018-05-15  8:37             ` Burakov, Anatoly
2018-04-26  8:06 ` [dpdk-dev] [PATCH v2 2/2] memzone: allow IOVA-contiguous memzones with zero size Anatoly Burakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fd9f3018da8cc84e2d56050653e10c427378502c.1527758967.git.anatoly.burakov@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    --cc=sergio.gonzalez.monroy@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).