* [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous
2018-05-03 10:11 [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Gowrishankar
@ 2018-05-03 10:11 ` Gowrishankar
2018-05-04 9:29 ` Burakov, Anatoly
2018-05-03 10:11 ` [dpdk-dev] [PATCH 2/2] eal/malloc: fix heap index to correctly insert memseg Gowrishankar
2018-05-18 13:10 ` [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Thomas Monjalon
2 siblings, 1 reply; 8+ messages in thread
From: Gowrishankar @ 2018-05-03 10:11 UTC (permalink / raw)
To: Sergio Gonzalez Monroy
Cc: Anatoly Burakov, dev, Thomas Monjalon,
Gowrishankar Muthukrishnan, stable
From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
During malloc heap init, if there are malloc_elems that are contiguous
in virtual addresses, they can be merged so that the merged malloc_elem
offers a larger free memory size than the single hugepage it was
created for.
Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---
lib/librte_eal/common/malloc_heap.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 267a4c6..1cacf7f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -213,7 +213,9 @@
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned ms_cnt;
- struct rte_memseg *ms;
+ struct rte_memseg *ms, *prev_ms = NULL;
+ struct malloc_elem *elem, *prev_elem;
+ int ret;
if (mcfg == NULL)
return -1;
@@ -222,6 +224,32 @@
(ms_cnt < RTE_MAX_MEMSEG) && (ms->len > 0);
ms_cnt++, ms++) {
malloc_heap_add_memseg(&mcfg->malloc_heaps[ms->socket_id], ms);
+ elem = (struct malloc_elem *)ms->addr;
+ if (prev_ms != NULL &&
+ (ms->socket_id == prev_ms->socket_id)) {
+ prev_elem = (struct malloc_elem *)prev_ms->addr;
+
+ /* prev_elem and elem must be contiguous for the resize.
+ * Otherwise, look for prev_elem in later iterations. */
+ if (elem != RTE_PTR_ADD(prev_elem,
+ prev_elem->size + MALLOC_ELEM_OVERHEAD)) {
+ prev_ms = ms;
+ continue;
+ }
+ /* The end BUSY elem pointed to by prev_elem can be merged
+ * into prev_elem itself, as its size now expands.
+ */
+ prev_elem->size += MALLOC_ELEM_OVERHEAD;
+
+ /* Preserve the end BUSY elem that points to the current elem,
+ * or else the free_list will be broken. */
+ ret = malloc_elem_resize(prev_elem,
+ prev_elem->size + elem->size - MALLOC_ELEM_OVERHEAD);
+ if (ret < 0)
+ prev_elem = elem;
+ } else {
+ prev_ms = ms;
+ }
}
return 0;
--
1.9.1
* Re: [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous
2018-05-03 10:11 ` [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous Gowrishankar
@ 2018-05-04 9:29 ` Burakov, Anatoly
2018-05-04 10:41 ` gowrishankar muthukrishnan
0 siblings, 1 reply; 8+ messages in thread
From: Burakov, Anatoly @ 2018-05-04 9:29 UTC (permalink / raw)
To: Gowrishankar, Sergio Gonzalez Monroy; +Cc: dev, Thomas Monjalon, stable
On 03-May-18 11:11 AM, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>
> During malloc heap init, if there are malloc_elems that are contiguous
> in virtual addresses, they can be merged so that the merged malloc_elem
> offers a larger free memory size than the single hugepage it was
> created for.
>
> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> ---
Hi Gowrishankar,
I haven't looked at the patchset in detail yet, however I have a general
question: how do we end up with VA-contiguous memory that is not part
of the same memseg in the first place? Is there something wrong with
the memseg sorting code? Alternatively, if the segments were broken up,
presumably they were broken up for a reason, namely that while they may
be VA-contiguous, they weren't IOVA-contiguous.
Can you provide a dump of physmem layout where memory would have been VA
and IOVA-contiguous while belonging to different memsegs?
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous
2018-05-04 9:29 ` Burakov, Anatoly
@ 2018-05-04 10:41 ` gowrishankar muthukrishnan
2018-05-04 11:02 ` Burakov, Anatoly
0 siblings, 1 reply; 8+ messages in thread
From: gowrishankar muthukrishnan @ 2018-05-04 10:41 UTC (permalink / raw)
To: Burakov, Anatoly, Sergio Gonzalez Monroy
Cc: dev, Thomas Monjalon, stable, Chao Zhu
Hi Anatoly,
On Friday 04 May 2018 02:59 PM, Burakov, Anatoly wrote:
> On 03-May-18 11:11 AM, Gowrishankar wrote:
>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>
>> During malloc heap init, if there are malloc_elems that are contiguous
>> in virtual addresses, they can be merged so that the merged malloc_elem
>> offers a larger free memory size than the single hugepage it was
>> created for.
>>
>> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Gowrishankar Muthukrishnan
>> <gowrishankar.m@linux.vnet.ibm.com>
>> ---
>
> Hi Gowrishankar,
>
> I haven't looked at the patchset in detail yet, however I have a
> general question: how do we end up with VA-contiguous memory that is
> not part of the same memseg in the first place? Is there something
> wrong with the memseg sorting code? Alternatively, if the segments
> were broken up, presumably they were broken up for a reason, namely
> that while they may be VA-contiguous, they weren't IOVA-contiguous.
On powerpc, when *nr_overcommit_hugepages* is set (to respect the
address hint in get_virtual_area() as requested by the secondary
process), mmap() does not allocate one big VA chunk for all the
available hugepages. In order to keep the secondary process in the same
VA range, we need to add the anonymous and hugetlb flags in the mmap
calls while remapping. As mmap can then only create a VA area of at
most one hugepage at a time (MAP_HUGETLB) while also respecting the
address hint (MAP_ANONYMOUS), multiple VA chunks are created, even
though both VA and IOVA are contiguous in most of the cases.
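For illustration only, a minimal sketch of the kind of per-hugepage
remap described above (the helper name, address hint and size are
hypothetical; the relevant part is the MAP_ANONYMOUS | MAP_HUGETLB
flags and the one-hugepage-at-a-time granularity):

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: remap a single hugepage-sized chunk at a given
 * address hint. Because each call maps at most one hugepage, every
 * hugepage ends up in its own VA chunk (and hence its own memseg),
 * even when consecutive chunks happen to be VA- and IOVA-contiguous.
 */
static void *
remap_one_hugepage(void *addr_hint, size_t hugepage_sz)
{
	void *va = mmap(addr_hint, hugepage_sz, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	return va == MAP_FAILED ? NULL : va;
}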
>
> Can you provide a dump of physmem layout where memory would have been
> VA and IOVA-contiguous while belonging to different memsegs?
Please find here: https://pastebin.com/tDNEaxdU
As you can notice in malloc_heaps, the free_head index used is 8,
whereas for the heap size it should supposedly be 11.
Note that these problems do not exist with the memory rework done in
the latest code base, so I referred to code up to v18.02.
--
Regards,
Gowrishankar M
* Re: [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous
2018-05-04 10:41 ` gowrishankar muthukrishnan
@ 2018-05-04 11:02 ` Burakov, Anatoly
0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2018-05-04 11:02 UTC (permalink / raw)
To: gowrishankar muthukrishnan, Sergio Gonzalez Monroy
Cc: dev, Thomas Monjalon, stable, Chao Zhu
On 04-May-18 11:41 AM, gowrishankar muthukrishnan wrote:
> Hi Anatoly,
>
> On Friday 04 May 2018 02:59 PM, Burakov, Anatoly wrote:
>> On 03-May-18 11:11 AM, Gowrishankar wrote:
>>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>>
>>> During malloc heap init, if there are malloc_elems that are contiguous
>>> in virtual addresses, they can be merged so that the merged malloc_elem
>>> offers a larger free memory size than the single hugepage it was
>>> created for.
>>>
>>> Fixes: fafcc11985 ("mem: rework memzone to be allocated by malloc")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Gowrishankar Muthukrishnan
>>> <gowrishankar.m@linux.vnet.ibm.com>
>>> ---
>>
>> Hi Gowrishankar,
>>
>> I haven't looked at the patchset in detail yet, however I have a
>> general question: how do we end up with VA-contiguous memory that is
>> not part of the same memseg in the first place? Is there something
>> wrong with the memseg sorting code? Alternatively, if the segments
>> were broken up, presumably they were broken up for a reason, namely
>> that while they may be VA-contiguous, they weren't IOVA-contiguous.
>
> On powerpc, when *nr_overcommit_hugepages* is set (to respect the
> address hint in get_virtual_area() as requested by the secondary
> process), mmap() does not allocate one big VA chunk for all the
> available hugepages. In order to keep the secondary process in the same
> VA range, we need to add the anonymous and hugetlb flags in the mmap
> calls while remapping. As mmap can then only create a VA area of at
> most one hugepage at a time (MAP_HUGETLB) while also respecting the
> address hint (MAP_ANONYMOUS), multiple VA chunks are created, even
> though both VA and IOVA are contiguous in most of the cases.
OK, I suppose on PPC64 that may happen. Still (and please correct me if
I'm misunderstanding the patchset - as I said, I haven't looked at it in
detail, and have only taken a cursory look), there are two issues I see
here:
1) there's no check for IOVA-contiguousness, only VA-contiguousness,
which means you are risking accidentally concatenating segments that
aren't IOVA-contiguous (a minimal sketch of such a check follows point
2 below). Prior to 18.05, the rest of DPDK expects all segments to be
VA- and IOVA-contiguous.
2) I don't think this problem should be solved in malloc. Malloc
elements have memseg pointers in them, and if you concatenate multiple
segments, you will end up having malloc elements which point to the
wrong segments. Instead, you should fix the memseg allocation code to
concatenate seemingly disparate segments, and avoid the problem with
malloc elements in the first place. Maybe do another sorting pass, or
something. In any case, the memseg allocation code is the correct place
to fix this, IMO.
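For point 1, a minimal sketch of the kind of combined check that would
be needed before merging (assuming the pre-18.05 rte_memseg layout; on
v16.11 the field is phys_addr rather than iova, and the helper name is
hypothetical):

#include <stdbool.h>
#include <rte_common.h>
#include <rte_memory.h>

/* Hypothetical helper: only treat two memsegs as mergeable when they
 * are contiguous in both VA and IOVA space.
 */
static bool
memsegs_mergeable(const struct rte_memseg *prev, const struct rte_memseg *cur)
{
	bool va_contig = RTE_PTR_ADD(prev->addr, prev->len) == cur->addr;
	bool iova_contig = (prev->iova + prev->len) == cur->iova;

	return va_contig && iova_contig;
}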
>
>>
>> Can you provide a dump of physmem layout where memory would have been
>> VA and IOVA-contiguous while belonging to different memsegs?
>
> Please find here: https://pastebin.com/tDNEaxdU
>
> As you can notice in malloc_heaps, the free_head index used is 8,
> whereas for the heap size it should supposedly be 11.
That's a bit hard to read. There's a rte_eal_dump_physmem_layout()
function that should help display this in a more user-friendly manner :)
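For reference, a minimal sketch of producing that dump (assuming a
normally initialized EAL; rte_eal_dump_physmem_layout() prints the
memseg layout to the given stream):

#include <stdio.h>
#include <rte_eal.h>
#include <rte_memory.h>

int
main(int argc, char **argv)
{
	/* Initialize the EAL, then dump the memseg layout to stdout. */
	if (rte_eal_init(argc, argv) < 0)
		return -1;

	rte_eal_dump_physmem_layout(stdout);

	return 0;
}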
>
> Note that these problems do not exist with the memory rework done in
> the latest code base, so I referred to code up to v18.02.
>
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH 2/2] eal/malloc: fix heap index to correctly insert memseg
2018-05-03 10:11 [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Gowrishankar
2018-05-03 10:11 ` [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous Gowrishankar
@ 2018-05-03 10:11 ` Gowrishankar
2018-05-18 13:10 ` [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Thomas Monjalon
2 siblings, 0 replies; 8+ messages in thread
From: Gowrishankar @ 2018-05-03 10:11 UTC (permalink / raw)
To: Sergio Gonzalez Monroy
Cc: Anatoly Burakov, dev, Thomas Monjalon,
Gowrishankar Muthukrishnan, stable
From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
When multiple memsegs are created and adding a new memseg would make
the heap bigger, the heap's index in the free_head list should be based
on the new size of the heap. Currently, only the size of the individual
elem is accounted for, as in malloc_elem_free_list_insert. As the
heap's total size grows, the list of those memsegs should be moved to
the right index, so that malloc_heap_alloc can find a suitable element
for the memory size requested by applications.
Fixes: b0489e7bca ("malloc: fix linear complexity")
Cc: stable@dpdk.org
Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---
E.g. the heap below is on one NUMA socket, with a size of 1G
(i.e. 64 * 16MB). All corresponding malloc_elems are always added at
heap index 8, as their individual size is always 16MB (which always
maps to index 8).
free_head = {{lh_first = 0x0}, {lh_first = 0x0}, {lh_first = 0x0}, {
lh_first = 0x0}, {lh_first = 0x0}, {lh_first = 0x0}, {lh_first = 0x0}, {
lh_first = 0x0}, {lh_first = 0x7efd3f000000}, {lh_first = 0x0}, {lh_first = 0x0}, {
lh_first = 0x0}, {lh_first = 0x0}},
alloc_count = 6, total_size = 1073733632},
Ideally, this list of memsegs should be at slot 12, as they grow the heap to 1G.
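For reference, the free_head index is derived from a size roughly as in
the simplified sketch below (mirroring malloc_elem_free_list_index(),
assuming the usual constants MALLOC_MINSIZE_LOG2 = 8 and
MALLOC_LOG2_INCREMENT = 2; the clamping to the last list is omitted).
With it, a ~16MB element maps to index 8, while a heap of ~1G maps to a
higher slot, which is the mismatch described above.

#include <stddef.h>

/* Simplified sketch of the free-list bucketing (assumed constants:
 * MALLOC_MINSIZE_LOG2 = 8, MALLOC_LOG2_INCREMENT = 2). Sizes up to
 * 256 bytes fall into index 0; each further index covers a 4x larger
 * size range, so a ~16MB element lands in index 8 while a ~1G heap
 * corresponds to a higher index.
 */
static size_t
free_list_index(size_t size)
{
	const size_t min_log2 = 8;
	const size_t log2_inc = 2;
	size_t log2, index;

	if (size <= ((size_t)1 << min_log2))
		return 0;

	/* log2 of the next power of two >= size */
	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);

	index = (log2 - min_log2 + log2_inc - 1) / log2_inc;

	return index;
}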
---
lib/librte_eal/common/malloc_heap.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1cacf7f..f686e5e 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -105,10 +105,32 @@
ms->len - MALLOC_ELEM_OVERHEAD);
end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;
+ size_t cur_idx, new_idx, heap_size;
malloc_elem_init(start_elem, heap, ms, elem_size);
malloc_elem_mkend(end_elem, start_elem);
+
+ /* Compare the heap index based on its current size with the
+ * index based on its new size with the memseg added. If the new
+ * size needs a new index, move its free_head to the new slot.
+ */
+ cur_idx = malloc_elem_free_list_index(heap->total_size);
+ heap_size = heap->total_size + elem_size;
+ new_idx = malloc_elem_free_list_index(heap_size);
+ if (cur_idx != new_idx) {
+ heap->free_head[new_idx] = heap->free_head[cur_idx];
+ memset(&heap->free_head[cur_idx],
+ 0, sizeof(heap->free_head[cur_idx]));
+ }
+
+ /* malloc_elem_free_list_insert calculates the index based on
+ * elem->size, hence we set elem->size to the new heap size while
+ * inserting this elem, and reset elem->size to its original value
+ * afterwards. A minor hack, though.
+ */
+ start_elem->size = heap_size;
malloc_elem_free_list_insert(start_elem);
+ start_elem->size = elem_size;
heap->total_size += elem_size;
}
--
1.9.1
* Re: [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs
2018-05-03 10:11 [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Gowrishankar
2018-05-03 10:11 ` [dpdk-dev] [PATCH 1/2] eal/malloc: merge malloc_elems in heap if they are contiguous Gowrishankar
2018-05-03 10:11 ` [dpdk-dev] [PATCH 2/2] eal/malloc: fix heap index to correctly insert memseg Gowrishankar
@ 2018-05-18 13:10 ` Thomas Monjalon
2018-05-19 1:35 ` gowrishankar muthukrishnan
2 siblings, 1 reply; 8+ messages in thread
From: Thomas Monjalon @ 2018-05-18 13:10 UTC (permalink / raw)
To: Gowrishankar; +Cc: dev, Anatoly Burakov, chaozhu, pradeep
Hi,
What is the status of DPDK 18.05 on IBM POWER?
This patch suggests there are some issues, but there has been no news
for two weeks after the comments from Anatoly.
Are we going to release a DPDK which does not work well on POWER?
03/05/2018 12:11, Gowrishankar:
> When there are multiple memsegs (one per hugepage), a couple of
> problems are observed:
>
> 1. The same heap free-list index is always chosen when adding new
> malloc_elems again and again, even though the heap size is actually
> growing. Hence, when a memory allocation request is for a size *larger
> than* the elem->size available in the free list, malloc_heap_alloc
> fails. In elem_start_pt(), we rely on elem->size at best for finding
> a suitable element, which in this case is smaller than the requested
> size.
>
> Hence, patch 1 in this series addresses this by merging malloc_elems
> that are contiguous in virtual addresses, so that there is a better
> chance of finding a suitable elem for the requested size.
>
> 2. Even after the heap's malloc_elems are resized, their free_head
> index stays the same, as the memsegs are just added to every
> malloc_elem. If larger memory is requested through rte_malloc such
> that the heap index of the requested size is beyond the slot where
> the entire heap is listed, malloc_heap_alloc fails, because at the
> time of heap init only the lower index is ever chosen to hold the
> memsegs. Hence, patch 2 addresses this by moving the list of
> malloc_elems into a new slot in the heap as its size grows.
>
> We encountered these situations while running the ip_reassembly
> example app, when multiple segments are created in VA (with
> overcommit hugepages set on the powerpc arch).
>
> These problems exist only in releases before v18.05 (which carries
> the new implementation of dynamic memory allocation). These patches
> are tested with unit tests as well as some of the example apps. I
> request more testing if possible on other archs, as these problems
> are present in the available LTS code as well.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
* Re: [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs
2018-05-18 13:10 ` [dpdk-dev] [PATCH 0/2] eal/malloc: fix wrong heap initialization over multiple memsegs Thomas Monjalon
@ 2018-05-19 1:35 ` gowrishankar muthukrishnan
0 siblings, 0 replies; 8+ messages in thread
From: gowrishankar muthukrishnan @ 2018-05-19 1:35 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, Anatoly Burakov, chaozhu, pradeep
Hi Thomas,
On Friday 18 May 2018 06:40 PM, Thomas Monjalon wrote:
> Hi,
>
> What is the status of DPDK 18.05 on IBM POWER?
>
> This patch suggests there are some issues, but there has been no news
> for two weeks after the comments from Anatoly.
> Are we going to release a DPDK which does not work well on POWER?
This issue is not applicable from v18.05 onwards (due to the memory
rework changes there).
Apologies for the lack of updates in the meantime; I will check on
Anatoly's comments and get back.
My plan is to address this before the next candidates of the LTS
releases (v17.11, v16.11) are ready.
Regards,
Gowrishankar
>
> 03/05/2018 12:11, Gowrishankar:
>> When there are multiple memsegs (one per hugepage), a couple of
>> problems are observed:
>>
>> 1. The same heap free-list index is always chosen when adding new
>> malloc_elems again and again, even though the heap size is actually
>> growing. Hence, when a memory allocation request is for a size *larger
>> than* the elem->size available in the free list, malloc_heap_alloc
>> fails. In elem_start_pt(), we rely on elem->size at best for finding
>> a suitable element, which in this case is smaller than the requested
>> size.
>>
>> Hence, patch 1 in this series addresses this by merging malloc_elems
>> that are contiguous in virtual addresses, so that there is a better
>> chance of finding a suitable elem for the requested size.
>>
>> 2. Even after the heap's malloc_elems are resized, their free_head
>> index stays the same, as the memsegs are just added to every
>> malloc_elem. If larger memory is requested through rte_malloc such
>> that the heap index of the requested size is beyond the slot where
>> the entire heap is listed, malloc_heap_alloc fails, because at the
>> time of heap init only the lower index is ever chosen to hold the
>> memsegs. Hence, patch 2 addresses this by moving the list of
>> malloc_elems into a new slot in the heap as its size grows.
>>
>> We encountered these situations while running the ip_reassembly
>> example app, when multiple segments are created in VA (with
>> overcommit hugepages set on the powerpc arch).
>>
>> These problems exist only in releases before v18.05 (which carries
>> the new implementation of dynamic memory allocation). These patches
>> are tested with unit tests as well as some of the example apps. I
>> request more testing if possible on other archs, as these problems
>> are present in the available LTS code as well.
>>
>> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>
>