* [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases @ 2023-02-10 6:30 Fengnan Chang 2023-02-15 10:10 ` Thomas Monjalon 2023-02-20 10:59 ` David Marchand 0 siblings, 2 replies; 9+ messages in thread
From: Fengnan Chang @ 2023-02-10 6:30 UTC (permalink / raw)
To: anatoly.burakov; +Cc: rsanford, dev, Fengnan Chang

Here is a simple test case:
"
uint64_t entry_time, time;
size_t size = 4096;
unsigned align = 4096;
for (int j = 0; j < 10; j++) {
	entry_time = rte_get_timer_cycles();
	for (int i = 0; i < 2000; i++) {
		rte_malloc(NULL, size, align);
	}
	time = (rte_get_timer_cycles()-entry_time) * 1000000 / rte_get_timer_hz();
	printf("total open time %lu avg time %lu\n", time, time/2000);
}
"

The cost of a single rte_malloc call may grow as the number of allocations increases. In my environment, the average time is 15us in the first round, 44us in the second, 77us in the third, 168us in the fourth...

The reason is that, during allocation, malloc_elem_alloc may split new_elem when there is too much free space after it, and insert the trailer into a free list. When allocating 4k with 4k alignment, the trailer is very likely inserted into free_head[2] again, making free_head[2] longer. The next 4k allocation then searches free_head[2] from the beginning, so as the number of allocations grows, searching free_head[2] takes more time and performance worsens. The same problem also occurs when allocating 64k with 64k alignment, but allocating 4k with 64-byte alignment does not have this problem.

Fix this by adjusting the free_head list size ranges so that free_head[3] holds elements of 4k or larger and free_head[4] holds elements of 16k or larger. In terms of probabilities, when allocating 4k or 16k, the chance of finding a suitable element in a list of larger sizes is greater than in a list of smaller sizes.
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> --- lib/eal/common/malloc_elem.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c index 83f05497cc..35a2313d04 100644 --- a/lib/eal/common/malloc_elem.c +++ b/lib/eal/common/malloc_elem.c @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem *elem) * containing larger elements. * * Example element size ranges for a heap with five free lists: - * heap->free_head[0] - (0 , 2^8] - * heap->free_head[1] - (2^8 , 2^10] - * heap->free_head[2] - (2^10 ,2^12] - * heap->free_head[3] - (2^12, 2^14] - * heap->free_head[4] - (2^14, MAX_SIZE] + * heap->free_head[0] - (0 , 2^8) + * heap->free_head[1] - [2^8 , 2^10) + * heap->free_head[2] - [2^10 ,2^12) + * heap->free_head[3] - [2^12, 2^14) + * heap->free_head[4] - [2^14, MAX_SIZE] */ size_t malloc_elem_free_list_index(size_t size) @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) if (size <= (1UL << MALLOC_MINSIZE_LOG2)) return 0; - /* Find next power of 2 >= size. */ - log2 = sizeof(size) * 8 - __builtin_clzl(size - 1); + /* Find next power of 2 > size. */ + log2 = sizeof(size) * 8 - __builtin_clzl(size); /* Compute freelist index, based on log2(size). */ index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) / -- 2.37.0 (Apple Git-136) ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-10 6:30 [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases Fengnan Chang @ 2023-02-15 10:10 ` Thomas Monjalon 2023-02-15 11:10 ` Morten Brørup 2023-02-16 10:40 ` Liang Ma 2023-02-20 10:59 ` David Marchand 1 sibling, 2 replies; 9+ messages in thread From: Thomas Monjalon @ 2023-02-15 10:10 UTC (permalink / raw) To: anatoly.burakov, Fengnan Chang Cc: dev, rsanford, bruce.richardson, Morten Brørup, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli Looking for reviewers please. 10/02/2023 07:30, Fengnan Chang: > Here is a simple test case: > " > uint64_t entry_time, time; > size_t size = 4096; > unsigned align = 4096; > for (int j = 0; j < 10; j++) { > entry_time = rte_get_timer_cycles(); > for (int i = 0; i < 2000; i++) { > rte_malloc(NULL, size, align); > } > time = (rte_get_timer_cycles()-entry_time) * 1000000 / > rte_get_timer_hz(); > printf("total open time %lu avg time %lu\n", time, time/2000); > } > " > > Single rte_malloc cost time may becomes wrose as the number of malloc > increases, In my env, first round avg time is 15us, second is 44us, > third is 77us, fourth is 168us... > > The reason is,in the malloc process, malloc_elem_alloc may split new_elem > if there have too much free space after new_elem, and insert the trailer > into freelist. When alloc 4k with align 4k, the trailer very likely insert > to free_head[2] again, it makes free_head[2] longer. when alloc 4k again, > it will search free_head[2] from begin, with the number of malloc increases, > search free_head[2] need more time, so the performance will become worse. > Same problem will also occurs in alloc 64k with align 64k, but if alloc > 4k with align 64, doesn't have this problem. 
> > Fix this by adjust free_head list size range, make free_head[3] hold > elements which bigger or equal 4k, free_head[4] hold elements which bigger > or equal 16k. > In terms of probabilities, when alloc 4k or 16k, the probability of finding > a suitable elem from a larger size list is greater than from a smaller > size list. > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> > --- > lib/eal/common/malloc_elem.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c > index 83f05497cc..35a2313d04 100644 > --- a/lib/eal/common/malloc_elem.c > +++ b/lib/eal/common/malloc_elem.c > @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem *elem) > * containing larger elements. > * > * Example element size ranges for a heap with five free lists: > - * heap->free_head[0] - (0 , 2^8] > - * heap->free_head[1] - (2^8 , 2^10] > - * heap->free_head[2] - (2^10 ,2^12] > - * heap->free_head[3] - (2^12, 2^14] > - * heap->free_head[4] - (2^14, MAX_SIZE] > + * heap->free_head[0] - (0 , 2^8) > + * heap->free_head[1] - [2^8 , 2^10) > + * heap->free_head[2] - [2^10 ,2^12) > + * heap->free_head[3] - [2^12, 2^14) > + * heap->free_head[4] - [2^14, MAX_SIZE] > */ > size_t > malloc_elem_free_list_index(size_t size) > @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) > if (size <= (1UL << MALLOC_MINSIZE_LOG2)) > return 0; > > - /* Find next power of 2 >= size. */ > - log2 = sizeof(size) * 8 - __builtin_clzl(size - 1); > + /* Find next power of 2 > size. */ > + log2 = sizeof(size) * 8 - __builtin_clzl(size); > > /* Compute freelist index, based on log2(size). */ > index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) / > ^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-15 10:10 ` Thomas Monjalon @ 2023-02-15 11:10 ` Morten Brørup 2023-02-15 17:16 ` Stephen Hemminger 2023-02-16 14:02 ` Liang Ma 2023-02-16 10:40 ` Liang Ma 1 sibling, 2 replies; 9+ messages in thread From: Morten Brørup @ 2023-02-15 11:10 UTC (permalink / raw) To: Thomas Monjalon, anatoly.burakov, Fengnan Chang Cc: dev, rsanford, bruce.richardson, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli, Fidaullah Noonari +CC: Fidaullah Noonari <fidaullah.noonari@emumba.com>, your name also shows up in the git log; perhaps you can help review this patch. > From: Thomas Monjalon [mailto:thomas@monjalon.net] > Sent: Wednesday, 15 February 2023 11.10 > > Looking for reviewers please. > > 10/02/2023 07:30, Fengnan Chang: > > Here is a simple test case: > > " > > uint64_t entry_time, time; > > size_t size = 4096; > > unsigned align = 4096; > > for (int j = 0; j < 10; j++) { > > entry_time = rte_get_timer_cycles(); > > for (int i = 0; i < 2000; i++) { > > rte_malloc(NULL, size, align); > > } > > time = (rte_get_timer_cycles()-entry_time) * 1000000 / > > rte_get_timer_hz(); > > printf("total open time %lu avg time %lu\n", time, time/2000); > > } > > " > > > > Single rte_malloc cost time may becomes wrose as the number of malloc > > increases, In my env, first round avg time is 15us, second is 44us, > > third is 77us, fourth is 168us... > > > > The reason is,in the malloc process, malloc_elem_alloc may split > new_elem > > if there have too much free space after new_elem, and insert the > trailer > > into freelist. When alloc 4k with align 4k, the trailer very likely > insert > > to free_head[2] again, it makes free_head[2] longer. when alloc 4k > again, > > it will search free_head[2] from begin, with the number of malloc > increases, > > search free_head[2] need more time, so the performance will become > worse. 
> > Same problem will also occurs in alloc 64k with align 64k, but if > alloc > > 4k with align 64, doesn't have this problem. > > > > Fix this by adjust free_head list size range, make free_head[3] hold > > elements which bigger or equal 4k, free_head[4] hold elements which > bigger > > or equal 16k. > > In terms of probabilities, when alloc 4k or 16k, the probability of > finding > > a suitable elem from a larger size list is greater than from a > smaller > > size list. > > > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> > > --- > > lib/eal/common/malloc_elem.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/lib/eal/common/malloc_elem.c > b/lib/eal/common/malloc_elem.c > > index 83f05497cc..35a2313d04 100644 > > --- a/lib/eal/common/malloc_elem.c > > +++ b/lib/eal/common/malloc_elem.c > > @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem *elem) > > * containing larger elements. > > * > > * Example element size ranges for a heap with five free lists: > > - * heap->free_head[0] - (0 , 2^8] > > - * heap->free_head[1] - (2^8 , 2^10] > > - * heap->free_head[2] - (2^10 ,2^12] > > - * heap->free_head[3] - (2^12, 2^14] > > - * heap->free_head[4] - (2^14, MAX_SIZE] > > + * heap->free_head[0] - (0 , 2^8) > > + * heap->free_head[1] - [2^8 , 2^10) > > + * heap->free_head[2] - [2^10 ,2^12) > > + * heap->free_head[3] - [2^12, 2^14) > > + * heap->free_head[4] - [2^14, MAX_SIZE] > > */ > > size_t > > malloc_elem_free_list_index(size_t size) > > @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) > > if (size <= (1UL << MALLOC_MINSIZE_LOG2)) > > return 0; > > > > - /* Find next power of 2 >= size. */ > > - log2 = sizeof(size) * 8 - __builtin_clzl(size - 1); > > + /* Find next power of 2 > size. */ > > + log2 = sizeof(size) * 8 - __builtin_clzl(size); > > > > /* Compute freelist index, based on log2(size). 
*/ > > index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) > /

I gave up reviewing in depth, because the existing code is not easy to quickly understand, and it would take too long for me. E.g. the malloc_elem->size field is undocumented, and find_suitable_element() calls malloc_elem_free_list_index() with the raw size (as passed to the function), but malloc_elem_free_list_insert() calls malloc_elem_free_list_index() with malloc_elem->size - MALLOC_ELEM_HEADER_LEN.

Looking at the patch in isolation...

I agree with the way the patch modifies the ranges of the free list, and the consequent removal of the "- 1" from the calculation of log2.

Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400] as previously... The ranges with this patch make much more sense.

So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement.

Acked-by: Morten Brørup <mb@smartsharesystems.com>

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-15 11:10 ` Morten Brørup @ 2023-02-15 17:16 ` Stephen Hemminger 2023-02-16 2:54 ` [External] " Fengnan Chang 2023-02-16 14:02 ` Liang Ma 1 sibling, 1 reply; 9+ messages in thread From: Stephen Hemminger @ 2023-02-15 17:16 UTC (permalink / raw) To: Morten Brørup Cc: Thomas Monjalon, anatoly.burakov, Fengnan Chang, dev, rsanford, bruce.richardson, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli, Fidaullah Noonari On Wed, 15 Feb 2023 12:10:23 +0100 Morten Brørup <mb@smartsharesystems.com> wrote: > Looking isolated at the patch itself... > > I agree with the way the patch modifies the ranges of the free list, and the consequential removal of the "- 1" from the calculation of log2. > > Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400], how it was previously... The ranges with this patch make much more sense. > > So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> It would be good to have a malloc performance test. Possibly something reused from some other project. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [External] Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-15 17:16 ` Stephen Hemminger @ 2023-02-16 2:54 ` Fengnan Chang 0 siblings, 0 replies; 9+ messages in thread
From: Fengnan Chang @ 2023-02-16 2:54 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Morten Brørup, Thomas Monjalon, anatoly.burakov, dev, rsanford, bruce.richardson, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli, Fidaullah Noonari

Stephen Hemminger <stephen@networkplumber.org> wrote on Thu, Feb 16, 2023 at 01:16:
>
> On Wed, 15 Feb 2023 12:10:23 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > Looking isolated at the patch itself...
> >
> > I agree with the way the patch modifies the ranges of the free list, and the consequential removal of the "- 1" from the calculation of log2.
> >
> > Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400], how it was previously... The ranges with this patch make much more sense.
> >
> > So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement.
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
> It would be good to have a malloc performance test.
> Possibly something reused from some other project.

I have done some performance tests in SPDK before; they may be useful for reference:
https://bytedance.feishu.cn/wiki/wikcnKWXQmN4qhyxLcWJqP4vrOs
In short, 4k malloc performance improved, and the other cases are almost unchanged.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-15 11:10 ` Morten Brørup 2023-02-15 17:16 ` Stephen Hemminger @ 2023-02-16 14:02 ` Liang Ma 2023-02-17 2:14 ` [External] " Fengnan Chang 1 sibling, 1 reply; 9+ messages in thread
From: Liang Ma @ 2023-02-16 14:02 UTC (permalink / raw)
To: Morten Brørup
Cc: Thomas Monjalon, anatoly.burakov, Fengnan Chang, dev, rsanford, bruce.richardson, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli, Fidaullah Noonari

On Wed, Feb 15, 2023 at 12:10:23PM +0100, Morten Brørup wrote:
> +CC: Fidaullah Noonari <fidaullah.noonari@emumba.com>, your name also shows up in the git log; perhaps you can help review this patch.
>
> I gave up reviewing in depth, because the existing code is not easy to quickly understand, and it would take too long for me. E.g. the malloc_elem->size is field is undocumented, and find_suitable_element() calls the malloc_elem_free_list_index() function with the raw size (as passed to the function), but malloc_elem_free_list_insert() calls the malloc_elem_free_list_index() with malloc_elem->size - MALLOC_ELEM_HEADER_LEN.
>
> Looking isolated at the patch itself...
>
> I agree with the way the patch modifies the ranges of the free list, and the consequential removal of the "- 1" from the calculation of log2.
>
> Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400], how it was previously... The ranges with this patch make much more sense.
>
> So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement.
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>

I ran the test with the DPDK malloc perf test and reproduced the issue. IMO, the whole process is: the application uses rte_malloc with a relatively large alignment, e.g. 4K.
The current implementation generates many "fragments" in the related free lists because of the large alignment. In the test code, a 4K malloc size with 4K alignment means the space actually needed has to be taken from an element in a higher-level free list. The consequence is that many fragment elements are generated, and those elements are inserted into the current-level free list. However, when rte_malloc selects which free list to start scanning, it doesn't take the alignment into account, so it wastes time in the current-level free list. If the scan logic took alignment into account, it could "smartly" skip the current level and jump to the higher level directly.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [External] Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-16 14:02 ` Liang Ma @ 2023-02-17 2:14 ` Fengnan Chang 0 siblings, 0 replies; 9+ messages in thread
From: Fengnan Chang @ 2023-02-17 2:14 UTC (permalink / raw)
To: Liang Ma
Cc: Morten Brørup, Thomas Monjalon, anatoly.burakov, dev, rsanford, bruce.richardson, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli, Fidaullah Noonari

Liang Ma <liangma@liangbit.com> wrote on Thu, Feb 16, 2023 at 22:04:
>
> On Wed, Feb 15, 2023 at 12:10:23PM +0100, Morten Brørup wrote:
> > +CC: Fidaullah Noonari <fidaullah.noonari@emumba.com>, your name also shows up in the git log; perhaps you can help review this patch.
> >
> > I gave up reviewing in depth, because the existing code is not easy to quickly understand, and it would take too long for me. E.g. the malloc_elem->size is field is undocumented, and find_suitable_element() calls the malloc_elem_free_list_index() function with the raw size (as passed to the function), but malloc_elem_free_list_insert() calls the malloc_elem_free_list_index() with malloc_elem->size - MALLOC_ELEM_HEADER_LEN.
> >
> > Looking isolated at the patch itself...
> >
> > I agree with the way the patch modifies the ranges of the free list, and the consequential removal of the "- 1" from the calculation of log2.
> >
> > Intuitively, the lists should cover ranges such as [0x100..0x3FF], which this patch does, not [0x101..0x400], how it was previously... The ranges with this patch make much more sense.
> >
> > So if the existing code is otherwise correct, i.e. handles the size with/without MALLOC_ELEM_HEADER_LEN correctly, my gut feeling says this patch is an improvement.
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> I run the test with DPDK malloc perf test. The issue is replicated.
> IMO, the whole process is if application use rte_malloc with a relative
> large alignment size. e.g. 4K alignment.
> Currently implementation will
> generate lots "fragment" because of the large alignment in related mem
> element free list. In the test code, 4K malloc size + 4k alignment will
> lead to the actually space is needed has to take from upper level mem
> element idx free list. The consequence is that will generate lots
> fragment element and those element is inserted to the current level mem
> free-list. However, when the rte_malloc select which free list to start
> scan with, it doesn't take the alignment into account, which ends up
> with waste some time in the current level free-list. If the scan logic
> take alignment into account, it might "smartly" skip current level and
> jump to the upper level directly.

Thank you for the detailed explanation! You may have already noticed that this patch only solves the fragmentation problem for the 4k/16k/64k cases; it does not work for 3k/15k/63k. For example, allocating 3k with 1k alignment has the same problem. Since this patch can't handle all similar scenarios, we have already started working on a new solution that aims to solve them all. BTW, this patch is still useful: even without the fragmentation problem, the probability of finding a suitable element in a larger-size list is greater than in a smaller-size list.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-15 10:10 ` Thomas Monjalon 2023-02-15 11:10 ` Morten Brørup @ 2023-02-16 10:40 ` Liang Ma 1 sibling, 0 replies; 9+ messages in thread From: Liang Ma @ 2023-02-16 10:40 UTC (permalink / raw) To: Thomas Monjalon Cc: anatoly.burakov, Fengnan Chang, dev, rsanford, bruce.richardson, Morten Brørup, maxime.coquelin, david.marchand, jerinj, honnappa.nagarahalli On Wed, Feb 15, 2023 at 11:10:25AM +0100, Thomas Monjalon wrote: > Looking for reviewers please. I will help on this. > > 10/02/2023 07:30, Fengnan Chang: > > Here is a simple test case: > > " > > uint64_t entry_time, time; > > size_t size = 4096; > > unsigned align = 4096; > > for (int j = 0; j < 10; j++) { > > entry_time = rte_get_timer_cycles(); > > for (int i = 0; i < 2000; i++) { > > rte_malloc(NULL, size, align); > > } > > time = (rte_get_timer_cycles()-entry_time) * 1000000 / > > rte_get_timer_hz(); > > printf("total open time %lu avg time %lu\n", time, time/2000); > > } > > " > > > > Single rte_malloc cost time may becomes wrose as the number of malloc > > increases, In my env, first round avg time is 15us, second is 44us, > > third is 77us, fourth is 168us... > > > > The reason is,in the malloc process, malloc_elem_alloc may split new_elem > > if there have too much free space after new_elem, and insert the trailer > > into freelist. When alloc 4k with align 4k, the trailer very likely insert > > to free_head[2] again, it makes free_head[2] longer. when alloc 4k again, > > it will search free_head[2] from begin, with the number of malloc increases, > > search free_head[2] need more time, so the performance will become worse. > > Same problem will also occurs in alloc 64k with align 64k, but if alloc > > 4k with align 64, doesn't have this problem. 
> > > > Fix this by adjust free_head list size range, make free_head[3] hold > > elements which bigger or equal 4k, free_head[4] hold elements which bigger > > or equal 16k. > > In terms of probabilities, when alloc 4k or 16k, the probability of finding > > a suitable elem from a larger size list is greater than from a smaller > > size list. > > > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> > > --- > > lib/eal/common/malloc_elem.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/lib/eal/common/malloc_elem.c b/lib/eal/common/malloc_elem.c > > index 83f05497cc..35a2313d04 100644 > > --- a/lib/eal/common/malloc_elem.c > > +++ b/lib/eal/common/malloc_elem.c > > @@ -367,11 +367,11 @@ prev_elem_is_adjacent(struct malloc_elem *elem) > > * containing larger elements. > > * > > * Example element size ranges for a heap with five free lists: > > - * heap->free_head[0] - (0 , 2^8] > > - * heap->free_head[1] - (2^8 , 2^10] > > - * heap->free_head[2] - (2^10 ,2^12] > > - * heap->free_head[3] - (2^12, 2^14] > > - * heap->free_head[4] - (2^14, MAX_SIZE] > > + * heap->free_head[0] - (0 , 2^8) > > + * heap->free_head[1] - [2^8 , 2^10) > > + * heap->free_head[2] - [2^10 ,2^12) > > + * heap->free_head[3] - [2^12, 2^14) > > + * heap->free_head[4] - [2^14, MAX_SIZE] > > */ > > size_t > > malloc_elem_free_list_index(size_t size) > > @@ -385,8 +385,8 @@ malloc_elem_free_list_index(size_t size) > > if (size <= (1UL << MALLOC_MINSIZE_LOG2)) > > return 0; > > > > - /* Find next power of 2 >= size. */ > > - log2 = sizeof(size) * 8 - __builtin_clzl(size - 1); > > + /* Find next power of 2 > size. */ > > + log2 = sizeof(size) * 8 - __builtin_clzl(size); > > > > /* Compute freelist index, based on log2(size). */ > > index = (log2 - MALLOC_MINSIZE_LOG2 + MALLOC_LOG2_INCREMENT - 1) / > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases 2023-02-10 6:30 [PATCH] malloc: fix malloc performance may becomes worse as the number of malloc increases Fengnan Chang 2023-02-15 10:10 ` Thomas Monjalon @ 2023-02-20 10:59 ` David Marchand 1 sibling, 0 replies; 9+ messages in thread From: David Marchand @ 2023-02-20 10:59 UTC (permalink / raw) To: Fengnan Chang Cc: anatoly.burakov, rsanford, dev, Morten Brørup, Liang Ma, Stephen Hemminger On Fri, Feb 10, 2023 at 7:30 AM Fengnan Chang <changfengnan@bytedance.com> wrote: > > Here is a simple test case: > " > uint64_t entry_time, time; > size_t size = 4096; > unsigned align = 4096; > for (int j = 0; j < 10; j++) { > entry_time = rte_get_timer_cycles(); > for (int i = 0; i < 2000; i++) { > rte_malloc(NULL, size, align); > } > time = (rte_get_timer_cycles()-entry_time) * 1000000 / > rte_get_timer_hz(); > printf("total open time %lu avg time %lu\n", time, time/2000); > } > " > > Single rte_malloc cost time may becomes wrose as the number of malloc > increases, In my env, first round avg time is 15us, second is 44us, > third is 77us, fourth is 168us... > > The reason is,in the malloc process, malloc_elem_alloc may split new_elem > if there have too much free space after new_elem, and insert the trailer > into freelist. When alloc 4k with align 4k, the trailer very likely insert > to free_head[2] again, it makes free_head[2] longer. when alloc 4k again, > it will search free_head[2] from begin, with the number of malloc increases, > search free_head[2] need more time, so the performance will become worse. > Same problem will also occurs in alloc 64k with align 64k, but if alloc > 4k with align 64, doesn't have this problem. > > Fix this by adjust free_head list size range, make free_head[3] hold > elements which bigger or equal 4k, free_head[4] hold elements which bigger > or equal 16k. 
> In terms of probabilities, when alloc 4k or 16k, the probability of finding > a suitable elem from a larger size list is greater than from a smaller > size list. > > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> Acked-by: Morten Brørup <mb@smartsharesystems.com> The change looks simple enough. I see an improvement with the (verbatim) malloc_perf_autotest unit tests for 1M allocations too. Let's take this change now and see how it goes with -rc1 testing. Applied, thanks. -- David Marchand ^ permalink raw reply [flat|nested] 9+ messages in thread