DPDK patches and discussions
* DPDK 19.11.5 Legacy Memory Design Query
@ 2022-09-14  7:30 Umakiran Godavarthi (ugodavar)
  2022-09-21  6:50 ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-14  7:30 UTC (permalink / raw)
  To: anatoly.burakov, dev


Hi Anatoly/DPDK-Developers

I am working with the DPDK 19.11.5 legacy memory design and have a query about how to boot up in legacy memory mode.


  1.  The Linux kernel boots up with ‘N’ huge pages reserved and ‘N’ free huge pages initially.


  2.  We calculate the number of huge pages we need for the data path (the driver needs buffers for all queues) by sorting the 2 MB huge-page memory fragments, and find that we need ‘X’ pages. We then set the kernel attribute nr_hugepages (/proc/sys/vm/nr_hugepages) to X pages; a small sketch of this reservation follows the example below. We also store the sorted physical memory fragments as POOL_0, POOL_1, POOL_2, etc. (just so there are enough pools for all ports and queues to initialize).

For example, suppose the host's huge-page memory is fragmented like this, for a total of 500 pages reserved by the kernel in step 1:

250, 90, 80, 70, 10 -> sum is 500 (N pages)

We need only 350 pages (350 is based on the number of ports and queues the DPDK application needs).

So we pick the fragments 250, 90 and 10.

That gives 3 pools in total: POOL_0 -> 250 pages, POOL_1 -> 90 pages, POOL_2 -> 10 pages.
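
A minimal sketch of the step-2 reservation (illustration only; the helper name and the way X is computed are ours, not part of the application code described here):

    #include <stdio.h>

    /* Hypothetical helper: reserve X 2 MB huge pages with the kernel by
     * writing X to /proc/sys/vm/nr_hugepages, as described in step 2. */
    static int reserve_hugepages(unsigned int x_pages)
    {
            FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

            if (f == NULL)
                    return -1;
            fprintf(f, "%u\n", x_pages);
            fclose(f);
            return 0;
    }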


  3.  We boot up DPDK via rte_eal_init().


  4.  Then we walk the DPDK memory segment lists and, for each FBARRAY, find the pages used by DPDK and unmap the remaining pages with the code below (the idea is to release the huge pages mapped into the DPDK process's virtual memory that we do not need). free_hugepages will then be 0, since the X pages are used by DPDK and all unnecessary pages are freed in this step.
Sample code for step 4:

              rte_memseg_list_walk_thread_unsafe(dpdk_find_and_free_unused, NULL); -> dpdk_find_and_free_unused() is called for each memory segment list (the FBARRAY pointer is derived from the MSL as below)

              static int
              dpdk_find_and_free_unused(const struct rte_memseg_list *msl,
                                          void *arg UNUSED)
              {
                      int ms_idx;
                      void *addr;
                      struct rte_fbarray *arr = (struct rte_fbarray *)&msl->memseg_arr;

                      /*
                       * Use a run length of 2 instead of 1 so that we find the
                       * free tail of the segment list rather than a hole.
                       */
                      ms_idx = rte_fbarray_find_next_n_free(arr, 0, 2);

                      if (ms_idx >= 0) {
                              addr = RTE_PTR_ADD(msl->base_va, ms_idx * msl->page_sz);
                              munmap(addr, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr));
                      }
                      return 0;
              }


  5.  With nr_hugepages at ‘X’ and free_hugepages at 0, we create the mbuf pools using the RTE API, and here we face a crash. (We make sure the ‘X’ pages are split into multiple pools according to the memory fragmentation given to the primary process, so pool by pool we are confident that DPDK should find physically contiguous memory segments and allocate successfully.)

                    struct rte_mempool *pool;

                    pool = rte_pktmbuf_pool_create(name, num_mbufs,
                                    RTE_MIN(num_mbufs / 4, MBUF_CACHE_SIZE),
                                    MBUF_PRIV_SIZE,
                                    frame_len + RTE_PKTMBUF_HEADROOM,
                                    rte_socket_id());



  6.  Sometimes, seemingly at random, we face a crash during pool creation in step 5 for one of the pools stored in step 2 (the pools for all ports and queues are initialized later on).

               The DPDK EAL core dump comes with a backtrace like this:

                 #6  sigcrash (signo=11, info=0x7fff1c1867f0, ctx=0x7fff1c1866c0)
    #7
    #8  malloc_elem_can_hold () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #9  find_suitable_element () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #10  malloc_heap_alloc () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #11  rte_malloc_socket () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #12  rte_mempool_create_empty () from ./usr/lib64/dpdk-19/librte_mempool.so.20.0
    #13  rte_pktmbuf_pool_create_by_ops () from ./usr/lib64/dpdk-19/librte_mbuf.so.20.0
    #14  rte_pktmbuf_pool_create () from ./usr/lib64/dpdk-19/librte_mbuf.so.20.0
    #15  dpdk_create_mbuf_pool (mem_chunk_tbl=0x555d556de8e0 , num_mbufs=46080, frame_len=2048, name=0x7fff1c1873c0 "DPDK_POOL_0")

                We see that find_suitable_element() fails to find a suitable element in the DPDK memory segment lists during the heap allocation, and the resulting NULL/invalid pointer dereference crashes the boot-up process.

Please share any comments on the boot-up process in steps 1-6 and any possible reason for the crash.

We suspect step 4: freeing the unused FBARRAY pages at the end should free only the least contiguous memory segments, right? (After finding the first run of 2 free pages, we munmap the entire remaining length in step 4 to release the virtual memory.)

Please also share your thoughts on the FBARRAY design: is it expected to map segments from most contiguous to least contiguous in the virtual address space?

So we believe our most contiguous segments from step 2 are safe even after steps 3 and 4. Please correct my understanding if anything is wrong.


Thanks
Umakiran



* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-14  7:30 DPDK 19.11.5 Legacy Memory Design Query Umakiran Godavarthi (ugodavar)
@ 2022-09-21  6:50 ` Umakiran Godavarthi (ugodavar)
  2022-09-22  8:08   ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-21  6:50 UTC (permalink / raw)
  To: anatoly.burakov, dev


Hi Team,
I have sent a message to the DPDK alias. Can you please have a look and share
your thoughts on it?

Please reply with your thoughts on the legacy memory design and on the reason for the crash below.


                 #6  sigcrash (signo=11, info=0x7fff1c1867f0, ctx=0x7fff1c1866c0)
    #7
    #8  malloc_elem_can_hold () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #9  find_suitable_element () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #10  malloc_heap_alloc () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #11  rte_malloc_socket () from ./usr/lib64/dpdk-19/librte_eal.so.20.0
    #12  rte_mempool_create_empty () from ./usr/lib64/dpdk-19/librte_mempool.so.20.0
    #13  rte_pktmbuf_pool_create_by_ops () from ./usr/lib64/dpdk-19/librte_mbuf.so.20.0
    #14  rte_pktmbuf_pool_create () from ./usr/lib64/dpdk-19/librte_mbuf.so.20.0
    #15  dpdk_create_mbuf_pool (mem_chunk_tbl=0x555d556de8e0 , num_mbufs=46080, frame_len=2048, name=0x7fff1c1873c0 "DPDK_POOL_0")

Is there a bug in DPDK 19.11.5 that makes it crash when it searches for pages during pool creation?

Please let us know; this is my second email.

Thanks
Umakiran


* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-21  6:50 ` Umakiran Godavarthi (ugodavar)
@ 2022-09-22  8:08   ` Umakiran Godavarthi (ugodavar)
  2022-09-22  9:00     ` Dmitry Kozlyuk
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-22  8:08 UTC (permalink / raw)
  To: anatoly.burakov, dev, stephen


Hi Stephen/Anatoly Burakov,

Please help us with the RTE EAL crash reported in the mails above: is the reason that we are not getting segments for pool creation, or is it a bug in DPDK 19.11.5?

We are blocked on this.

Thanks
Umakiran


* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-22  8:08   ` Umakiran Godavarthi (ugodavar)
@ 2022-09-22  9:00     ` Dmitry Kozlyuk
  2022-09-23 11:20       ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Kozlyuk @ 2022-09-22  9:00 UTC (permalink / raw)
  To: Umakiran Godavarthi (ugodavar); +Cc: anatoly.burakov, dev, stephen

Hi Umakiran,

> From: Umakiran Godavarthi (ugodavar) <ugodavar@cisco.com>
> Date: Wednesday, 14 September 2022 at 1:00 PM
[...]
> 4.  Then we walk the DPDK memory segment lists and, for each FBARRAY, find the pages used by DPDK and unmap the remaining pages with the code below (the idea is to release the huge pages mapped into the DPDK process's virtual memory that we do not need). free_hugepages will then be 0, since the X pages are used by DPDK and all unnecessary pages are freed in this step.
> Sample code for step 4:
> 
>               rte_memseg_list_walk_thread_unsafe(dpdk_find_and_free_unused, NULL); -> dpdk_find_and_free_unused() is called for each memory segment list (the FBARRAY pointer is derived from the MSL as below)
> 
>               static int
>               dpdk_find_and_free_unused(const struct rte_memseg_list *msl,
>                                           void *arg UNUSED)
>               {
>                       int ms_idx;
>                       void *addr;
>                       struct rte_fbarray *arr = (struct rte_fbarray *)&msl->memseg_arr;
> 
>                       /*
>                        * Use a run length of 2 instead of 1 so that we find the
>                        * free tail of the segment list rather than a hole.
>                        */
>                       ms_idx = rte_fbarray_find_next_n_free(arr, 0, 2);
> 
>                       if (ms_idx >= 0) {
>                               addr = RTE_PTR_ADD(msl->base_va, ms_idx * msl->page_sz);
>                               munmap(addr, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr));
>                       }
>                       return 0;
>               }

You unmap memory, but you do not maintain DPDK memory management structures,
that is, DPDK does not know that this page is no longer usable.
Probably this is the reason for the crash.
You could print regions you're unmapping and the segfault address to confirm.
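
For reference, a minimal sketch of that diagnostic inside the step-4 callback (it reuses the msl/ms_idx variables from the quoted snippet; the log format is only an example):

    if (ms_idx >= 0) {
            void *addr = RTE_PTR_ADD(msl->base_va, ms_idx * msl->page_sz);
            size_t len = RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr);

            /* Log the range before unmapping so it can be compared with the
             * faulting address reported in the crash. */
            printf("msl base %p: unmapping %p (+%zu bytes)\n",
                   msl->base_va, addr, len);
            munmap(addr, len);
    }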


* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-22  9:00     ` Dmitry Kozlyuk
@ 2022-09-23 11:20       ` Umakiran Godavarthi (ugodavar)
  2022-09-23 11:47         ` Dmitry Kozlyuk
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-23 11:20 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: anatoly.burakov, dev, stephen


Hi Dmitry,

My answers to your reply are below.

You unmap memory, but you do not maintain DPDK memory management structures,
that is, DPDK does not know that this page is no longer usable.
Probably this is the reason for the crash.
You could print regions you're unmapping and the segfault address to confirm.

[Uma]: Yes, we are unmapping the entire range, assuming those pages are all free inside DPDK and that the DPDK heaps never use them.

Suppose we have 400 pages in free_hugepages and we want only 252 pages, so we reduce nr_hugepages to 252.

So we assume pages 253 to 400 are free inside DPDK, since our application set nr_hugepages to 252.

ms_idx = rte_fbarray_find_next_n_free(arr, 0, 2); -> returns 253
ms_check_idx = rte_fbarray_find_next_n_free(arr, 0, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr) / msl->page_sz); -> also returns 253 (should match the index above)
ms_next_idx = rte_fbarray_find_next_used(arr, ms_idx); -> returns -1 (< 0), i.e. there is no used page after that index

Hence we call munmap(addr, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr));

Please let us know how to check, in the DPDK free heaps or the FBARRAY, that the pages we are freeing (253 to 400, unwanted by our application) are really safe to free, beyond the three checks above.

If it is not safe to free them this way, how do we inform DPDK to free those pages in the FBARRAY and also clean up its heap lists so that it never crashes?

We suspect the code below hits a NULL or invalid address dereference in our case:

static struct malloc_elem *
find_suitable_element(struct malloc_heap *heap, size_t size,
                unsigned int flags, size_t align, size_t bound, bool contig)
{
        size_t idx;
        struct malloc_elem *elem, *alt_elem = NULL;

        for (idx = malloc_elem_free_list_index(size);
                        idx < RTE_HEAP_NUM_FREELISTS; idx++) {
                for (elem = LIST_FIRST(&heap->free_head[idx]);   -> we suspect elem is an invalid address here, hence the crash
                                !!elem; elem = LIST_NEXT(elem, free_list)) {
                        if (malloc_elem_can_hold(elem, size, align, bound,
                                        contig)) {
Thanks
Umakiran



* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-23 11:20       ` Umakiran Godavarthi (ugodavar)
@ 2022-09-23 11:47         ` Dmitry Kozlyuk
  2022-09-23 12:12           ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Kozlyuk @ 2022-09-23 11:47 UTC (permalink / raw)
  To: Umakiran Godavarthi (ugodavar); +Cc: anatoly.burakov, dev, stephen

2022-09-23 11:20 (UTC+0000), Umakiran Godavarthi (ugodavar): 
> [Uma]: Yes, we are unmapping the entire range, assuming those pages are all free inside DPDK and that the DPDK heaps never use them.
> 
> Suppose we have 400 pages in free_hugepages and we want only 252 pages, so we reduce nr_hugepages to 252.
> 
> So we assume pages 253 to 400 are free inside DPDK, since our application set nr_hugepages to 252.
> 
> ms_idx = rte_fbarray_find_next_n_free(arr, 0, 2); -> returns 253
> ms_check_idx = rte_fbarray_find_next_n_free(arr, 0, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr) / msl->page_sz); -> also returns 253 (should match the index above)
> ms_next_idx = rte_fbarray_find_next_used(arr, ms_idx); -> returns -1 (< 0), i.e. there is no used page after that index
> 
> Hence we call munmap(addr, RTE_PTR_DIFF(RTE_PTR_ADD(msl->base_va, msl->len), addr));
> 
> Please let us know how to check, in the DPDK free heaps or the FBARRAY, that the pages we are freeing (253 to 400, unwanted by our application) are really safe to free, beyond the three checks above.
> 
> If it is not safe to free them this way, how do we inform DPDK to free those pages in the FBARRAY and also clean up its heap lists so that it never crashes?

There are still DPDK malloc internal structures that you cannot adjust.
I suggest going another way:
Instead of letting DPDK allocate all hugepages and unmapping some,
allow DPDK to allocate an absolute minimum (1 x 2MB page?)
and add all the rest you need via rte_extmem_*() API.

Why do you need legacy mode in the first place?
Looks like you're painfully trying to achieve the same result
that dynamic mode would give you automatically.
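
For illustration, a minimal sketch of that external-memory flow (my example only: the heap name, variable names and mbuf parameters are placeholders, and the hugepage region ext_va/ext_len with its IOVA table is assumed to be reserved and mapped by the application beforehand):

    #include <rte_malloc.h>
    #include <rte_memory.h>
    #include <rte_mbuf.h>

    #define EXT_HEAP_NAME "app_ext_heap"    /* placeholder heap name */

    static struct rte_mempool *
    create_pool_on_extmem(void *ext_va, size_t ext_len, rte_iova_t ext_iova[],
                          unsigned int n_pages, size_t page_sz,
                          unsigned int nb_mbuf)
    {
            int socket;

            if (rte_malloc_heap_create(EXT_HEAP_NAME) != 0)
                    return NULL;
            /* Registers the memory with DPDK and adds it to the named heap;
             * DMA mapping for devices may still be needed separately. */
            if (rte_malloc_heap_memory_add(EXT_HEAP_NAME, ext_va, ext_len,
                                           ext_iova, n_pages, page_sz) != 0)
                    return NULL;

            socket = rte_malloc_heap_get_socket(EXT_HEAP_NAME);
            if (socket < 0)
                    return NULL;

            return rte_pktmbuf_pool_create("DPDK_POOL_0", nb_mbuf,
                            RTE_MIN(nb_mbuf / 4, 512U), 0,
                            RTE_MBUF_DEFAULT_BUF_SIZE, socket);
    }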


* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-23 11:47         ` Dmitry Kozlyuk
@ 2022-09-23 12:12           ` Umakiran Godavarthi (ugodavar)
  2022-09-23 13:10             ` Dmitry Kozlyuk
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-23 12:12 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: anatoly.burakov, dev, stephen


Thanks Dmitry for the quick turnaround.

My answers are below.


There are still DPDK malloc internal structures that you cannot adjust.
I suggest going another way:
Instead of letting DPDK allocate all hugepages and unmapping some,
allow DPDK to allocate an absolute minimum (1 x 2MB page?)
and add all the rest you need via rte_extmem_*() API.

[Uma]: Yes, I agree. If free_hugepages = 400 and nr_hugepages = 252, we expect DPDK to take only 252 pages and keep the remaining pages free in its heap.
               As you have mentioned, we would just boot DPDK with 1 page and add the pages we want later. Are these the steps?

  1.  NR_HP = 1, FREE_HP = 1
  2.  EAL init (DPDK boots up with one 2 MB page)
  3.  What is the API for adding pages later on? (rte_extmem_*; can you please give the full API details and how to call it with arguments?)

We can do 1, 2 and 3, but there is a problem: once we reduce nr_hugepages to 1, the kernel frees the huge pages entirely.

So is there a way to leave NR_HP and FREE_HP untouched and just pass arguments so that DPDK boots with only 1 page? Please let us know, and we would then add the pages we need to DPDK later.

Why do you need legacy mode in the first place?
Looks like you're painfully trying to achieve the same result
that dynamic mode would give you automatically.

[Uma]: We cannot avoid the legacy memory design, because our secondary process is mapped page by page to the primary process and the physical address space is the same for both processes. We have to stick with the legacy memory design for now.

Thanks
Umakiran


* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-23 12:12           ` Umakiran Godavarthi (ugodavar)
@ 2022-09-23 13:10             ` Dmitry Kozlyuk
  2022-09-26 12:55               ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Kozlyuk @ 2022-09-23 13:10 UTC (permalink / raw)
  To: Umakiran Godavarthi (ugodavar); +Cc: anatoly.burakov, dev, stephen

2022-09-23 12:12 (UTC+0000), Umakiran Godavarthi (ugodavar): 
> [Uma]: Yes, I agree. If free_hugepages = 400 and nr_hugepages = 252, we expect DPDK to take only 252 pages and keep the remaining pages free in its heap.
>                As you have mentioned, we would just boot DPDK with 1 page and add the pages we want later. Are these the steps?
> 
>   1.  NR_HP = 1, FREE_HP = 1
>   2.  EAL init (DPDK boots up with one 2 MB page)
>   3.  What is the API for adding pages later on? (rte_extmem_*; can you please give the full API details and how to call it with arguments?)

Guide:

    https://doc.dpdk.org/guides-19.11/prog_guide/env_abstraction_layer.html#support-for-externally-allocated-memory

I recommend reading the entire section about DPDK memory management
since you're going to use an uncommon API
and should understand what's going on.

API (the linked function and those following it):

    http://doc.dpdk.org/api-19.11/rte__malloc_8h.html#a2295623c85ba41fe5bf7dce6bf0393d6

    http://doc.dpdk.org/api-19.11/rte__memory_8h.html#a653510fb0c58bf63f54708677e3a2eba

> We can do 1, 2 and 3, but there is a problem: once we reduce nr_hugepages to 1, the kernel frees the huge pages entirely.
> 
> So is there a way to leave NR_HP and FREE_HP untouched and just pass arguments so that DPDK boots with only 1 page? Please let us know, and we would then add the pages we need to DPDK later.

See --socket-mem EAL option:

    http://doc.dpdk.org/guides-19.11/linux_gsg/linux_eal_parameters.html#id3

> Why do you need legacy mode in the first place?
> Looks like you're painfully trying to achieve the same result
> that dynamic mode would give you automatically.
> 
> [Uma]: We cannot avoid the legacy memory design, because our secondary process is mapped page by page to the primary process and the physical address space is the same for both processes. We have to stick with the legacy memory design for now.

Sorry, I still don't understand.
Virtual and physical addresses of DPDK memory are the same across processes
in both legacy and dynamic memory mode.



* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-23 13:10             ` Dmitry Kozlyuk
@ 2022-09-26 12:55               ` Umakiran Godavarthi (ugodavar)
  2022-09-26 13:06                 ` Umakiran Godavarthi (ugodavar)
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-26 12:55 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: anatoly.burakov, dev, stephen


Thanks @Dmitry Kozlyuk for your suggestions.

I will try the following for DPDK pool creation.

My logic for calculating the number of mbufs remains the same.

I saw this code in DPDK testpmd, where external heap memory is used:

case MP_ALLOC_XMEM_HUGE:
                {
                        int heap_socket;
                        bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;

                        if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
                                rte_exit(EXIT_FAILURE, "Could not create external memory\n");

                        heap_socket =
                                rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
                        if (heap_socket < 0)
                                rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");

                        TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
                                        rte_mbuf_best_mempool_ops());
                        rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
                                        mb_mempool_cache, 0, mbuf_seg_size,
                                        heap_socket);
                        break;
                }

So I will do the same:


  1.  EAL init
  2.  Calculate the mbufs we need for our application
  3.  Then create the pool using MP_ALLOC_XMEM_HUGE

Steps 1, 2 and 3 should work, right? That should avoid the heap corruption issues? (A rough ordering sketch follows below.)
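
As a rough ordering sketch of steps 1-3 (calc_mbufs_needed() is a hypothetical stand-in for our own sizing logic; setup_extmem(), EXTMEM_HEAP_NAME, mb_mempool_cache and mbuf_seg_size are taken from the testpmd snippet above):

    int ret = rte_eal_init(argc, argv);                    /* step 1 */
    if (ret < 0)
            rte_exit(EXIT_FAILURE, "EAL init failed\n");

    unsigned int nb_mbuf = calc_mbufs_needed();            /* step 2 */

    if (setup_extmem(nb_mbuf, mbuf_seg_size, true) < 0)    /* step 3: external heap */
            rte_exit(EXIT_FAILURE, "extmem setup failed\n");

    int heap_socket = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
    struct rte_mempool *rte_mp = rte_pktmbuf_pool_create("DPDK_POOL_0",
                    nb_mbuf, mb_mempool_cache, 0, mbuf_seg_size,
                    heap_socket);                          /* step 3: pool on that heap */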

We have 3 types of pool creation in total:

/*
 * Select mempool allocation type:
 * - native: use regular DPDK memory
 * - anon: use regular DPDK memory to create mempool, but populate using
 *         anonymous memory (may not be IOVA-contiguous)
 * - xmem: use externally allocated hugepage memory
 */

Instead of freeing unused virtual memory the DPDK-native way, we would just create a pool of type XMEM_HUGE and add pages to it page by page, like the testpmd code.

Please let me know whether steps 1, 2 and 3 are good. We would then need to boot DPDK with

--socket-mem 2048 so that DPDK takes only 1 page natively and boots up, right?

Thanks
Umakiran



* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-26 12:55               ` Umakiran Godavarthi (ugodavar)
@ 2022-09-26 13:06                 ` Umakiran Godavarthi (ugodavar)
  2022-10-10 15:15                   ` Dmitry Kozlyuk
  0 siblings, 1 reply; 11+ messages in thread
From: Umakiran Godavarthi (ugodavar) @ 2022-09-26 13:06 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: anatoly.burakov, dev, stephen


Hi Dmitry,

We know that if the application does the unmap itself, the DPDK native heaps are not cleaned up for the “native: use regular DPDK memory” memory type.

Can we please get an API from the DPDK community to clean up the heaps, given the VA and length to be freed? Something like the one below:

https://doc.dpdk.org/api/rte__memory_8h.html#afb3c1f8be29fa15953cebdad3a9cd8eb

int rte_extmem_unregister(void *va_addr, size_t len)

Unregister external memory chunk with DPDK.

Note:
- Using this API is mutually exclusive with the rte_malloc family of APIs.
- This API will not perform any DMA unmapping. It is expected that the user will do that themselves.
- Before calling this function, all other processes must call rte_extmem_detach to detach from the memory area.

Parameters:
- va_addr: start of virtual area to unregister
- len: length of virtual area to unregister

Returns:
- 0 on success
- -1 in case of error, with rte_errno set to one of the following: EINVAL - one of the parameters was invalid; ENOENT - memory chunk was not
If we could get such an API working on native DPDK memory, that would be good for us, instead of changing the design and writing code for a new memory type.

Please advise: can we get an API to clean up the DPDK native pools if the VA address and length are given?

Thanks
Umakiran



* Re: DPDK 19.11.5 Legacy Memory Design Query
  2022-09-26 13:06                 ` Umakiran Godavarthi (ugodavar)
@ 2022-10-10 15:15                   ` Dmitry Kozlyuk
  0 siblings, 0 replies; 11+ messages in thread
From: Dmitry Kozlyuk @ 2022-10-10 15:15 UTC (permalink / raw)
  To: Umakiran Godavarthi (ugodavar); +Cc: anatoly.burakov, dev, stephen

Hi Umakiran,

Please quote what is needed and reply below the quotes.

2022-09-26 13:06 (UTC+0000), Umakiran Godavarthi (ugodavar):
> Hi Dmitry,
> 
> We know that if the application does the unmap itself, the DPDK native heaps are not cleaned up for the “native: use regular DPDK memory” memory type.
> 
> Can we please get an API from the DPDK community to clean up the heaps, given the VA and length to be freed? Something like the one below:
> 
> https://doc.dpdk.org/api/rte__memory_8h.html#afb3c1f8be29fa15953cebdad3a9cd8eb
> 
> int rte_extmem_unregister(void *va_addr, size_t len)
> 
> Unregister external memory chunk with DPDK.
> 
> Note:
> - Using this API is mutually exclusive with the rte_malloc family of APIs.
> - This API will not perform any DMA unmapping. It is expected that the user will do that themselves.
> - Before calling this function, all other processes must call rte_extmem_detach to detach from the memory area.
> 
> Parameters:
> - va_addr: start of virtual area to unregister
> - len: length of virtual area to unregister
> 
> Returns:
> - 0 on success
> - -1 in case of error, with rte_errno set to one of the following: EINVAL - one of the parameters was invalid; ENOENT - memory chunk was not
> 
> If we could get such an API working on native DPDK memory, that would be good for us, instead of changing the design and writing code for a new memory type.
> 
> Please advise: can we get an API to clean up the DPDK native pools if the VA address and length are given?

1. Let's clarify the terms first.

   Malloc heaps store the hugepages that an application can allocate.
   These are internal structures of the DPDK memory manager.
   Correct, unmapping memory without maintaining these structures
   leads to the issue you're trying to solve.

   Pools are allocated on top of some kind of memory.
   Note that "native", "xmem", etc. are specific to TestPMD app.
   To DPDK, "native" means memory from the DPDK allocator
   (pool memory can also come from outside of DPDK for example).

2. DPDK 19.11 is LTS and will be EOL soon [1].
   New APIs are only added to the upstream.

3. A new API needs rationale.
   I still don't see why you need legacy mode in your case.
   New API also would not fit well into the legacy mode idea:
   static memory layout.
   In dynamic memory mode, it would be useless, because unneeded pages
   are not allocated in the first place and freed once not used anymore.

[1]: https://core.dpdk.org/roadmap/#stable

> From: Umakiran Godavarthi (ugodavar) <ugodavar@cisco.com>
> Date: Monday, 26 September 2022 at 6:25 PM
> To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Cc: anatoly.burakov@intel.com <anatoly.burakov@intel.com>, dev@dpdk.org <dev@dpdk.org>, stephen@networkplumber.org <stephen@networkplumber.org>
> Subject: Re: DPDK 19.11.5 Legacy Memory Design Query
> Thanks @Dmitry Kozlyuk for your suggestions.
> 
> I will try the following for DPDK pool creation.
> 
> My logic for calculating the number of mbufs remains the same.
> 
> I saw this code in DPDK testpmd, where external heap memory is used:
> 
> case MP_ALLOC_XMEM_HUGE:
>                 {
>                         int heap_socket;
>                         bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
> 
>                         if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
>                                 rte_exit(EXIT_FAILURE, "Could not create external memory\n");
> 
>                         heap_socket =
>                                 rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
>                         if (heap_socket < 0)
>                                 rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
> 
>                         TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
>                                         rte_mbuf_best_mempool_ops());
>                         rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
>                                         mb_mempool_cache, 0, mbuf_seg_size,
>                                         heap_socket);
>                         break;
>                 }
> 
> So I will do the same:
> 
> 
>   1.  EAL init
>   2.  Calculate the mbufs we need for our application
>   3.  Then create the pool using MP_ALLOC_XMEM_HUGE
> 
> Steps 1, 2 and 3 should work, right? That should avoid the heap corruption issues?

Yes, this snippet (and relevant functions there) is for your case.

> 
> We have 3 types of pool creation in total:
> 
> /*
>  * Select mempool allocation type:
>  * - native: use regular DPDK memory
>  * - anon: use regular DPDK memory to create mempool, but populate using
>  *         anonymous memory (may not be IOVA-contiguous)
>  * - xmem: use externally allocated hugepage memory
>  */
> 
> Instead of freeing unused virtual memory the DPDK-native way, we would just create a pool of type XMEM_HUGE and add pages to it page by page, like the testpmd code.
> 
> Please let me know whether steps 1, 2 and 3 are good. We would then need to boot DPDK with
> 
> --socket-mem 2048 so that DPDK takes only 1 page natively and boots up, right?

--socket-mem is in megabytes, so it's --socket-mem 2
and 2MB hugepages must be available.
Maybe even --no-huge will suit your case
since you allocate all hugepages yourself effectively.
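
For illustration, a minimal sketch of such a boot (the argument values are only an example of the options discussed above, not a recommendation):

    /* Sketch only: boot EAL in legacy mode with a single 2 MB page of
     * native memory (--socket-mem is in megabytes); the rest would be
     * added afterwards as external memory. The --no-huge suggestion
     * above would be an alternative set of options. */
    char *eal_args[] = { "app", "--legacy-mem", "--socket-mem=2", NULL };

    if (rte_eal_init(3, eal_args) < 0)
            rte_exit(EXIT_FAILURE, "EAL init failed\n");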


end of thread

Thread overview: 11+ messages
2022-09-14  7:30 DPDK 19.11.5 Legacy Memory Design Query Umakiran Godavarthi (ugodavar)
2022-09-21  6:50 ` Umakiran Godavarthi (ugodavar)
2022-09-22  8:08   ` Umakiran Godavarthi (ugodavar)
2022-09-22  9:00     ` Dmitry Kozlyuk
2022-09-23 11:20       ` Umakiran Godavarthi (ugodavar)
2022-09-23 11:47         ` Dmitry Kozlyuk
2022-09-23 12:12           ` Umakiran Godavarthi (ugodavar)
2022-09-23 13:10             ` Dmitry Kozlyuk
2022-09-26 12:55               ` Umakiran Godavarthi (ugodavar)
2022-09-26 13:06                 ` Umakiran Godavarthi (ugodavar)
2022-10-10 15:15                   ` Dmitry Kozlyuk
