DPDK usage discussions
* [dpdk-users] dpdk-pdump prints "EAL: Error: Invalid memory"
From: Yan, Xiaoping (NSB - CN/Hangzhou) @ 2021-08-04  7:14 UTC
  To: users

Hi,

After updating DPDK from 19.11 to 20.11,
dpdk-pdump prints the following error:
EAL: Error: Invalid memory
Port 7 MAC: 02 70 63 61 70 03
core (2), capture for (1) tuples
- port 0 device ((null)) queue 65535
^C

--legacy-mem is used for both the primary process and dpdk-pdump.
After some debugging, I found that mlx5_mem_is_rte incorrectly treats this address from OS memory (addr=0x4482b80) as an rte address, so mlx5_free
calls rte_free() to free it, which causes the error.
This seems to be because len of some unused memseg lists is not set to 0 (so rte_mem_virt2memseg_list(0x4482b80) returns a memseg list).
Here are the memseg lists:
(gdb) p mcfg->memsegs
$3 = {{{base_va = 0x2aac0000000, addr_64 = 2932388921344}, page_sz = 1073741824, socket_id = 0,
    version = 0, len = 34359738368, external = 0, heap = 1, memseg_arr = {
      name = "memseg-1048576k-0-0", '\000' <repeats 44 times>, count = 5, len = 32,
      elt_sz = 48, data = 0x2aaa302e000, rwlock = {cnt = 0}}}, {{base_va = 0x0, addr_64 = 0},
    page_sz = 1073741824, socket_id = 0, version = 0, len = 34359738368, external = 0,
    heap = 0, memseg_arr = {name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0,
      data = 0x0, rwlock = {cnt = 0}}}, {{base_va = 0x0, addr_64 = 0}, page_sz = 1073741824,
    socket_id = 1, version = 0, len = 34359738368, external = 0, heap = 0, memseg_arr = {
      name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {
        cnt = 0}}}, {{base_va = 0x0, addr_64 = 0}, page_sz = 1073741824, socket_id = 1,
    version = 0, len = 34359738368, external = 0, heap = 0, memseg_arr = {
      name = '\000' <repeats 63 times>, count = 0, len = 0, elt_sz = 0, data = 0x0, rwlock = {
        cnt = 0}}}, {{base_va = 0x0, addr_64 = 0}, page_sz = 0, socket_id = 0, version = 0,
    len = 0, external = 0, heap = 0, memseg_arr = {name = '\000' <repeats 63 times>, count = 0,
      len = 0, elt_sz = 0, data = 0x0, rwlock = {cnt = 0}}} <repeats 124 times>}

Here is the stack trace
(gdb) bt
#0  mlx5_free (addr=0x4482b80) at ../dpdk-20.11/drivers/common/mlx5/mlx5_malloc.c:260
#1  0x0000000000706f5c in mlx5_mp_req_verbs_cmd_fd (mp_id=mp_id@entry=0x7ffcdb6d9e50)
    at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_mp.c:140
#2  0x000000000050496f in mlx5_dev_spawn (config=0x7ffcdb6d9d70, spawn=0x2ab753799c0,
    dpdk_dev=0x4491400) at ../dpdk-20.11/drivers/net/mlx5/linux/mlx5_os.c:774
#3  mlx5_os_pci_probe (pci_drv=<optimized out>, pci_dev=<optimized out>)
    at ../dpdk-20.11/drivers/net/mlx5/linux/mlx5_os.c:2154
#4  0x0000000000708b5a in drivers_probe (user_classes=1, pci_dev=0x44913f0,
    pci_drv=0xe01800 <mlx5_pci_driver>, dev=0x2ab75379a80)
    at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_pci.c:246
#5  mlx5_common_pci_probe (pci_drv=0xe01800 <mlx5_pci_driver>, pci_dev=0x44913f0)
    at ../dpdk-20.11/drivers/common/mlx5/mlx5_common_pci.c:308
#6  0x00000000004268f9 in rte_pci_probe_one_driver (dev=0x44913f0,
    dr=0xe01800 <mlx5_pci_driver>) at ../dpdk-20.11/drivers/bus/pci/pci_common.c:243
#7  pci_probe_all_drivers (dev=0x44913f0) at ../dpdk-20.11/drivers/bus/pci/pci_common.c:318
#8  pci_probe () at ../dpdk-20.11/drivers/bus/pci/pci_common.c:345
#9  0x00000000006bc4d3 in rte_bus_probe ()
    at ../dpdk-20.11/lib/librte_eal/common/eal_common_bus.c:72
#10 0x0000000000422304 in rte_eal_init (argc=argc@entry=16, argv=argv@entry=0x7ffcdb6da530)
    at ../dpdk-20.11/lib/librte_eal/linux/eal.c:1210
#11 0x000000000056fee9 in main (argc=16, argv=0x7ffcdb6da748)
    at ../dpdk-20.11/app/pdump/main.c:1118
(gdb) c
Continuing.

Thread 1 "dpdk-pdump" hit Breakpoint 5, rte_free (addr=0x4482b80)
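
The trace above ends in rte_free() being called on a pointer that came from OS memory. A rough sketch of that dispatch pattern, under the assumption that the free routine picks its deallocator purely from an ownership check, could look like this. model_free, model_rte_free and struct model_free_stats are hypothetical names; the real rte_free() is deliberately not called here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counters recording which deallocator was chosen. */
struct model_free_stats {
	unsigned int libc_frees;
	unsigned int rte_frees;
};

/* Stand-in for the hugepage-heap free path. It only records the call,
 * since handing a libc pointer to a foreign allocator is the bug. */
void
model_rte_free(void *addr, struct model_free_stats *st)
{
	(void)addr;
	st->rte_frees++;
}

/* Pick the deallocator from the result of the ownership check.
 * If looks_like_rte is wrongly true, a libc pointer takes the rte path. */
void
model_free(void *addr, bool looks_like_rte, struct model_free_stats *st)
{
	if (addr == NULL)
		return;
	if (looks_like_rte) {
		model_rte_free(addr, st);
	} else {
		free(addr);
		st->libc_frees++;
	}
}
```

The point of the sketch is that the free path is only as reliable as the ownership check: once the check misclassifies an address, the wrong allocator receives it.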


It seems to me that the code below, from eal_legacy_hugepage_init, should also set len to 0?
    /* we're not going to allocate more pages, so release VA space for
     * unused memseg lists
     */
    for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
        struct rte_memseg_list *msl = &mcfg->memsegs[i];
        size_t mem_sz;

        /* skip inactive lists */
        if (msl->base_va == NULL)
            continue;
        /* skip lists where there is at least one page allocated */
        if (msl->memseg_arr.count > 0)
            continue;
        /* this is an unused list, deallocate it */
        mem_sz = msl->len;
        munmap(msl->base_va, mem_sz);
        msl->base_va = NULL;
        /* proposed: should we add msl->len = 0; here? */
        msl->heap = 0;

        /* destroy backing fbarray */
        rte_fbarray_destroy(&msl->memseg_arr);
    }
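
As a sanity check of the proposed change, the release step can be modelled in isolation. This is a minimal sketch that assumes a range-check lookup. struct model_msl, model_release_if_unused and model_contains are hypothetical names, and the munmap() and fbarray teardown of the real loop are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_memseg_list. */
struct model_msl {
	uint64_t base_va;   /* 0 means the list is inactive */
	uint64_t len;       /* size of the reserved VA range */
	int heap;           /* non-zero if the list backs a heap */
	unsigned int count; /* number of pages allocated in the list */
};

/* Mirror the skip conditions of the loop above, then clear the
 * bookkeeping, including len. Returns true if the list was released. */
bool
model_release_if_unused(struct model_msl *msl)
{
	if (msl->base_va == 0)
		return false; /* inactive list */
	if (msl->count > 0)
		return false; /* at least one page allocated */

	msl->base_va = 0;
	msl->len = 0; /* the proposed addition */
	msl->heap = 0;
	return true;
}

/* Range check over [base_va, base_va + len). */
bool
model_contains(const struct model_msl *msl, uint64_t addr)
{
	return addr >= msl->base_va && addr < msl->base_va + msl->len;
}
```

With len reset, a released list describes the empty range [0, 0), so no address, including 0x4482b80, can match it.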

Any comment?
Thank you.


Best regards
Yan Xiaoping



* Re: [dpdk-users] dpdk-pdump prints "EAL: Error: Invalid memory"
From: Yan, Xiaoping (NSB - CN/Hangzhou) @ 2021-08-10  5:34 UTC
  To: users

Hi,

I modified eal_legacy_hugepage_init as shown in the diff below, and the problem is solved.
Should this fix be submitted to DPDK upstream?

diff --git a/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c b/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
index 03a4f2dd..89a13e91 100644
--- a/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
+++ b/package/dpdk/dpdk-20.11/lib/librte_eal/linux/eal_memory.c
@@ -1458,6 +1458,7 @@ eal_legacy_hugepage_init(void)
                mem_sz = msl->len;
                munmap(msl->base_va, mem_sz);
                msl->base_va = NULL;
+               msl->len = 0;
                msl->heap = 0;

                /* destroy backing fbarray */


Best regards
Yan Xiaoping

