DPDK patches and discussions
From: Olivier Matz <olivier.matz@6wind.com>
To: Tianli Lai <laitianli@tom.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH] mempool: fix rte primary program coredump
Date: Thu, 27 Jan 2022 11:06:56 +0100	[thread overview]
Message-ID: <YfJuwGdRxi7QS+CG@platinum> (raw)
In-Reply-To: <1636559839-6553-1-git-send-email-laitianli@tom.com>

Hi Tianli,

On Wed, Nov 10, 2021 at 11:57:19PM +0800, Tianli Lai wrote:
> The primary program (such as an OFP app) runs first, then the secondary
> program (such as dpdk-pdump) is started; the primary program then receives
> signal SIGSEGV. The call stack is as follows:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffee60e700 (LWP 112613)]
> 0x00007ffff5f2cc0b in bucket_stack_pop (stack=0xffff00010000) at
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
> 95      if (stack->top == 0)
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-196.el7.x86_64 libatomic-4.8.5-16.el7.x86_64
> libconfig-1.4.9-5.el7.x86_64 libgcc-4.8.5-16.el7.x86_64
> libpcap-1.5.3-12.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64
> openssl-libs-1.0.2k-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
>  #0  0x00007ffff5f2cc0b in bucket_stack_pop (stack=0xffff00010000) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
>  #1  0x00007ffff5f2e5dc in bucket_dequeue_orphans (bd=0x2209e5fac0,obj_table=0x220b083710, n_orphans=251) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:190
>  #2  0x00007ffff5f30192 in bucket_dequeue (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:288
>  #3  0x00007ffff5f47e18 in rte_mempool_ops_dequeue_bulk (mp=0x220b07d5c0,obj_table=0x220b083710, n=251) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:739
>  #4  0x00007ffff5f4819d in __mempool_generic_get (cache=0x220b083700, n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1443
>  #5  rte_mempool_generic_get (cache=0x220b083700, n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1506
>  #6  rte_mempool_get_bulk (n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1539
>  #7  rte_mempool_get (obj_p=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1565
>  #8  rte_mbuf_raw_alloc (mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:551
>  #9  0x00007ffff5f483a4 in rte_pktmbuf_alloc (mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:804
>  #10 0x00007ffff5f4c9d9 in pdump_pktmbuf_copy (m=0x220746ad80, mp=0x220b07d5c0) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:99
>  #11 0x00007ffff5f4e42e in pdump_copy (pkts=0x7fffee5dfdf0, nb_pkts=1, user_params=0x7ffff76d7cc0 <rx_cbs>) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:151
>  #12 0x00007ffff5f4eadd in pdump_rx (port=0, qidx=0, pkts=0x7fffee5dfdf0, nb_pkts=1, max_pkts=16, user_params=0x7ffff76d7cc0 <rx_cbs>) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:172
>  #13 0x00007ffff5d0e9e8 in rte_eth_rx_burst (port_id=0, queue_id=0, rx_pkts=0x7fffee5dfdf0, nb_pkts=16) at /ofp/dpdk/x86_64-native-linuxapp-gcc/usr/local/include/dpdk/rte_ethdev.h:4396
>  #14 0x00007ffff5d114c3 in recv_pkt_dpdk (pktio_entry=0x22005436c0, index=0, pkt_table=0x7fffee5dfdf0, num=16) at odp_packet_dpdk.c:1081
>  #15 0x00007ffff5d2f931 in odp_pktin_recv (queue=...,packets=0x7fffee5dfdf0, num=16) at ../linux-generic/odp_packet_io.c:1896
>  #16 0x000000000040a344 in rx_burst (pktin=...) at app_main.c:223
>  #17 0x000000000040aca4 in run_server_single (arg=0x7fffffffe2b0) at app_main.c:417
>  #18 0x00007ffff7bd6883 in run_thread (arg=0x7fffffffe3b8) at threads.c:67
>  #19 0x00007ffff53c8e25 in start_thread () from /lib64/libpthread.so.0
>  #20 0x00007ffff433e34d in clone () from /lib64/libc.so.6
> 
> The reason the program crashes is as follows:
> 
> In the primary and the secondary program, the global array rte_mempool_ops.ops[] is:
>         primary name            secondary name
>  [0]:   "bucket"                "ring_mp_mc"
>  [1]:   "dpaa"                  "ring_sp_sc"
>  [2]:   "dpaa2"                 "ring_mp_sc"
>  [3]:   "octeontx_fpavf"        "ring_sp_mc"
>  [4]:   "octeontx2_npa"         "octeontx2_npa"
>  [5]:   "ring_mp_mc"            "bucket"
>  [6]:   "ring_sp_sc"            "stack"
>  [7]:   "ring_mp_sc"            "if_stack"
>  [8]:   "ring_sp_mc"            "dpaa"
>  [9]:   "stack"                 "dpaa2"
>  [10]:  "if_stack"              "octeontx_fpavf"
>  [11]:  NULL                    NULL
> 
>  This array differs between the primary and the secondary program. So when
>  the secondary program calls rte_pktmbuf_pool_create_by_ops() with the
>  mempool name "ring_mp_mc", the primary program ends up using the "bucket"
>  ops to allocate rte_mbufs.
> 
>  Fix this by sorting the array in both the primary and the secondary program
>  when the memzone is initialized.
> 
> Signed-off-by: Tianli Lai <laitianli@tom.com>
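
To illustrate the failure mode described in the quoted report, here is a
minimal sketch. The struct and field names below are simplified stand-ins,
not the real DPDK definitions; the point is only that the ops table is
process-local while ops_index lives in shared memory:

  /* One ops table per process, filled by driver constructors in
   * registration order. Simplified: the real entries also hold the
   * enqueue/dequeue callbacks. */
  struct ops { const char *name; };
  static struct ops local_ops_table[16];

  /* The mempool itself lives in a shared memzone, so both processes
   * read the very same ops_index value. */
  struct shared_mempool { int ops_index; };

  /* The secondary resolves "ring_mp_mc" to slot 0 in *its* table and
   * stores 0 in the shared mempool. The primary later dereferences
   * slot 0 in *its own* table, where slot 0 is "bucket", and ends up
   * calling bucket_dequeue() on a pool that has no bucket private
   * data -> SIGSEGV. */
  static struct ops *
  lookup_ops(const struct shared_mempool *mp)
  {
          return &local_ops_table[mp->ops_index];
  }

So the two processes must agree on the mapping from index to ops name, not
only on the set of registered names.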

I think it is the same problem as the one described here:
http://inbox.dpdk.org/dev/1583114253-15345-1-git-send-email-xiangxia.m.yue@gmail.com/#r

To summarize what is said in that thread, sorting the ops looks dangerous
because it changes the indexes during the lifetime of the application. A new
proposal was made to use shared memory to ensure the indexes are the same in
the primary and the secondaries, but it requires some changes in the EAL to
have init callbacks at a specific place.
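
For reference, here is a rough sketch of that shared-memory idea. All names
below are hypothetical (nothing with these names exists in DPDK today): the
primary records the ops names in a shared memzone in registration order, and
a secondary reuses whatever index the primary assigned instead of relying on
its own registration order:

  #include <string.h>

  #define MAX_OPS 16
  #define OPS_NAME_LEN 32

  /* Would live in a memzone created by the primary during EAL init. */
  struct shared_ops_names {
          unsigned int count;
          char name[MAX_OPS][OPS_NAME_LEN];
  };

  /* Primary: append in registration order; the slot becomes the index. */
  static int
  register_ops_primary(struct shared_ops_names *shared, const char *name)
  {
          if (shared->count >= MAX_OPS)
                  return -1;
          strncpy(shared->name[shared->count], name, OPS_NAME_LEN - 1);
          shared->name[shared->count][OPS_NAME_LEN - 1] = '\0';
          return (int)shared->count++;
  }

  /* Secondary: reuse the index the primary already assigned to the name. */
  static int
  register_ops_secondary(const struct shared_ops_names *shared, const char *name)
  {
          for (unsigned int i = 0; i < shared->count; i++)
                  if (strcmp(shared->name[i], name) == 0)
                          return (int)i;
          return -1; /* driver unknown to the primary */
  }

The catch is the one mentioned above: it needs EAL changes so the init
callbacks run at the right place, presumably so the shared table exists
before any driver registers.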

I have a draft patchset that may fix this issue by using the vdev
infrastructure instead of a specific init, but it is not heavily tested. I
can send it here as an RFC if you want to try it.

One thing that is still not clear to me is how you trigger this issue: why
are the mempool ops not loaded in the same order in the primary and the
secondary?
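
For what it's worth, the index is simply assigned in the order the driver
constructors run, so the tables diverge as soon as the two binaries link in
a different set of mempool drivers, or link them in a different order. A
simplified sketch of that mechanism (not the exact RTE_MEMPOOL_REGISTER_OPS
expansion):

  /* Each driver registers itself from a constructor; the slot it gets
   * is just "next free index" at the time its constructor happens to
   * run. */
  static unsigned int num_ops; /* mirrors rte_mempool_ops_table.num_ops */

  static int
  register_ops(const char *name)
  {
          (void)name; /* the real code also copies the callbacks */
          return (int)num_ops++;
  }

  __attribute__((constructor))
  static void
  mp_ops_init_bucket(void)
  {
          register_ops("bucket");
  }

  __attribute__((constructor))
  static void
  mp_ops_init_ring_mp_mc(void)
  {
          register_ops("ring_mp_mc");
  }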

Thanks,
Olivier


Thread overview: 6+ messages
2021-11-10 15:57 Tianli Lai
2021-11-10 16:00 ` David Marchand
2021-11-10 16:07   ` laitianli
2021-11-10 17:15 ` Jerin Jacob
2022-01-27 10:06 ` Olivier Matz [this message]
2023-06-30 21:36   ` Stephen Hemminger
