DPDK patches and discussions
From: David Marchand <david.marchand@redhat.com>
To: Aaron Conole <aconole@redhat.com>,
	Harry Van Haaren <harry.van.haaren@intel.com>
Cc: dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC] service: stop lcore threads before 'finalize'
Date: Fri, 17 Jan 2020 09:17:43 +0100
Message-ID: <CAJFAV8wCmtLUE6MLb6QZ76KZ_t7croR5cX1ODfa4cm7MUX1zHw@mail.gmail.com>
In-Reply-To: <f7two9rxjst.fsf@dhcp-25.97.bos.redhat.com>

On Thu, Jan 16, 2020 at 8:50 PM Aaron Conole <aconole@redhat.com> wrote:
>
> I've noticed an occasional segfault from the build system in the
> service_autotest and after talking with David (CC'd), it seems like it's
> due to the rte_service_finalize deleting the lcore_states object while
> active lcores are running.
>
> The below patch is an attempt to solve it by first reassigning all the
> lcores back to ROLE_RTE before releasing the memory.  There is probably
> a larger question for DPDK proper about actually closing the pending
> lcore threads, but that's a separate issue.  I've been running with the
> patch for a while, and haven't seen the crash anymore on my system.
>
> Thoughts?  Is it acceptable as-is?

I added this patch to my environment and can still reproduce the same
issue after ~10-20 tries.
To make sure your change was in my binary, I set a breakpoint on
service_lcore_uninit; it is indeed hit when the test application exits.


To reproduce:

I modified app/test/meson.build to pass an explicit "-l 0-1" and
compiled with your patch.
Then I started a dummy busy loop, "while true; do true; done", in a
shell that I had pinned to core 1 (taskset -pc 1 $$).
Finally, I started another shell (as root), pinned to cores 0-1 on my
laptop (taskset -pc 0,1 $$), and ran:

meson test --gdb --repeat=10000 service_autotest

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4922700 (LWP 8572)]
rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
458            cs->loops++;
A debugging session is active.

    Inferior 1 [process 8566] will be killed.

Quit anyway? (y or n) n
Not confirmed.
Missing separate debuginfos, use: debuginfo-install
elfutils-libelf-0.172-2.el7.x86_64 glibc-2.17-260.el7_6.6.x86_64
libgcc-4.8.5-36.el7_6.2.x86_64 libibverbs-17.2-3.el7.x86_64
libnl3-3.2.28-4.el7.x86_64 libpcap-1.5.3-11.el7.x86_64
numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64
zlib-1.2.7-18.el7.x86_64
(gdb) info threads
  Id   Target Id         Frame
* 4    Thread 0x7ffff4922700 (LWP 8572) "lcore-slave-1" rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
  3    Thread 0x7ffff5123700 (LWP 8571) "rte_mp_handle" 0x00007ffff63a4b4d in recvmsg () from /lib64/libpthread.so.0
  2    Thread 0x7ffff5924700 (LWP 8570) "eal-intr-thread" 0x00007ffff60c7603 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fd2c00 (LWP 8566) "dpdk-test" 0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
#1  0x0000000000b2c84f in eal_thread_loop (arg=<optimized out>) at ../lib/librte_eal/linux/eal/eal_thread.c:153
#2  0x00007ffff639ddd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff60c702d in clone () from /lib64/libc.so.6
(gdb) f 0
#0  rte_service_runner_func (arg=<optimized out>) at ../lib/librte_eal/common/rte_service.c:458
458            cs->loops++;
(gdb) p *cs
$1 = {service_mask = 0, runstate = 0 '\000', is_service_core = 0 '\000',
  service_active_on_lcore = '\000' <repeats 63 times>, loops = 0,
  calls_per_service = {0 <repeats 64 times>}}
(gdb) p lcore_config[1]
$2 = {thread_id = 140737296606976, pipe_master2slave = {14, 20},
  pipe_slave2master = {21, 22}, f = 0xb26ec0 <rte_service_runner_func>,
  arg = 0x0, ret = 0, state = RUNNING, socket_id = 0, core_id = 1,
  core_index = 1, core_role = 0 '\000', detected = 1 '\001',
  cpuset = {__bits = {2, 0 <repeats 15 times>}}}
(gdb) p lcore_config[0]
$3 = {thread_id = 0, pipe_master2slave = {0, 0}, pipe_slave2master = {0, 0},
  f = 0x0, arg = 0x0, ret = 0, state = WAIT, socket_id = 0,
  core_id = 0, core_index = 0, core_role = 0 '\000', detected = 1 '\001',
  cpuset = {__bits = {1, 0 <repeats 15 times>}}}

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd2c00 (LWP 8566))]
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7de4756 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7de4fcf in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7de9d1e in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7df19da in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#5  0x00007ffff7deafba in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff6002c29 in __run_exit_handlers () from /lib64/libc.so.6
#7  0x00007ffff6002c77 in exit () from /lib64/libc.so.6
#8  0x00007ffff5feb49c in __libc_start_main () from /lib64/libc.so.6
#9  0x00000000004fa126 in _start ()
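
A note for readers of the session above: lcore_config[1] shows
core_role = 0 while state is still RUNNING with
f = rte_service_runner_func, and the main thread is already in exit().
That seems consistent with the hypothesis quoted earlier, that the
per-lcore state is released while a worker is still looping on it.
Below is a minimal stand-alone sketch of the ordering that avoids this
kind of race. It uses plain pthreads and C11 atomics rather than EAL;
struct core_state, runner(), stop_then_free() and demo_stop_then_free()
are hypothetical names for illustration only, not the DPDK code or a
proposed patch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-lcore service state; not the DPDK struct. */
struct core_state {
	atomic_int runstate; /* 1 = keep looping, 0 = stop requested */
	uint64_t loops;      /* written only by the worker thread */
};

/* Analogue of the service runner loop: touches *cs on every iteration. */
static void *runner(void *arg)
{
	struct core_state *cs = arg;

	while (atomic_load_explicit(&cs->runstate, memory_order_acquire))
		cs->loops++;
	return NULL;
}

/*
 * Teardown in the safe order: request stop, wait until the worker has
 * left its loop, and only then release the memory it was using.
 * Freeing before the join would allow the worker to touch freed memory.
 */
static int stop_then_free(pthread_t worker, struct core_state *cs)
{
	atomic_store_explicit(&cs->runstate, 0, memory_order_release);
	if (pthread_join(worker, NULL) != 0)
		return -1;
	free(cs);
	return 0;
}

/* Start one worker and tear it down again; returns 0 on success. */
static int demo_stop_then_free(void)
{
	pthread_t worker;
	struct core_state *cs = calloc(1, sizeof(*cs));

	if (cs == NULL)
		return -1;
	atomic_init(&cs->runstate, 1);
	if (pthread_create(&worker, NULL, runner, cs) != 0) {
		free(cs);
		return -1;
	}
	return stop_then_free(worker, cs);
}
```

Calling demo_stop_then_free() from a main() built with -pthread should
return 0. The point of the sketch is only the ordering: changing a role
or a flag is not enough on its own; the thread that frees the state has
to wait until the worker has actually left the loop.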


--
David Marchand


