From: David Marchand
Date: Fri, 17 Jan 2020 09:17:43 +0100
To: Aaron Conole, Harry Van Haaren
Cc: dev
Subject: Re: [dpdk-dev] [RFC] service: stop lcore threads before 'finalize'
List-Id: DPDK patches and discussions

On Thu, Jan 16, 2020 at 8:50 PM Aaron Conole wrote:
>
> I've noticed an occasional segfault from the build system in the
> service_autotest and after talking with David (CC'd), it seems like it's
> due to the rte_service_finalize deleting the lcore_states object while
> active lcores are running.
>
> The below patch is an attempt to solve it by first reassigning all the
> lcores back to ROLE_RTE before releasing the memory.  There is probably
> a larger question for DPDK proper about actually closing the pending
> lcore threads, but that's a separate issue.  I've been running with the
> patch for a while, and haven't seen the crash anymore on my system.
>
> Thoughts?  Is it acceptable as-is?

Added this patch to my env, still reproducing the same issue after
~10-20 tries.
I added a breakpoint to service_lcore_uninit that is indeed caught
when exiting the test application (just wanted to make sure your
change was in my binary).

To reproduce:
I modified app/test/meson.build to have an explicit "-l 0-1" +
compiled with your patch.
Then, I started a dummy busyloop "while true; do true; done" in a
shell that I had pinned to core 1 (taskset -pc 1 $$).
Finally, started another shell (as root), pinned to cores 0-1 on my
laptop (taskset -pc 0,1 $$) and ran
meson test --gdb --repeat=10000 service_autotest

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4922700 (LWP 8572)]
rte_service_runner_func (arg=) at ../lib/librte_eal/common/rte_service.c:458
458             cs->loops++;
A debugging session is active.

        Inferior 1 [process 8566] will be killed.

Quit anyway? (y or n) n
Not confirmed.
Missing separate debuginfos, use: debuginfo-install
elfutils-libelf-0.172-2.el7.x86_64 glibc-2.17-260.el7_6.6.x86_64
libgcc-4.8.5-36.el7_6.2.x86_64 libibverbs-17.2-3.el7.x86_64
libnl3-3.2.28-4.el7.x86_64 libpcap-1.5.3-11.el7.x86_64
numactl-libs-2.0.9-7.el7.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64
zlib-1.2.7-18.el7.x86_64

(gdb) info threads
  Id   Target Id         Frame
* 4    Thread 0x7ffff4922700 (LWP 8572) "lcore-slave-1" rte_service_runner_func (arg=) at ../lib/librte_eal/common/rte_service.c:458
  3    Thread 0x7ffff5123700 (LWP 8571) "rte_mp_handle" 0x00007ffff63a4b4d in recvmsg () from /lib64/libpthread.so.0
  2    Thread 0x7ffff5924700 (LWP 8570) "eal-intr-thread" 0x00007ffff60c7603 in epoll_wait () from /lib64/libc.so.6
  1    Thread 0x7ffff7fd2c00 (LWP 8566) "dpdk-test" 0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2

(gdb) bt
#0  rte_service_runner_func (arg=) at ../lib/librte_eal/common/rte_service.c:458
#1  0x0000000000b2c84f in eal_thread_loop (arg=) at ../lib/librte_eal/linux/eal/eal_thread.c:153
#2  0x00007ffff639ddd5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff60c702d in clone () from /lib64/libc.so.6

(gdb) f 0
#0  rte_service_runner_func (arg=) at ../lib/librte_eal/common/rte_service.c:458
458             cs->loops++;

(gdb) p *cs
$1 = {service_mask = 0, runstate = 0 '\000', is_service_core = 0 '\000',
  service_active_on_lcore = '\000' , loops = 0, calls_per_service = {0 }}

(gdb) p lcore_config[1]
$2 = {thread_id = 140737296606976, pipe_master2slave = {14, 20},
  pipe_slave2master = {21, 22}, f = 0xb26ec0 , arg = 0x0, ret = 0,
  state = RUNNING, socket_id = 0, core_id = 1, core_index = 1,
  core_role = 0 '\000', detected = 1 '\001', cpuset = {__bits = {2, 0 }}}

(gdb) p lcore_config[0]
$3 = {thread_id = 0, pipe_master2slave = {0, 0},
  pipe_slave2master = {0, 0}, f = 0x0, arg = 0x0, ret = 0,
  state = WAIT, socket_id = 0, core_id = 0, core_index = 0,
  core_role = 0 '\000', detected = 1 '\001', cpuset = {__bits = {1, 0 }}}

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd2c00 (LWP 8566))]
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2

(gdb) bt
#0  0x00007ffff7deb96f in _dl_name_match_p () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7de4756 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7de4fcf in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7de9d1e in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#4  0x00007ffff7df19da in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#5  0x00007ffff7deafba in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff6002c29 in __run_exit_handlers () from /lib64/libc.so.6
#7  0x00007ffff6002c77 in exit () from /lib64/libc.so.6
#8  0x00007ffff5feb49c in __libc_start_main () from /lib64/libc.so.6
#9  0x00000000004fa126 in _start ()

--
David Marchand
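
The trace above shows a worker thread still inside its loop (cs->loops++) while the main thread is already running exit handlers. As a rough illustration of the ordering the thread is asking for, here is a minimal sketch using plain pthreads and C11 atomics. It is not DPDK code: struct core_state, runner_func, finalize and run_and_finalize are hypothetical names chosen for this example, and the real fix in rte_service.c may look quite different. The only point is the order: request stop, wait for the worker to leave its loop, then free the state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-lcore service state; not the DPDK struct. */
struct core_state {
    atomic_int runstate;   /* 1 while the worker may run, 0 to request stop */
    atomic_ulong loops;    /* bumped on every iteration of the worker loop */
};

/* Heap-allocated shared state, playing the role of lcore_states. */
static struct core_state *states;

/* Worker loop: dereferences cs on every iteration, so cs must outlive it. */
static void *runner_func(void *arg)
{
    struct core_state *cs = arg;

    while (atomic_load(&cs->runstate))
        atomic_fetch_add(&cs->loops, 1);
    return NULL;
}

/* Stop the worker, wait for it, and only then release the memory. */
static unsigned long finalize(pthread_t worker)
{
    unsigned long loops;

    atomic_store(&states->runstate, 0);   /* 1. request stop */
    pthread_join(worker, NULL);           /* 2. wait until the loop exited */
    loops = atomic_load(&states->loops);
    free(states);                         /* 3. nobody can dereference it now */
    states = NULL;
    return loops;
}

/* Start one worker, let it spin at least once, then shut down in order. */
static unsigned long run_and_finalize(void)
{
    pthread_t worker;
    int rc;

    states = calloc(1, sizeof(*states));
    assert(states != NULL);
    atomic_init(&states->runstate, 1);
    atomic_init(&states->loops, 0);

    rc = pthread_create(&worker, NULL, runner_func, states);
    assert(rc == 0);

    /* Make sure the worker is really inside its loop before stopping it. */
    while (atomic_load(&states->loops) == 0)
        sched_yield();

    return finalize(worker);
}
```

Swapping steps 2 and 3 in finalize() (freeing before the join) would recreate the kind of use-after-free window discussed in this thread. Clearing a role or run flag alone is not enough unless something also waits for the worker to observe it.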