* [dpdk-dev] 0/6] support oops handling @ 2021-07-30 8:49 jerinj 2021-07-30 8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj ` (5 more replies) 0 siblings, 6 replies; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

When a DPDK application crashes, it is handy to get detailed oops information, like the Linux kernel provides, without losing any of the features of the coredump infrastructure provided by the OS. This patch series introduces the APIs to handle an oops in DPDK.

The following section details the implementation and the API interface to the application.

On rte_eal_init() invocation, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler that the application installed before invoking rte_eal_init(). This scheme also keeps the default coredump handler (for gdb etc.) provided by the OS in use if the application does not install any specific signal handler.

In the second case, where the application installs the signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler.

Patch split:
Patch 1/6: defines the API and stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops.
Patch 4/6: x86 specific archinfo like x86 register dump on oops
Patch 5/6: arm64 specific archinfo like arm64 register dump on oops
Patch 6/6: UT for the new APIs

Example command for the build, run, and output logs of an x86-64 linux machine.

meson --buildtype debug build
ninja -C build
echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge -c 0x2

Signal info:
------------
PID: 2439496
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[ 0x55e8b56d5cee]: test_oops_generate()+0x75
[ 0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[ 0x55e8b56d605c]: test_oops()+0x13
[ 0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[ 0x55e8b6063a0d]: cmdline_parse()+0x319
[ 0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[ 0x55e8b6066bd8]: rdline_char_in()+0xc48
[ 0x55e8b606221c]: cmdline_in()+0x62
[ 0x55e8b6062495]: cmdline_interact()+0x56
[ 0x55e8b5459314]: main()+0x65e
[ 0x7f54b25d2b25]: __libc_start_main()+0xd5
[ 0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000 R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48 R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610 R13: 0x0000000000000000
R14: 0x0000000000000000 R15: 0x0000000000000000
RAX: 0x0000000000000005 RBX: 0x0000000000000001
RCX: 0x00007f54b278a943 RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340 RSP: 0x00007fffc6958330
RSI: 0x0000000000000000 RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml  |   2 +-
 .travis.yml                  |   2 +-
 app/test/meson.build         |   2 +
 app/test/test_oops.c         | 121 ++++++++++++++
 config/meson.build           |   8 +
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 +
 lib/eal/freebsd/eal.c        |   6 +
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 ++++++++++++
 lib/eal/linux/eal.c          |   6 +
 lib/eal/unix/eal_oops.c      | 297 +++++++++++++++++++++++++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 +
 14 files changed, 553 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

--
2.32.0

^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] 1/6] eal: introduce oops handling API 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj @ 2021-07-30 8:49 ` jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj 2021-07-30 8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj ` (4 subsequent siblings) 5 siblings, 1 reply; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev, Bruce Richardson, Ray Kinsella Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler that the application installed before invoking rte_eal_init(). This scheme also keeps the default coredump handler (for gdb etc.) provided by the OS in use if the application does not install any specific signal handler.

In the second case, where the application installs the signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler.

Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- doc/api/doxy-api-index.md | 3 +- lib/eal/common/eal_private.h | 3 ++ lib/eal/freebsd/eal.c | 6 +++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_oops.h | 100 +++++++++++++++++++++++++++++++++++ lib/eal/linux/eal.c | 6 +++ lib/eal/unix/eal_oops.c | 36 +++++++++++++ lib/eal/unix/meson.build | 1 + lib/eal/version.map | 4 ++ 9 files changed, 159 insertions(+), 1 deletion(-) create mode 100644 lib/eal/include/rte_oops.h create mode 100644 lib/eal/unix/eal_oops.c diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index 1992107a03..0d0da35205 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -215,7 +215,8 @@ The public API headers are grouped by topics: [log] (@ref rte_log.h), [errno] (@ref rte_errno.h), [trace] (@ref rte_trace.h), - [trace_point] (@ref rte_trace_point.h) + [trace_point] (@ref rte_trace_point.h), + [oops] (@ref rte_oops.h) - **misc**: [EAL config] (@ref rte_eal.h), diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index 64cf4e81c8..c3a490d803 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset); */ void __rte_thread_uninit(void); +int eal_oops_init(void); +void eal_oops_fini(void); + /** * asprintf(3) replacement for Windows. 
*/ diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index 6cee5ae369..3c098708c6 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -692,6 +692,11 @@ rte_eal_init(int argc, char **argv) return -1; } + if (eal_oops_init()) { + rte_eal_init_alert("oops init failed."); + rte_errno = ENOENT; + } + thread_id = pthread_self(); eal_reset_internal_config(internal_conf); @@ -974,6 +979,7 @@ rte_eal_cleanup(void) rte_trace_save(); eal_trace_fini(); eal_cleanup_config(internal_conf); + eal_oops_fini(); return 0; } diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 88a9eba12f..6c74bdb7b5 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -30,6 +30,7 @@ headers += files( 'rte_malloc.h', 'rte_memory.h', 'rte_memzone.h', + 'rte_oops.h', 'rte_pci_dev_feature_defs.h', 'rte_pci_dev_features.h', 'rte_per_lcore.h', diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h new file mode 100644 index 0000000000..ff82c409ec --- /dev/null +++ b/lib/eal/include/rte_oops.h @@ -0,0 +1,100 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell. + */ + +#ifndef _RTE_OOPS_H_ +#define _RTE_OOPS_H_ + +#include <rte_common.h> +#include <rte_compat.h> +#include <rte_config.h> + +/** + * @file + * + * RTE oops API + * + * This file provides the oops handling APIs to RTE applications. + * + * On rte_eal_init() invocation, the EAL library installs the oops handler for + * the essential signals. The rte_oops_signals_enabled() API provides the list + * of signals the library installed by the EAL. + * + * The default EAL oops handler decodes the oops message using rte_oops_decode() + * and then calls the signal handler installed by the application before + * invoking the rte_eal_init(). This scheme will also enable the use of + * the default coredump handler(for gdb etc.) provided by OS if the application + * does not install any specific signal handler. 
+ * + * The second case where the application installs the signal handler after + * the rte_eal_init() invocation, rte_oops_decode() provides the means of + * decoding the oops message in the application's fault handler. + * + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + */ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Maximum number of oops signals enabled in EAL. + * @see rte_oops_signals_enabled() + */ +#define RTE_OOPS_SIGNALS_MAX 32 + +/** + * Get the list of enabled oops signals installed by EAL. + * + * @param [out] signals + * A pointer to store the enabled signals. + * Value NULL is allowed. if not NULL, then the size of this array must be + * at least RTE_OOPS_SIGNALS_MAX. + * + * @return + * Number of enabled oops signals. + */ +__rte_experimental +int rte_oops_signals_enabled(int *signals); + +#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD) +#include <signal.h> +#include <ucontext.h> + +/** + * Decode an oops + * + * This prototype is same as sa_sigaction defined in signal.h. + * Application must register signal handler using sigaction() with + * sa_flag as SA_SIGINFO flag to get this information from unix OS. + * + * @param sig + * Signal number + * @param info + * Signal info provided by sa_sigaction. Value NULL is allowed. + * @param uc + * ucontext_t provided when signal installed with SA_SIGINFO flag. + * Value NULL is allowed. 
+ * + */ +__rte_experimental +void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc); +#else + +/** + * Decode an oops + * + * @param sig + * Signal number + */ +__rte_experimental +void rte_oops_decode(int sig); + +#endif + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_OOPS_H_ */ diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index 3577eaeaa4..3438a96b75 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -991,6 +991,11 @@ rte_eal_init(int argc, char **argv) return -1; } + if (eal_oops_init()) { + rte_eal_init_alert("oops init failed."); + rte_errno = ENOENT; + } + p = strrchr(argv[0], '/'); strlcpy(logid, p ? p + 1 : argv[0], sizeof(logid)); thread_id = pthread_self(); @@ -1371,6 +1376,7 @@ rte_eal_cleanup(void) rte_trace_save(); eal_trace_fini(); eal_cleanup_config(internal_conf); + eal_oops_fini(); return 0; } diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c new file mode 100644 index 0000000000..53b580f733 --- /dev/null +++ b/lib/eal/unix/eal_oops.c @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. 
+ */ + + +#include <rte_oops.h> + +#include "eal_private.h" + +void +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +{ + RTE_SET_USED(sig); + RTE_SET_USED(info); + RTE_SET_USED(uc); + +} + +int +rte_oops_signals_enabled(int *signals) +{ + RTE_SET_USED(signals); + + return 0; +} + +int +eal_oops_init(void) +{ + return 0; +} + +void +eal_oops_fini(void) +{ +} diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build index e3ecd3e956..cdd3320669 100644 --- a/lib/eal/unix/meson.build +++ b/lib/eal/unix/meson.build @@ -6,5 +6,6 @@ sources += files( 'eal_unix_memory.c', 'eal_unix_timer.c', 'eal_firmware.c', + 'eal_oops.c', 'rte_thread.c', ) diff --git a/lib/eal/version.map b/lib/eal/version.map index 887012d02a..f2841d09fd 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -426,6 +426,10 @@ EXPERIMENTAL { # added in 21.08 rte_power_monitor_multi; # WINDOWS_NO_EXPORT + + # added in 21.11 + rte_oops_signals_enabled; # WINDOWS_NO_EXPORT + rte_oops_decode; # WINDOWS_NO_EXPORT }; INTERNAL { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
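The second case described in this patch (application handler installed after rte_eal_init()) could look roughly like the sketch below. It assumes the handler is registered through sigaction() with SA_SIGINFO, as the rte_oops.h comment requires. The names oops_decode_standin(), app_fault_handler() and app_install_fault_handler() are hypothetical; the stand-in only records the signal number at the point where a real application would call rte_oops_decode(sig, info, uc).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Last signal seen by the stand-in decoder (illustration only). */
static volatile sig_atomic_t last_decoded_sig;

/*
 * Hypothetical stand-in for rte_oops_decode(). The function from this
 * patch would print the oops report; this one only records the signal.
 * The context is taken as void * to keep the sketch free of <ucontext.h>.
 */
static void
oops_decode_standin(int sig, siginfo_t *info, void *uc)
{
	(void)info;
	(void)uc;
	last_decoded_sig = sig;
}

/* Application fault handler: decode the oops first, then do app work. */
static void
app_fault_handler(int sig, siginfo_t *info, void *uc)
{
	oops_decode_standin(sig, info, uc);
	/* Application-specific logging or cleanup would follow here. */
}

/* Register with SA_SIGINFO so that info and uc are passed to the handler. */
static int
app_install_fault_handler(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = app_fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}
```

An application would call app_install_fault_handler(SIGSEGV) (and similar for other fault signals) once rte_eal_init() has returned, which replaces the EAL handler for that signal.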
* [dpdk-dev] [PATCH v2 0/6] support oops handling 2021-07-30 8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj ` (6 more replies) 0 siblings, 7 replies; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

v2:
- Fix powerpc build (David Christensen)

When a DPDK application crashes, it is handy to get detailed oops information, like the Linux kernel provides, without losing any of the features of the coredump infrastructure provided by the OS. This patch series introduces the APIs to handle an oops in DPDK.

The following section details the implementation and the API interface to the application.

On rte_eal_init() invocation, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler that the application installed before invoking rte_eal_init(). This scheme also keeps the default coredump handler (for gdb etc.) provided by the OS in use if the application does not install any specific signal handler.

In the second case, where the application installs the signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler.

Patch split:
Patch 1/6: defines the API and stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops.
Patch 4/6: x86 specific archinfo like x86 register dump on oops
Patch 5/6: arm64 specific archinfo like arm64 register dump on oops
Patch 6/6: UT for the new APIs

Example command for the build, run, and output logs of an x86-64 linux machine.

meson --buildtype debug build
ninja -C build
echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge -c 0x2

Signal info:
------------
PID: 2439496
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[ 0x55e8b56d5cee]: test_oops_generate()+0x75
[ 0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[ 0x55e8b56d605c]: test_oops()+0x13
[ 0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[ 0x55e8b6063a0d]: cmdline_parse()+0x319
[ 0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[ 0x55e8b6066bd8]: rdline_char_in()+0xc48
[ 0x55e8b606221c]: cmdline_in()+0x62
[ 0x55e8b6062495]: cmdline_interact()+0x56
[ 0x55e8b5459314]: main()+0x65e
[ 0x7f54b25d2b25]: __libc_start_main()+0xd5
[ 0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000 R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48 R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610 R13: 0x0000000000000000
R14: 0x0000000000000000 R15: 0x0000000000000000
RAX: 0x0000000000000005 RBX: 0x0000000000000001
RCX: 0x00007f54b278a943 RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340 RSP: 0x00007fffc6958330
RSI: 0x0000000000000000 RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml  |   2 +-
 .travis.yml                  |   2 +-
 app/test/meson.build         |   2 +
 app/test/test_oops.c         | 121 ++++++++++++++
 config/meson.build           |   8 +
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 +
 lib/eal/freebsd/eal.c        |   6 +
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 ++++++++++++
 lib/eal/linux/eal.c          |   6 +
 lib/eal/unix/eal_oops.c      | 298 +++++++++++++++++++++++++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 +
 14 files changed, 554 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

--
2.32.0

^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:53 ` Stephen Hemminger 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj ` (5 subsequent siblings) 6 siblings, 1 reply; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev, Bruce Richardson, Ray Kinsella Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler that the application installed before invoking rte_eal_init(). This scheme also keeps the default coredump handler (for gdb etc.) provided by the OS in use if the application does not install any specific signal handler.

In the second case, where the application installs the signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler.

Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- doc/api/doxy-api-index.md | 3 +- lib/eal/common/eal_private.h | 3 ++ lib/eal/freebsd/eal.c | 6 +++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_oops.h | 100 +++++++++++++++++++++++++++++++++++ lib/eal/linux/eal.c | 6 +++ lib/eal/unix/eal_oops.c | 36 +++++++++++++ lib/eal/unix/meson.build | 1 + lib/eal/version.map | 4 ++ 9 files changed, 159 insertions(+), 1 deletion(-) create mode 100644 lib/eal/include/rte_oops.h create mode 100644 lib/eal/unix/eal_oops.c diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index 1992107a03..0d0da35205 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -215,7 +215,8 @@ The public API headers are grouped by topics: [log] (@ref rte_log.h), [errno] (@ref rte_errno.h), [trace] (@ref rte_trace.h), - [trace_point] (@ref rte_trace_point.h) + [trace_point] (@ref rte_trace_point.h), + [oops] (@ref rte_oops.h) - **misc**: [EAL config] (@ref rte_eal.h), diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index 64cf4e81c8..c3a490d803 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset); */ void __rte_thread_uninit(void); +int eal_oops_init(void); +void eal_oops_fini(void); + /** * asprintf(3) replacement for Windows. 
*/ diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index 6cee5ae369..3c098708c6 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -692,6 +692,11 @@ rte_eal_init(int argc, char **argv) return -1; } + if (eal_oops_init()) { + rte_eal_init_alert("oops init failed."); + rte_errno = ENOENT; + } + thread_id = pthread_self(); eal_reset_internal_config(internal_conf); @@ -974,6 +979,7 @@ rte_eal_cleanup(void) rte_trace_save(); eal_trace_fini(); eal_cleanup_config(internal_conf); + eal_oops_fini(); return 0; } diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 88a9eba12f..6c74bdb7b5 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -30,6 +30,7 @@ headers += files( 'rte_malloc.h', 'rte_memory.h', 'rte_memzone.h', + 'rte_oops.h', 'rte_pci_dev_feature_defs.h', 'rte_pci_dev_features.h', 'rte_per_lcore.h', diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h new file mode 100644 index 0000000000..ff82c409ec --- /dev/null +++ b/lib/eal/include/rte_oops.h @@ -0,0 +1,100 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell. + */ + +#ifndef _RTE_OOPS_H_ +#define _RTE_OOPS_H_ + +#include <rte_common.h> +#include <rte_compat.h> +#include <rte_config.h> + +/** + * @file + * + * RTE oops API + * + * This file provides the oops handling APIs to RTE applications. + * + * On rte_eal_init() invocation, the EAL library installs the oops handler for + * the essential signals. The rte_oops_signals_enabled() API provides the list + * of signals the library installed by the EAL. + * + * The default EAL oops handler decodes the oops message using rte_oops_decode() + * and then calls the signal handler installed by the application before + * invoking the rte_eal_init(). This scheme will also enable the use of + * the default coredump handler(for gdb etc.) provided by OS if the application + * does not install any specific signal handler. 
+ * + * The second case where the application installs the signal handler after + * the rte_eal_init() invocation, rte_oops_decode() provides the means of + * decoding the oops message in the application's fault handler. + * + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + */ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Maximum number of oops signals enabled in EAL. + * @see rte_oops_signals_enabled() + */ +#define RTE_OOPS_SIGNALS_MAX 32 + +/** + * Get the list of enabled oops signals installed by EAL. + * + * @param [out] signals + * A pointer to store the enabled signals. + * Value NULL is allowed. if not NULL, then the size of this array must be + * at least RTE_OOPS_SIGNALS_MAX. + * + * @return + * Number of enabled oops signals. + */ +__rte_experimental +int rte_oops_signals_enabled(int *signals); + +#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD) +#include <signal.h> +#include <ucontext.h> + +/** + * Decode an oops + * + * This prototype is same as sa_sigaction defined in signal.h. + * Application must register signal handler using sigaction() with + * sa_flag as SA_SIGINFO flag to get this information from unix OS. + * + * @param sig + * Signal number + * @param info + * Signal info provided by sa_sigaction. Value NULL is allowed. + * @param uc + * ucontext_t provided when signal installed with SA_SIGINFO flag. + * Value NULL is allowed. 
+ * + */ +__rte_experimental +void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc); +#else + +/** + * Decode an oops + * + * @param sig + * Signal number + */ +__rte_experimental +void rte_oops_decode(int sig); + +#endif + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_OOPS_H_ */ diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index 3577eaeaa4..3438a96b75 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -991,6 +991,11 @@ rte_eal_init(int argc, char **argv) return -1; } + if (eal_oops_init()) { + rte_eal_init_alert("oops init failed."); + rte_errno = ENOENT; + } + p = strrchr(argv[0], '/'); strlcpy(logid, p ? p + 1 : argv[0], sizeof(logid)); thread_id = pthread_self(); @@ -1371,6 +1376,7 @@ rte_eal_cleanup(void) rte_trace_save(); eal_trace_fini(); eal_cleanup_config(internal_conf); + eal_oops_fini(); return 0; } diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c new file mode 100644 index 0000000000..53b580f733 --- /dev/null +++ b/lib/eal/unix/eal_oops.c @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. 
+ */ + + +#include <rte_oops.h> + +#include "eal_private.h" + +void +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +{ + RTE_SET_USED(sig); + RTE_SET_USED(info); + RTE_SET_USED(uc); + +} + +int +rte_oops_signals_enabled(int *signals) +{ + RTE_SET_USED(signals); + + return 0; +} + +int +eal_oops_init(void) +{ + return 0; +} + +void +eal_oops_fini(void) +{ +} diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build index e3ecd3e956..cdd3320669 100644 --- a/lib/eal/unix/meson.build +++ b/lib/eal/unix/meson.build @@ -6,5 +6,6 @@ sources += files( 'eal_unix_memory.c', 'eal_unix_timer.c', 'eal_firmware.c', + 'eal_oops.c', 'rte_thread.c', ) diff --git a/lib/eal/version.map b/lib/eal/version.map index 887012d02a..f2841d09fd 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -426,6 +426,10 @@ EXPERIMENTAL { # added in 21.08 rte_power_monitor_multi; # WINDOWS_NO_EXPORT + + # added in 21.11 + rte_oops_signals_enabled; # WINDOWS_NO_EXPORT + rte_oops_decode; # WINDOWS_NO_EXPORT }; INTERNAL { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj @ 2021-08-17 3:53 ` Stephen Hemminger 2021-08-17 7:38 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Stephen Hemminger @ 2021-08-17 3:53 UTC (permalink / raw) To: jerinj Cc: dev, Bruce Richardson, Ray Kinsella, thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc

On Tue, 17 Aug 2021 08:57:18 +0530
<jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
>
> Introducing oops handling API with following specification
> and enable stub implementation for Linux and FreeBSD.
>
> On rte_eal_init() invocation, the EAL library installs the
> oops handler for the essential signals.
> The rte_oops_signals_enabled() API provides the list
> of signals the library installed by the EAL.

This is a big change, and many applications already handle these
signals themselves. Therefore adding this needs to be opt-in
and not enabled by default.

^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 3:53 ` Stephen Hemminger @ 2021-08-17 7:38 ` Jerin Jacob 2021-08-17 15:09 ` Stephen Hemminger 0 siblings, 1 reply; 45+ messages in thread From: Jerin Jacob @ 2021-08-17 7:38 UTC (permalink / raw) To: Stephen Hemminger Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen

On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 08:57:18 +0530
> <jerinj@marvell.com> wrote:
>
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Introducing oops handling API with following specification
> > and enable stub implementation for Linux and FreeBSD.
> >
> > On rte_eal_init() invocation, the EAL library installs the
> > oops handler for the essential signals.
> > The rte_oops_signals_enabled() API provides the list
> > of signals the library installed by the EAL.
>
> This is a big change, and many applications already handle these
> signals themselves. Therefore adding this needs to be opt-in
> and not enabled by default.

The following design was chosen so that applications do not each have to
register this signal handler explicitly, while still coexisting with
application-specific signal handlers. (It is mentioned in the commit log;
I will describe it here for more clarity.)

Case 1:
a) The application installs its signal handler prior to rte_eal_init().
b) The implementation stores the application-specific signal handler and
replaces it with the EAL oops handler.
c) When the application/DPDK gets a segfault, the default EAL oops handler
is invoked.
d) It dumps the EAL-specific message and then calls the application-specific
signal handler installed in step a). This avoids breaking any contract with
the application, i.e. the behavior is the same as the current EAL.
That is the reason for not using SA_RESETHAND (which would call SIG_DFL after
the EAL oops handler instead of the application-specific handler).

Case 2:
a) The application installs its signal handler after rte_eal_init().
b) The EAL handler is replaced by the application handler; the application
can then call rte_oops_decode() to decode the oops.

rte_oops_signals_enabled() and rte_oops_decode() are provided to cater to
the above use cases.

Here we are not breaking any contract with the application.
Do you have concerns about this design?

^ permalink raw reply [flat|nested] 45+ messages in thread
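Case 1 above (chaining from the oops handler to a handler that the application installed earlier) can be sketched with plain POSIX calls as follows. This is only an illustration under stated assumptions, not the code from patch 2/6: oops_install(), oops_handler(), app_handler() and the single prev_action slot are hypothetical, and a real implementation would keep one saved action per signal and print the report where the flag is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Action that was in place before the oops handler took over. */
static struct sigaction prev_action;

/* Set when the oops handler runs; a real handler would dump a report. */
static volatile sig_atomic_t oops_dumped;

/* Example application handler, installed before oops_install(). */
static volatile sig_atomic_t app_handled;

static void
app_handler(int sig)
{
	(void)sig;
	app_handled = 1;
}

static void
oops_handler(int sig, siginfo_t *info, void *uc)
{
	/* Step c): the oops handler runs first. */
	oops_dumped = 1;

	/* Step d): chain to whatever was installed earlier. */
	if (prev_action.sa_flags & SA_SIGINFO) {
		prev_action.sa_sigaction(sig, info, uc);
	} else if (prev_action.sa_handler == SIG_DFL) {
		/* No application handler: restore the default action and
		 * re-raise so the OS default (e.g. a core dump) still occurs.
		 */
		sigaction(sig, &prev_action, NULL);
		raise(sig);
	} else if (prev_action.sa_handler != SIG_IGN) {
		prev_action.sa_handler(sig);
	}
}

/* Steps a)-b): remember the existing action, then install the oops one. */
static int
oops_install(int signo)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = oops_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, &prev_action);
}
```

In this sketch the application would register app_handler for a signal first and call oops_install() for the same signal afterwards, which mirrors the ordering of steps a) and b).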
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 7:38 ` Jerin Jacob @ 2021-08-17 15:09 ` Stephen Hemminger 2021-08-17 15:27 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Stephen Hemminger @ 2021-08-17 15:09 UTC (permalink / raw) To: Jerin Jacob Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Tue, 17 Aug 2021 13:08:46 +0530 Jerin Jacob <jerinjacobk@gmail.com> wrote: > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > <stephen@networkplumber.org> wrote: > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > <jerinj@marvell.com> wrote: > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > Introducing oops handling API with following specification > > > and enable stub implementation for Linux and FreeBSD. > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > oops handler for the essential signals. > > > The rte_oops_signals_enabled() API provides the list > > > of signals the library installed by the EAL. > > > > This is a big change, and many applications already handle these > > signals themselves. Therefore adding this needs to be opt-in > > and not enabled by default. > > In order to avoid every application explicitly register this > sighandler and to cater to the > co-existing application-specific signal-hander usage. > The following design has been chosen. (It is mentioned in the commit log, > I will describe here for more clarity) > > Case 1: > a) The application installs the signal handler prior to rte_eal_init(). 
> b) Implementation stores the application-specific signal and replace a > signal handler as oops eal handler > c) when application/DPDK get the segfault, the default EAL oops > handler gets invoked > d) Then it dumps the EAL specific message, it calls the > application-specific signal handler > installed in step 1 by application. This avoids breaking any contract > with the application. > i.e Behavior is the same current EAL now. > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > eal oops handler instead > application-specific handler) > > Case 2: > a) The application install the signal handler after rte_eal_init(), > b) EAL hander get replaced with application handle then the application can call > rte_oops_decode() to decode. > > In order to cater the above use case, rte_oops_signals_enabled() and > rte_oops_decode() > provided. > > Here we are not breaking any contract with the application. > Do you have concerns about this design? In our application, which runs as a service, it is important not to do any backtrace in production. We rely on other infrastructure to process coredumps. This should be enabled by a command-line argument. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 15:09 ` Stephen Hemminger @ 2021-08-17 15:27 ` Jerin Jacob 2021-08-17 15:52 ` Stephen Hemminger 0 siblings, 1 reply; 45+ messages in thread From: Jerin Jacob @ 2021-08-17 15:27 UTC (permalink / raw) To: Stephen Hemminger Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger <stephen@networkplumber.org> wrote: > > On Tue, 17 Aug 2021 13:08:46 +0530 > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > <stephen@networkplumber.org> wrote: > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > <jerinj@marvell.com> wrote: > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > Introducing oops handling API with following specification > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > oops handler for the essential signals. > > > > The rte_oops_signals_enabled() API provides the list > > > > of signals the library installed by the EAL. > > > > > > This is a big change, and many applications already handle these > > > signals themselves. Therefore adding this needs to be opt-in > > > and not enabled by default. > > > > In order to avoid every application explicitly register this > > sighandler and to cater to the > > co-existing application-specific signal-hander usage. > > The following design has been chosen. (It is mentioned in the commit log, > > I will describe here for more clarity) > > > > Case 1: > > a) The application installs the signal handler prior to rte_eal_init(). 
> > b) Implementation stores the application-specific signal and replace a > > signal handler as oops eal handler > > c) when application/DPDK get the segfault, the default EAL oops > > handler gets invoked > > d) Then it dumps the EAL specific message, it calls the > > application-specific signal handler > > installed in step 1 by application. This avoids breaking any contract > > with the application. > > i.e Behavior is the same current EAL now. > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > eal oops handler instead > > application-specific handler) > > > > Case 2: > > a) The application install the signal handler after rte_eal_init(), > > b) EAL hander get replaced with application handle then the application can call > > rte_oops_decode() to decode. > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > rte_oops_decode() > > provided. > > > > Here we are not breaking any contract with the application. > > Do you have concerns about this design? > > In our application as a service it is important not to do any backtrace > in production. We rely on other infrastructure to process coredumps. Other infrastructure will keep working. Take, for example, standard coredump handling with the Linux infrastructure. In the current implementation: - The EAL handler dumps the DPDK oops, kernel style, on stderr. - The implementation then invokes SIG_DFL from the EAL oops handler. - That step creates the coredump or hands off to whatever other infrastructure you are using for coredumps. > > This should be controlled enabled by a command line argument. If other coredump infrastructure keeps working as-is, why is an enable/disable switch required from EAL? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 15:27 ` Jerin Jacob @ 2021-08-17 15:52 ` Stephen Hemminger 2021-08-18 9:37 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Stephen Hemminger @ 2021-08-17 15:52 UTC (permalink / raw) To: Jerin Jacob Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Tue, 17 Aug 2021 20:57:50 +0530 Jerin Jacob <jerinjacobk@gmail.com> wrote: > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger > <stephen@networkplumber.org> wrote: > > > > On Tue, 17 Aug 2021 13:08:46 +0530 > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > > <stephen@networkplumber.org> wrote: > > > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > > <jerinj@marvell.com> wrote: > > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > > > Introducing oops handling API with following specification > > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > > oops handler for the essential signals. > > > > > The rte_oops_signals_enabled() API provides the list > > > > > of signals the library installed by the EAL. > > > > > > > > This is a big change, and many applications already handle these > > > > signals themselves. Therefore adding this needs to be opt-in > > > > and not enabled by default. > > > > > > In order to avoid every application explicitly register this > > > sighandler and to cater to the > > > co-existing application-specific signal-hander usage. > > > The following design has been chosen. 
(It is mentioned in the commit log, > > > I will describe here for more clarity) > > > > > > Case 1: > > > a) The application installs the signal handler prior to rte_eal_init(). > > > b) Implementation stores the application-specific signal and replace a > > > signal handler as oops eal handler > > > c) when application/DPDK get the segfault, the default EAL oops > > > handler gets invoked > > > d) Then it dumps the EAL specific message, it calls the > > > application-specific signal handler > > > installed in step 1 by application. This avoids breaking any contract > > > with the application. > > > i.e Behavior is the same current EAL now. > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > > eal oops handler instead > > > application-specific handler) > > > > > > Case 2: > > > a) The application install the signal handler after rte_eal_init(), > > > b) EAL hander get replaced with application handle then the application can call > > > rte_oops_decode() to decode. > > > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > > rte_oops_decode() > > > provided. > > > > > > Here we are not breaking any contract with the application. > > > Do you have concerns about this design? > > > > In our application as a service it is important not to do any backtrace > > in production. We rely on other infrastructure to process coredumps. > > Other infrastructure will work. For example, If we are using standard coredump > using linux infra. In Current implementation, > - EAL handler dump the DPDK OOPS like kernel on stderr > - Implementation calls SIG_DFL in eal oops handler > - The above step creates the coredump or re-directs any other > infrastructure you are using for coredump. > > > > > This should be controlled enabled by a command line argument. > > If we allow other infrastructure coredump to work as-is, why > enable/disable required from eal? 
The DPDK oops handler adds extra steps that cause every fault to be identified as originating in the oops code. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-17 15:52 ` Stephen Hemminger @ 2021-08-18 9:37 ` Jerin Jacob 2021-08-18 16:46 ` Stephen Hemminger 0 siblings, 1 reply; 45+ messages in thread From: Jerin Jacob @ 2021-08-18 9:37 UTC (permalink / raw) To: Stephen Hemminger Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger <stephen@networkplumber.org> wrote: > > On Tue, 17 Aug 2021 20:57:50 +0530 > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger > > <stephen@networkplumber.org> wrote: > > > > > > On Tue, 17 Aug 2021 13:08:46 +0530 > > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > > > <stephen@networkplumber.org> wrote: > > > > > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > > > <jerinj@marvell.com> wrote: > > > > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > > > > > Introducing oops handling API with following specification > > > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > > > oops handler for the essential signals. > > > > > > The rte_oops_signals_enabled() API provides the list > > > > > > of signals the library installed by the EAL. > > > > > > > > > > This is a big change, and many applications already handle these > > > > > signals themselves. Therefore adding this needs to be opt-in > > > > > and not enabled by default. > > > > > > > > In order to avoid every application explicitly register this > > > > sighandler and to cater to the > > > > co-existing application-specific signal-hander usage. 
> > > > The following design has been chosen. (It is mentioned in the commit log, > > > > I will describe here for more clarity) > > > > > > > > Case 1: > > > > a) The application installs the signal handler prior to rte_eal_init(). > > > > b) Implementation stores the application-specific signal and replace a > > > > signal handler as oops eal handler > > > > c) when application/DPDK get the segfault, the default EAL oops > > > > handler gets invoked > > > > d) Then it dumps the EAL specific message, it calls the > > > > application-specific signal handler > > > > installed in step 1 by application. This avoids breaking any contract > > > > with the application. > > > > i.e Behavior is the same current EAL now. > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > > > eal oops handler instead > > > > application-specific handler) > > > > > > > > Case 2: > > > > a) The application install the signal handler after rte_eal_init(), > > > > b) EAL hander get replaced with application handle then the application can call > > > > rte_oops_decode() to decode. > > > > > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > > > rte_oops_decode() > > > > provided. > > > > > > > > Here we are not breaking any contract with the application. > > > > Do you have concerns about this design? > > > > > > In our application as a service it is important not to do any backtrace > > > in production. We rely on other infrastructure to process coredumps. > > > > Other infrastructure will work. For example, If we are using standard coredump > > using linux infra. In Current implementation, > > - EAL handler dump the DPDK OOPS like kernel on stderr > > - Implementation calls SIG_DFL in eal oops handler > > - The above step creates the coredump or re-directs any other > > infrastructure you are using for coredump. > > > > > > > > This should be controlled enabled by a command line argument. 
> > > > If we allow other infrastructure coredump to work as-is, why > > enable/disable required from eal? > > The addition of DPDK OOPS adds additional steps which make all > faults be identified as the oops code. Since we are using SA_ONSTACK, the original segfault info is not lost. I verified it as follows; please find the steps below. 0) Enable the coredump infrastructure in Linux using coredumpctl or similar 1) Apply this series 2) Apply the following patch to create a segfault from the library. This tests that a segfault is caught by the EAL and forwarded to the default Linux signal handler. [main]dell[dpdk.org] $ git diff diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index 3438a96b75..b935c32c98 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv) eal_mcfg_complete(); + /* Generate a segfault */ + *(volatile int *)0x05 = 0; return fctret; } 3) Build meson --buildtype debug build ninja -C build 4) Run $ ./build/app/test/dpdk-test --no-huge -c 0x2 Please find the oops dump [1] and the gdb core dump backtrace [2]. The gdb core dump trace preserves the original segfault cause and trace. Any other concerns? [1] [main]dell[dpdk.org] $ ./build/app/test/dpdk-test --no-huge -c 0x2 EAL: Detected 56 lcore(s) EAL: Detected 2 NUMA nodes EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem EAL: Detected static linkage of DPDK EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' EAL: WARNING: Main core has no memory on local socket!
Signal info: ------------ PID: 2666512 Signal number: 11 Fault address: 0x5 Backtrace: ---------- [ 0x5582acd1e08a]: rte_eal_init()+0xe18 [ 0x5582ac086f4e]: main()+0x298 [ 0x7f0facf1fb25]: __libc_start_main()+0xd5 [ 0x5582ac079c9e]: _start()+0x2e Arch info: ---------- R8 : 0x0000000000000002 R9 : 0x00007ffe9273c590 R10: 0x0000000000000000 R11: 0x0000000000000246 R12: 0x00005582bc3ce7a0 R13: 0x00000000000000ca R14: 0x0000000000000000 R15: 0x0000000000000000 RAX: 0x0000000000000005 RBX: 0x00005582bc3c75c8 RCX: 0x00007ffe9273c530 RDX: 0x0000000000000000 RBP: 0x00007ffe9273c820 RSP: 0x00007ffe9273c690 RSI: 0x0000000000000008 RDI: 0x00000000000000ca RIP: 0x00005582acd1e08a EFL: 0x0000000000010246 [2] Core was generated by `./build/app/test/dpdk-test --no-huge -c 0x2'. Program terminated with signal SIGSEGV, Segmentation fault. #0 rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342 1342 *(volatile int *)0x05 = 0; [Current thread is 1 (Thread 0x7f0faca83c00 (LWP 2666512))] (gdb) bt #0 rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342 #1 0x00005582ac086f4e in main (argc=4, argv=0x7ffe9273cec8) at ../app/test/test.c:146 > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-18 9:37 ` Jerin Jacob @ 2021-08-18 16:46 ` Stephen Hemminger 2021-08-18 18:04 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Stephen Hemminger @ 2021-08-18 16:46 UTC (permalink / raw) To: Jerin Jacob Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Wed, 18 Aug 2021 15:07:25 +0530 Jerin Jacob <jerinjacobk@gmail.com> wrote: > On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger > <stephen@networkplumber.org> wrote: > > > > On Tue, 17 Aug 2021 20:57:50 +0530 > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger > > > <stephen@networkplumber.org> wrote: > > > > > > > > On Tue, 17 Aug 2021 13:08:46 +0530 > > > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > > > > <stephen@networkplumber.org> wrote: > > > > > > > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > > > > <jerinj@marvell.com> wrote: > > > > > > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > > > > > > > Introducing oops handling API with following specification > > > > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > > > > oops handler for the essential signals. > > > > > > > The rte_oops_signals_enabled() API provides the list > > > > > > > of signals the library installed by the EAL. > > > > > > > > > > > > This is a big change, and many applications already handle these > > > > > > signals themselves. Therefore adding this needs to be opt-in > > > > > > and not enabled by default. 
> > > > > > > > > > In order to avoid every application explicitly register this > > > > > sighandler and to cater to the > > > > > co-existing application-specific signal-hander usage. > > > > > The following design has been chosen. (It is mentioned in the commit log, > > > > > I will describe here for more clarity) > > > > > > > > > > Case 1: > > > > > a) The application installs the signal handler prior to rte_eal_init(). > > > > > b) Implementation stores the application-specific signal and replace a > > > > > signal handler as oops eal handler > > > > > c) when application/DPDK get the segfault, the default EAL oops > > > > > handler gets invoked > > > > > d) Then it dumps the EAL specific message, it calls the > > > > > application-specific signal handler > > > > > installed in step 1 by application. This avoids breaking any contract > > > > > with the application. > > > > > i.e Behavior is the same current EAL now. > > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > > > > eal oops handler instead > > > > > application-specific handler) > > > > > > > > > > Case 2: > > > > > a) The application install the signal handler after rte_eal_init(), > > > > > b) EAL hander get replaced with application handle then the application can call > > > > > rte_oops_decode() to decode. > > > > > > > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > > > > rte_oops_decode() > > > > > provided. > > > > > > > > > > Here we are not breaking any contract with the application. > > > > > Do you have concerns about this design? > > > > > > > > In our application as a service it is important not to do any backtrace > > > > in production. We rely on other infrastructure to process coredumps. > > > > > > Other infrastructure will work. For example, If we are using standard coredump > > > using linux infra. 
In Current implementation, > > > - EAL handler dump the DPDK OOPS like kernel on stderr > > > - Implementation calls SIG_DFL in eal oops handler > > > - The above step creates the coredump or re-directs any other > > > infrastructure you are using for coredump. > > > > > > > > > > > This should be controlled enabled by a command line argument. > > > > > > If we allow other infrastructure coredump to work as-is, why > > > enable/disable required from eal? > > > > The addition of DPDK OOPS adds additional steps which make all > > faults be identified as the oops code. > > Since we are using SA_ONSTACK it is not losing the original segfault > info. > > I verified like this, Please find below the steps. > > 0) Enable coredump infra in Linux using coredumpctl or so > 1) Apply this series > 2) Apply for the following patch to create a segfault from the library. > This will test, segfault caught by eal and forward to default Linux singal > handler. > > [main]dell[dpdk.org] $ git diff > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c > index 3438a96b75..b935c32c98 100644 > --- a/lib/eal/linux/eal.c > +++ b/lib/eal/linux/eal.c > @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv) > > eal_mcfg_complete(); > > + /* Generate a segfault */ > + *(volatile int *)0x05 = 0; > return fctret; > > } > 3)Build > meson --buildtype debug build > ninja -C build > > 4) Run > $ ./build/app/test/dpdk-test --no-huge -c 0x2 > > Please find oops dump[1] and gdb core dump backtrace[2]. > Gdb core dump trace preserves the original segfault cause and trace. > > Any other concerns? Your new oops handling duplicates existing code in our application (and I know others that do this as well). The problem is that an application may do this before calling rte_eal_init and your new code will break that. Therefore my recommendation is that the new oops handling needs to be not a built in feature of EAL. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API 2021-08-18 16:46 ` Stephen Hemminger @ 2021-08-18 18:04 ` Jerin Jacob 0 siblings, 0 replies; 45+ messages in thread From: Jerin Jacob @ 2021-08-18 18:04 UTC (permalink / raw) To: Stephen Hemminger Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Wed, Aug 18, 2021 at 10:16 PM Stephen Hemminger <stephen@networkplumber.org> wrote: > > On Wed, 18 Aug 2021 15:07:25 +0530 > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger > > <stephen@networkplumber.org> wrote: > > > > > > On Tue, 17 Aug 2021 20:57:50 +0530 > > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger > > > > <stephen@networkplumber.org> wrote: > > > > > > > > > > On Tue, 17 Aug 2021 13:08:46 +0530 > > > > > Jerin Jacob <jerinjacobk@gmail.com> wrote: > > > > > > > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > > > > > <stephen@networkplumber.org> wrote: > > > > > > > > > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > > > > > <jerinj@marvell.com> wrote: > > > > > > > > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > > > > > > > > > Introducing oops handling API with following specification > > > > > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > > > > > oops handler for the essential signals. > > > > > > > > The rte_oops_signals_enabled() API provides the list > > > > > > > > of signals the library installed by the EAL. > > > > > > > > > > > > > > This is a big change, and many applications already handle these > > > > > > > signals themselves. 
Therefore adding this needs to be opt-in > > > > > > > and not enabled by default. > > > > > > > > > > > > In order to avoid every application explicitly register this > > > > > > sighandler and to cater to the > > > > > > co-existing application-specific signal-hander usage. > > > > > > The following design has been chosen. (It is mentioned in the commit log, > > > > > > I will describe here for more clarity) > > > > > > > > > > > > Case 1: > > > > > > a) The application installs the signal handler prior to rte_eal_init(). > > > > > > b) Implementation stores the application-specific signal and replace a > > > > > > signal handler as oops eal handler > > > > > > c) when application/DPDK get the segfault, the default EAL oops > > > > > > handler gets invoked > > > > > > d) Then it dumps the EAL specific message, it calls the > > > > > > application-specific signal handler > > > > > > installed in step 1 by application. This avoids breaking any contract > > > > > > with the application. > > > > > > i.e Behavior is the same current EAL now. > > > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > > > > > eal oops handler instead > > > > > > application-specific handler) > > > > > > > > > > > > Case 2: > > > > > > a) The application install the signal handler after rte_eal_init(), > > > > > > b) EAL hander get replaced with application handle then the application can call > > > > > > rte_oops_decode() to decode. > > > > > > > > > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > > > > > rte_oops_decode() > > > > > > provided. > > > > > > > > > > > > Here we are not breaking any contract with the application. > > > > > > Do you have concerns about this design? > > > > > > > > > > In our application as a service it is important not to do any backtrace > > > > > in production. We rely on other infrastructure to process coredumps. > > > > > > > > Other infrastructure will work. 
For example, If we are using standard coredump > > > > using linux infra. In Current implementation, > > > > - EAL handler dump the DPDK OOPS like kernel on stderr > > > > - Implementation calls SIG_DFL in eal oops handler > > > > - The above step creates the coredump or re-directs any other > > > > infrastructure you are using for coredump. > > > > > > > > > > > > > > This should be controlled enabled by a command line argument. > > > > > > > > If we allow other infrastructure coredump to work as-is, why > > > > enable/disable required from eal? > > > > > > The addition of DPDK OOPS adds additional steps which make all > > > faults be identified as the oops code. > > > > Since we are using SA_ONSTACK it is not losing the original segfault > > info. > > > > I verified like this, Please find below the steps. > > > > 0) Enable coredump infra in Linux using coredumpctl or so > > 1) Apply this series > > 2) Apply for the following patch to create a segfault from the library. > > This will test, segfault caught by eal and forward to default Linux singal > > handler. > > > > [main]dell[dpdk.org] $ git diff > > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c > > index 3438a96b75..b935c32c98 100644 > > --- a/lib/eal/linux/eal.c > > +++ b/lib/eal/linux/eal.c > > @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv) > > > > eal_mcfg_complete(); > > > > + /* Generate a segfault */ > > + *(volatile int *)0x05 = 0; > > return fctret; > > > > } > > 3)Build > > meson --buildtype debug build > > ninja -C build > > > > 4) Run > > $ ./build/app/test/dpdk-test --no-huge -c 0x2 > > > > Please find oops dump[1] and gdb core dump backtrace[2]. > > Gdb core dump trace preserves the original segfault cause and trace. > > > > Any other concerns? > > Your new oops handling duplicates existing code in our application > (and I know others that do this as well). The problem is that an > application may do this before calling rte_eal_init and your new > code will break that. 
I am not sure what it breaks. Could you elaborate? Your application's signal handler will still be called with the original signal info if it is registered before rte_eal_init(). We can add an API to disable the oops prints if you insist (though I don't know of a use case for it other than not wanting to see or log this print). If that sounds reasonable, I can add an API to disable the oops print. I prefer to install the handler by default, as it won't break anything and existing applications get the feature without having to call any additional API. > > Therefore my recommendation is that the new oops handling needs > to be not a built in feature of EAL. > > > ^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:52 ` Stephen Hemminger 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace jerinj ` (4 subsequent siblings) 6 siblings, 1 reply; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Implement the base oops handling APIs. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 176 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 169 insertions(+), 7 deletions(-) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index 53b580f733..7b12cfd5f5 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -2,35 +2,197 @@ * Copyright(C) 2021 Marvell. */ +#include <inttypes.h> +#include <signal.h> +#include <ucontext.h> +#include <unistd.h> +#include <rte_byteorder.h> +#include <rte_debug.h> +#include <rte_log.h> #include <rte_oops.h> #include "eal_private.h" -void -rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +#define oops_print(...) 
rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__) + +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS}; + +struct oops_signal { + int sig; + bool enabled; + struct sigaction sa; +}; + +static struct oops_signal signals_db[RTE_DIM(oops_signals)]; + +static void +back_trace_dump(ucontext_t *context) +{ + RTE_SET_USED(context); + + rte_dump_stack(); +} +static void +siginfo_dump(int sig, siginfo_t *info) +{ + oops_print("PID: %" PRIdMAX "\n", (intmax_t)getpid()); + + if (info == NULL) + return; + if (sig != info->si_signo) + oops_print("Invalid signal info\n"); + + oops_print("Signal number: %d\n", info->si_signo); + oops_print("Fault address: %p\n", info->si_addr); +} + +static void +mem32_dump(void *ptr) +{ + uint32_t *p = ptr; + int i; + + for (i = 0; i < 16; i++) + oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i])); +} + +static void +stack_dump_header(void) +{ + oops_print("Stack dump:\n"); + oops_print("----------\n"); +} + +static void +code_dump_header(void) +{ + oops_print("Code dump:\n"); + oops_print("----------\n"); +} + +static void +stack_code_dump(void *stack, void *code) +{ + if (stack == NULL || code == NULL) + return; + + oops_print("\n"); + stack_dump_header(); + mem32_dump(stack); + oops_print("\n"); + + code_dump_header(); + mem32_dump(code); + oops_print("\n"); +} +static void +archinfo_dump(ucontext_t *uc) { - RTE_SET_USED(sig); - RTE_SET_USED(info); RTE_SET_USED(uc); + stack_code_dump(NULL, NULL); +} + +static void +default_signal_handler_invoke(int sig) +{ + unsigned int idx; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + /* Skip disabled signals */ + if (signals_db[idx].sig != sig) + continue; + if (!signals_db[idx].enabled) + continue; + /* Replace with stored handler */ + sigaction(sig, &signals_db[idx].sa, NULL); + kill(getpid(), sig); + } +} + +void +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +{ + oops_print("Signal info:\n"); + oops_print("------------\n"); + siginfo_dump(sig, 
info); + oops_print("\n"); + + oops_print("Backtrace:\n"); + oops_print("----------\n"); + back_trace_dump(uc); + oops_print("\n"); + + oops_print("Arch info:\n"); + oops_print("----------\n"); + if (uc) + archinfo_dump(uc); +} + +static void +eal_oops_handler(int sig, siginfo_t *info, void *ctx) +{ + ucontext_t *uc = ctx; + + rte_oops_decode(sig, info, uc); + default_signal_handler_invoke(sig); } int rte_oops_signals_enabled(int *signals) { - RTE_SET_USED(signals); + int count = 0, sig[RTE_OOPS_SIGNALS_MAX]; + unsigned int idx = 0; - return 0; + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + if (signals_db[idx].enabled) { + sig[count] = signals_db[idx].sig; + count++; + } + } + if (signals) + memcpy(signals, sig, sizeof(*signals) * count); + + return count; } int eal_oops_init(void) { - return 0; + unsigned int idx, rc = 0; + struct sigaction sa; + + RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX); + + sigemptyset(&sa.sa_mask); + sa.sa_sigaction = &eal_oops_handler; + sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + signals_db[idx].sig = oops_signals[idx]; + /* Get exiting sigaction */ + rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa); + if (rc) + continue; + /* Replace with oops handler */ + rc = sigaction(signals_db[idx].sig, &sa, NULL); + if (rc) + continue; + signals_db[idx].enabled = true; + } + return rc; } void eal_oops_fini(void) { + unsigned int idx; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + if (!signals_db[idx].enabled) + continue; + /* Replace with stored handler */ + sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL); + } } -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj @ 2021-08-17 3:52 ` Stephen Hemminger 2021-08-17 10:24 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Stephen Hemminger @ 2021-08-17 3:52 UTC (permalink / raw) To: jerinj Cc: dev, thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc On Tue, 17 Aug 2021 08:57:19 +0530 <jerinj@marvell.com> wrote: > +#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__) It is problematic to call rte_log from a signal handler. The malloc pool maybe corrupted and rte_log can call functions that use malloc. Even rte_dump_stack() is unsafe from these signals. > + > +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS}; Should be constant. > + > +struct oops_signal { > + int sig; Redundant, you defined the oops_signals above. > + bool enabled; Redundant, you can just compare with action. > + struct sigaction sa; > +}; > + > +static struct oops_signal signals_db[RTE_DIM(oops_signals)]; > + > +static void > +back_trace_dump(ucontext_t *context) > +{ > + RTE_SET_USED(context); > + > + rte_dump_stack(); > +} rte_dump_stack() is not safe in signal handler: Recommend backtrace_symbols_fd ?? Better yet use libunwind > +static void > +siginfo_dump(int sig, siginfo_t *info) > +{ > + oops_print("PID: %" PRIdMAX "\n", (intmax_t)getpid()); > + > + if (info == NULL) > + return; > + if (sig != info->si_signo) > + oops_print("Invalid signal info\n"); > + > + oops_print("Signal number: %d\n", info->si_signo); > + oops_print("Fault address: %p\n", info->si_addr); > +} > + > +static void > +mem32_dump(void *ptr) Should be const > +{ > + uint32_t *p = ptr; > + int i; > + > + for (i = 0; i < 16; i++) > + oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i])); > +} Why reinvent hexdump? 
> + > +static void > +stack_dump_header(void) > +{ > + oops_print("Stack dump:\n"); > + oops_print("----------\n"); > +} > + > +static void > +code_dump_header(void) > +{ > + oops_print("Code dump:\n"); > + oops_print("----------\n"); > +} > + > +static void > +stack_code_dump(void *stack, void *code) > +{ > + if (stack == NULL || code == NULL) > + return; > + > + oops_print("\n"); > + stack_dump_header(); > + mem32_dump(stack); > + oops_print("\n"); > + > + code_dump_header(); > + mem32_dump(code); > + oops_print("\n"); > +} > +static void > +archinfo_dump(ucontext_t *uc) > { > - RTE_SET_USED(sig); > - RTE_SET_USED(info); > RTE_SET_USED(uc); > > + stack_code_dump(NULL, NULL); > +} > + > +static void > +default_signal_handler_invoke(int sig) > +{ > + unsigned int idx; > + > + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { > + /* Skip disabled signals */ > + if (signals_db[idx].sig != sig) > + continue; > + if (!signals_db[idx].enabled) > + continue; > + /* Replace with stored handler */ > + sigaction(sig, &signals_db[idx].sa, NULL); > + kill(getpid(), sig); If you use SA_RESETHAND, you don't need this stuff. > + } > +} > + > +void > +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) > +{ > + oops_print("Signal info:\n"); > + oops_print("------------\n"); > + siginfo_dump(sig, info); > + oops_print("\n"); > + > + oops_print("Backtrace:\n"); > + oops_print("----------\n"); > + back_trace_dump(uc); > + oops_print("\n"); > + > + oops_print("Arch info:\n"); > + oops_print("----------\n"); > + if (uc) > + archinfo_dump(uc); > +} > + > +static void > +eal_oops_handler(int sig, siginfo_t *info, void *ctx) > +{ > + ucontext_t *uc = ctx; > + > + rte_oops_decode(sig, info, uc); > + default_signal_handler_invoke(sig); If you use SA_RESETHAND, then just doing raise(sig) here. > } > > int > rte_oops_signals_enabled(int *signals) Why is this necessary and exported? 
> { > - RTE_SET_USED(signals); > + int count = 0, sig[RTE_OOPS_SIGNALS_MAX]; > + unsigned int idx = 0; > > - return 0; > + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { > + if (signals_db[idx].enabled) { > + sig[count] = signals_db[idx].sig; > + count++; > + } > + } > + if (signals) > + memcpy(signals, sig, sizeof(*signals) * count); > + > + return count; > } > > int > eal_oops_init(void) > { > - return 0; > + unsigned int idx, rc = 0; > + struct sigaction sa; > + > + RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX); > + > + sigemptyset(&sa.sa_mask); > + sa.sa_sigaction = &eal_oops_handler; > + sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK; > + > + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { > + signals_db[idx].sig = oops_signals[idx]; > + /* Get exiting sigaction */ > + rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa); > + if (rc) > + continue; > + /* Replace with oops handler */ > + rc = sigaction(signals_db[idx].sig, &sa, NULL); > + if (rc) > + continue; > + signals_db[idx].enabled = true; > + } > + return rc; > } > > void > eal_oops_fini(void) > { > + unsigned int idx; > + > + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { > + if (!signals_db[idx].enabled) > + continue; > + /* Replace with stored handler */ > + sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL); > + } > } ^ permalink raw reply [flat|nested] 45+ messages in thread
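On the rte_log concern raised in this review: POSIX lists write(2) among the async-signal-safe functions, while stdio and malloc-backed loggers are not on that list. The following is only a minimal sketch of what a heap-free, stdio-free print helper could look like. The names oops_fmt_hex() and oops_put_hex() are hypothetical and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): format v as "0x<hex>" into buf.
 * buf must hold at least 19 bytes: "0x" + 16 digits + NUL.
 * No stdio, no heap. Returns the string length.
 */
static size_t
oops_fmt_hex(char *buf, uint64_t v)
{
	static const char digits[] = "0123456789abcdef";
	char tmp[16];
	size_t n = 0, len = 0;

	/* Collect digits least-significant first */
	do {
		tmp[n++] = digits[v & 0xf];
		v >>= 4;
	} while (v != 0);

	buf[len++] = '0';
	buf[len++] = 'x';
	/* Emit them most-significant first */
	while (n > 0)
		buf[len++] = tmp[--n];
	buf[len] = '\0';
	return len;
}

/* Hypothetical helper: emit "<label><hex>\n" on fd using only write(2) */
static void
oops_put_hex(int fd, const char *label, uint64_t v)
{
	char buf[19];
	size_t len = oops_fmt_hex(buf, v);
	ssize_t rc;

	rc = write(fd, label, strlen(label));
	rc = write(fd, buf, len);
	rc = write(fd, "\n", 1);
	(void)rc; /* nothing useful can be done on failure here */
}
```

A call such as oops_put_hex(STDERR_FILENO, "Fault address: ", addr) would then stand in for the corresponding oops_print() line, at the cost of losing printf-style formatting.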
* Re: [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation 2021-08-17 3:52 ` Stephen Hemminger @ 2021-08-17 10:24 ` Jerin Jacob 0 siblings, 0 replies; 45+ messages in thread From: Jerin Jacob @ 2021-08-17 10:24 UTC (permalink / raw) To: Stephen Hemminger Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin, David Christensen On Tue, Aug 17, 2021 at 9:22 AM Stephen Hemminger <stephen@networkplumber.org> wrote: > > On Tue, 17 Aug 2021 08:57:19 +0530 > <jerinj@marvell.com> wrote: > > > +#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__) > > It is problematic to call rte_log from a signal handler. > The malloc pool may be corrupted and rte_log can call functions that > use malloc. OK. What should be used instead, fprintf(stderr, ...)? > > Even rte_dump_stack() is unsafe from these signals. OK > > > + > > +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS}; > > Should be constant. Ack > > > + > > +struct oops_signal { > > + int sig; > > Redundant, you defined the oops_signals above. Ack. > > > + bool enabled; > > Redundant, you can just compare with action. Anyway, we need a database to hold the sigactions. This makes it clean to implement rte_oops_signals_enabled(). Also, an action that is != SIG_DFL does not mean the oops handler is enabled. > > > + struct sigaction sa; > > +}; > > + > > +static struct oops_signal signals_db[RTE_DIM(oops_signals)]; > > + > > +static void > > +back_trace_dump(ucontext_t *context) > > +{ > > + RTE_SET_USED(context); > > + > > + rte_dump_stack(); > > +} > > rte_dump_stack() is not safe in signal handler: > > Recommend backtrace_symbols_fd ?? > > Better yet use libunwind libunwind is an optional dependency. As you can see in the next patch, back_trace_dump() is implemented with a libunwind-based stack unwind if the dependency is met.
> > > +static void > > +siginfo_dump(int sig, siginfo_t *info) > > +{ > > + oops_print("PID: %" PRIdMAX "\n", (intmax_t)getpid()); > > + > > + if (info == NULL) > > + return; > > + if (sig != info->si_signo) > > + oops_print("Invalid signal info\n"); > > + > > + oops_print("Signal number: %d\n", info->si_signo); > > + oops_print("Fault address: %p\n", info->si_addr); > > +} > > + > > +static void > > +mem32_dump(void *ptr) > > Should be const Ack. > > > +{ > > + uint32_t *p = ptr; > > + int i; > > + > > + for (i = 0; i < 16; i++) > > + oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i])); > > +} > > Why reinvent hexdump? Make sense. I can change to hexdump, But, it will use rte_log. Shouldn't we use fprint(stderr,..) variant. > > > + > > +static void > > +stack_dump_header(void) > > +{ > > + oops_print("Stack dump:\n"); > > + oops_print("----------\n"); > > +} > > + > > +static void > > +code_dump_header(void) > > +{ > > + oops_print("Code dump:\n"); > > + oops_print("----------\n"); > > +} > > + > > +static void > > +stack_code_dump(void *stack, void *code) > > +{ > > + if (stack == NULL || code == NULL) > > + return; > > + > > + oops_print("\n"); > > + stack_dump_header(); > > + mem32_dump(stack); > > + oops_print("\n"); > > + > > + code_dump_header(); > > + mem32_dump(code); > > + oops_print("\n"); > > +} > > +static void > > +archinfo_dump(ucontext_t *uc) > > { > > - RTE_SET_USED(sig); > > - RTE_SET_USED(info); > > RTE_SET_USED(uc); > > > > + stack_code_dump(NULL, NULL); > > +} > > + > > +static void > > +default_signal_handler_invoke(int sig) > > +{ > > + unsigned int idx; > > + > > + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { > > + /* Skip disabled signals */ > > + if (signals_db[idx].sig != sig) > > + continue; > > + if (!signals_db[idx].enabled) > > + continue; > > + /* Replace with stored handler */ > > + sigaction(sig, &signals_db[idx].sa, NULL); > > + kill(getpid(), sig); > > If you use SA_RESETHAND, you don't need this stuff. 
As mentioned in other 1/6 email reply, This is NOT the case where SIG_DFL handler called from eal oops handler, instead, it will be calling the signal handler which is registered prior to rte_eal_init() which is stored local database. > > > + } > > +} > > + > > +void > > +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) > > +{ > > + oops_print("Signal info:\n"); > > + oops_print("------------\n"); > > + siginfo_dump(sig, info); > > + oops_print("\n"); > > + > > + oops_print("Backtrace:\n"); > > + oops_print("----------\n"); > > + back_trace_dump(uc); > > + oops_print("\n"); > > + > > + oops_print("Arch info:\n"); > > + oops_print("----------\n"); > > + if (uc) > > + archinfo_dump(uc); > > +} > > + > > +static void > > +eal_oops_handler(int sig, siginfo_t *info, void *ctx) > > +{ > > + ucontext_t *uc = ctx; > > + > > + rte_oops_decode(sig, info, uc); > > + default_signal_handler_invoke(sig); > > If you use SA_RESETHAND, then just doing raise(sig) here. > > } > > > > int > > rte_oops_signals_enabled(int *signals) > > Why is this necessary and exported? Explained in 1/6 email reply. ^ permalink raw reply [flat|nested] 45+ messages in thread
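The point argued in this reply is that SA_RESETHAND resets the disposition to SIG_DFL, so it cannot bring back a handler that the application registered before rte_eal_init(). A minimal sketch of the save-and-chain scheme follows, using SIGUSR1 so that it is harmless to run. All names here (app_handler, chain_handler, chain_demo) are hypothetical stand-ins and are not taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

static struct sigaction saved_sa;        /* handler present before ours */
static volatile sig_atomic_t decoded;    /* set by the chaining handler */
static volatile sig_atomic_t app_called; /* set by the earlier handler */

/* Stand-in for a handler the application registered before EAL init */
static void
app_handler(int sig)
{
	(void)sig;
	app_called = 1;
}

/* Stand-in for the EAL oops handler: decode, restore, re-raise */
static void
chain_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	decoded = 1; /* rte_oops_decode() would run at this point */
	/* Put the saved handler back. The re-raised signal stays pending
	 * while this handler runs and is delivered once it returns.
	 */
	sigaction(sig, &saved_sa, NULL);
	raise(sig);
}

/* Install app_handler, layer chain_handler on top of it, then fire sig.
 * Returns 0 when both handlers ran, -1 otherwise.
 */
static int
chain_demo(int sig)
{
	struct sigaction sa;

	decoded = 0;
	app_called = 0;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = app_handler;
	if (sigaction(sig, &sa, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_SIGINFO;
	sa.sa_sigaction = chain_handler;
	/* Third argument captures whatever was installed before us */
	if (sigaction(sig, &sa, &saved_sa) != 0)
		return -1;

	raise(sig);
	return (decoded && app_called) ? 0 : -1;
}
```

With SA_RESETHAND in place of the explicit restore, the second delivery would reach SIG_DFL and app_handler would never run, which is the difference under discussion.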
* [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops jerinj ` (3 subsequent siblings) 6 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev, Aaron Conole, Michael Santana, Bruce Richardson Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Add an optional libunwind library dependency to DPDK for an enhanced backtrace based on ucontext. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- .github/workflows/build.yml | 2 +- .travis.yml | 2 +- config/meson.build | 8 +++++++ lib/eal/unix/eal_oops.c | 47 +++++++++++++++++++++++++++++++++++++ 4 files changed, 57 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 7dac20ddeb..caaca207a6 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -93,7 +93,7 @@ jobs: run: sudo apt install -y ccache libnuma-dev python3-setuptools python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev - libarchive-dev + libarchive-dev libunwind-dev - name: Install libabigail build dependencies if no cache is available if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true' run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev diff --git a/.travis.yml b/.travis.yml index 23067d9e3c..e72b156014 100644 --- a/.travis.yml +++ b/.travis.yml @@ -16,7 +16,7 @@ addons: packages: &required_packages - [libnuma-dev,
python3-setuptools, python3-wheel, python3-pip, python3-pyelftools, ninja-build] - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev] - - [libarchive-dev] + - [libarchive-dev, libunwind-dev] _aarch64_packages: &aarch64_packages - *required_packages diff --git a/config/meson.build b/config/meson.build index e80421003b..26a85dab6b 100644 --- a/config/meson.build +++ b/config/meson.build @@ -236,6 +236,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false dpdk_extra_ldflags += '-latomic' endif +# check for libunwind +unwind_dep = dependency('libunwind', required: false, method: 'pkg-config') +if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep) + dpdk_conf.set('RTE_USE_LIBUNWIND', 1) + add_project_link_arguments('-lunwind', language: 'c') + dpdk_extra_ldflags += '-lunwind' +endif + # add -include rte_config to cflags add_project_arguments('-include', 'rte_config.h', language: 'c') diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index 7b12cfd5f5..a7f00ecd4e 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -26,6 +26,50 @@ struct oops_signal { static struct oops_signal signals_db[RTE_DIM(oops_signals)]; +#if defined(RTE_USE_LIBUNWIND) + +#define BACKTRACE_DEPTH 256 +#define UNW_LOCAL_ONLY +#include <libunwind.h> + +static void +back_trace_dump(ucontext_t *context) +{ + unw_cursor_t cursor; + unw_word_t ip, off; + int rc, level = 0; + char name[256]; + + if (context == NULL) { + rte_dump_stack(); + return; + } + + rc = unw_init_local(&cursor, (unw_context_t *)context); + if (rc < 0) + goto fail; + + for (;;) { + rc = unw_get_reg(&cursor, UNW_REG_IP, &ip); + if (rc < 0) + goto fail; + rc = unw_get_proc_name(&cursor, name, sizeof(name), &off); + if (rc == 0) + oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip, + name, (uint64_t)off); + else + oops_print("[%16p]: <unknown>\n", (void *)ip); + rc = unw_step(&cursor); + if (rc <= 0 || ++level >= 
BACKTRACE_DEPTH) + break; + } + return; +fail: + oops_print("libunwind call failed %s\n", unw_strerror(rc)); +} + +#else + static void back_trace_dump(ucontext_t *context) { @@ -33,6 +77,9 @@ back_trace_dump(ucontext_t *context) rte_dump_stack(); } + +#endif + static void siginfo_dump(int sig, siginfo_t *info) { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
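When libunwind is not available, the patch falls back to rte_dump_stack(). The review of 2/6 suggested backtrace_symbols_fd() for that path. A minimal sketch is shown here, assuming glibc's <execinfo.h> is present. The function name fallback_back_trace_dump() and the depth cap are hypothetical. The glibc manual page states that backtrace_symbols_fd() does not call malloc(); whether backtrace() itself is safe on its first call inside a fault handler depends on the C library, so this is not a guarantee.

```c
#include <execinfo.h>
#include <unistd.h>

#define FALLBACK_DEPTH 64 /* hypothetical cap on the number of frames */

/* Hypothetical fallback: write the current call chain to fd.
 * Returns the number of frames captured.
 */
static int
fallback_back_trace_dump(int fd)
{
	void *frames[FALLBACK_DEPTH];
	int n;

	/* Capture return addresses of the active frames */
	n = backtrace(frames, FALLBACK_DEPTH);
	/* Symbolize and write one line per frame directly to fd */
	backtrace_symbols_fd(frames, n, fd);
	return n;
}
```

Unlike the libunwind path, this walks the stack of the handler itself rather than starting from the faulting ucontext, so the first few frames belong to the signal delivery.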
* [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj ` (2 preceding siblings ...) 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 5/6] eal/arm64: " jerinj ` (2 subsequent siblings) 6 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Dump the x86 arch state register in oops handling routine. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index a7f00ecd4e..a0f9526d96 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -133,6 +133,38 @@ stack_code_dump(void *stack, void *code) mem32_dump(code); oops_print("\n"); } + +#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX) +static void +archinfo_dump(ucontext_t *uc) +{ + + mcontext_t *mc = &uc->uc_mcontext; + + oops_print("R8 : 0x%.16llx ", mc->gregs[REG_R8]); + oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]); + oops_print("R10: 0x%.16llx ", mc->gregs[REG_R10]); + oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]); + oops_print("R12: 0x%.16llx ", mc->gregs[REG_R12]); + oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]); + oops_print("R14: 0x%.16llx ", mc->gregs[REG_R14]); + oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]); + oops_print("RAX: 0x%.16llx ", mc->gregs[REG_RAX]); + oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]); + oops_print("RCX: 0x%.16llx ", mc->gregs[REG_RCX]); + oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]); + oops_print("RBP: 0x%.16llx ", mc->gregs[REG_RBP]); + 
oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]); + oops_print("RSI: 0x%.16llx ", mc->gregs[REG_RSI]); + oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]); + oops_print("RIP: 0x%.16llx ", mc->gregs[REG_RIP]); + oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]); + + stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); +} + +#else + static void archinfo_dump(ucontext_t *uc) { @@ -141,6 +173,8 @@ archinfo_dump(ucontext_t *uc) stack_code_dump(NULL, NULL); } +#endif + static void default_signal_handler_invoke(int sig) { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
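A note on reading the stack and code dumps produced through stack_code_dump(): mem32_dump() passes every word through rte_be_to_cpu_32(), so each printed word lists its bytes in memory order regardless of host endianness. In the sample output of the cover letter, the first code word 0xc7000000 therefore corresponds to the bytes c7 00 00 00 at RIP. The sketch below uses a portable stand-in, load_be32(), which is a hypothetical helper and not a DPDK function.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for rte_be_to_cpu_32() applied to a native load:
 * interpret four bytes in memory order as a big-endian 32-bit value.
 */
static uint32_t
load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Print count 32-bit words starting at ptr, in the same layout as the
 * dump in the cover letter.
 */
static void
dump_words(const void *ptr, int count)
{
	const uint8_t *p = ptr;
	int i;

	for (i = 0; i < count; i++)
		printf("%p: 0x%x\n", (const void *)(p + 4 * i),
		       load_be32(p + 4 * i));
}
```

The same reading applies to the stack dump: the first stack word 0x6000000 is the byte sequence 06 00 00 00 at RSP.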
* [dpdk-dev] [PATCH v2 5/6] eal/arm64: support register dump for oops 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj ` (3 preceding siblings ...) 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops jerinj @ 2021-08-17 3:27 ` jerinj 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs jerinj 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj 6 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Dump the arm64 arch state register in oops handling routine. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index a0f9526d96..9c783f936a 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -163,6 +163,25 @@ archinfo_dump(ucontext_t *uc) stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); } +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX) + +static void +archinfo_dump(ucontext_t *uc) +{ + mcontext_t *mc = &uc->uc_mcontext; + int i; + + oops_print("PC : 0x%.16llx ", mc->pc); + oops_print("SP : 0x%.16llx\n", mc->sp); + for (i = 0; i < 31; i++) + oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i], + i & 0x1 ? "\n" : " "); + + oops_print("PSTATE: 0x%.16llx\n", mc->pstate); + + stack_code_dump((void *)mc->sp, (void *)mc->pc); +} + #else static void -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj ` (4 preceding siblings ...) 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 5/6] eal/arm64: " jerinj @ 2021-08-17 3:27 ` jerinj 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj 6 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-08-17 3:27 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Added unit test cases for all the oops handling APIs. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- app/test/meson.build | 2 + app/test/test_oops.c | 121 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 app/test/test_oops.c diff --git a/app/test/meson.build b/app/test/meson.build index a7611686ad..1e471ab351 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -97,6 +97,7 @@ test_sources = files( 'test_metrics.c', 'test_mcslock.c', 'test_mp_secondary.c', + 'test_oops.c', 'test_per_lcore.c', 'test_pflock.c', 'test_pmd_perf.c', @@ -236,6 +237,7 @@ fast_tests = [ ['memzone_autotest', false], ['meter_autotest', true], ['multiprocess_autotest', false], + ['oops_autotest', true], ['per_lcore_autotest', true], ['pflock_autotest', true], ['prefetch_autotest', true], diff --git a/app/test/test_oops.c b/app/test/test_oops.c new file mode 100644 index 0000000000..60a7f259c7 --- /dev/null +++ b/app/test/test_oops.c @@ -0,0 +1,121 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell + */ + +#include <setjmp.h> +#include <signal.h> + +#include <rte_config.h> +#include <rte_oops.h> + +#include "test.h" + +static jmp_buf pc; +static bool detected_segfault; + +static void +segv_handler(int sig, siginfo_t *info, void *ctx) +{ + detected_segfault = true; + 
rte_oops_decode(sig, info, (ucontext_t *)ctx); + longjmp(pc, 1); +} + +/* OS specific way install the signal segfault handler*/ +static int +segv_handler_install(void) +{ + struct sigaction sa; + + sigemptyset(&sa.sa_mask); + sa.sa_sigaction = &segv_handler; + sa.sa_flags = SA_SIGINFO; + + return sigaction(SIGSEGV, &sa, NULL); +} + +static int +test_oops_generate(void) +{ + int rc; + + rc = segv_handler_install(); + TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc); + + detected_segfault = false; + rc = setjmp(pc); /* Save the execution state */ + if (rc == 0) { + /* Generate a segfault */ + *(volatile int *)0x05 = 0; + } else { /* logjump from segv_handler */ + if (detected_segfault) + return TEST_SUCCESS; + + } + return TEST_FAILED; +} + +static int +test_signal_handler_installed(int count, int *signals) +{ + int i, rc, verified = 0; + struct sigaction sa; + + for (i = 0; i < count; i++) { + rc = sigaction(signals[i], NULL, &sa); + if (rc) { + printf("Failed to get sigaction for %d", signals[i]); + continue; + } + if (sa.sa_handler != SIG_DFL) + verified++; + } + TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count, + verified); + return TEST_SUCCESS; +} + +static int +test_oops_signals_enabled(void) +{ + int *signals = NULL; + int i, rc; + + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + + signals = malloc(sizeof(int) * rc); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + free(signals); + + signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + + for (i = 0; i < rc; i++) + TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i, + signals[i]); + + rc = test_signal_handler_installed(rc, signals); + free(signals); + + return rc; +} + +static struct unit_test_suite oops_tests = { + .suite_name = "oops autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + 
TEST_CASE(test_oops_signals_enabled), + TEST_CASE(test_oops_generate), + TEST_CASES_END()}}; + +static int +test_oops(void) +{ + return unit_test_suite_runner(&oops_tests); +} + +REGISTER_TEST_COMMAND(oops_autotest, test_oops); -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
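The test above leaves segv_handler() through longjmp(). POSIX leaves it unspecified whether setjmp()/longjmp() save and restore the signal mask, so on some systems SIGSEGV can remain blocked after the jump. A minimal sketch of the same recovery pattern with sigsetjmp()/siglongjmp() is given below. The names are hypothetical, and raise(SIGSEGV) stands in for the real faulting store so that the sketch is safe to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_seen;

static void
fault_handler(int sig)
{
	(void)sig;
	fault_seen = 1;
	/* Jump back and restore the signal mask saved by sigsetjmp() */
	siglongjmp(fault_env, 1);
}

/* Returns 0 if the handler ran and SIGSEGV is unblocked afterwards */
static int
fault_recover_demo(void)
{
	struct sigaction sa, old;
	sigset_t cur;
	int rc = -1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = fault_handler;
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	fault_seen = 0;
	/* Second argument 1: save the current signal mask as well */
	if (sigsetjmp(fault_env, 1) == 0) {
		raise(SIGSEGV); /* stand-in for a real faulting access */
	} else {
		/* Query the current mask without changing it */
		sigprocmask(SIG_BLOCK, NULL, &cur);
		if (fault_seen && sigismember(&cur, SIGSEGV) == 0)
			rc = 0;
	}
	/* Put the previous SIGSEGV disposition back */
	sigaction(SIGSEGV, &old, NULL);
	return rc;
}
```

Restoring the previous disposition at the end also matters for a unit test, since the SIGSEGV handler installed by the test would otherwise outlive it.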
* [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj ` (5 preceding siblings ...) 2021-08-17 3:27 ` [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs jerinj @ 2021-09-06 4:17 ` jerinj 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj ` (6 more replies) 6 siblings, 7 replies; 45+ messages in thread
From: jerinj @ 2021-09-06 4:17 UTC (permalink / raw)
To: dev
Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

v3:
- Updated the release notes.
- Introduced the "--no-oops" EAL option to disable the default EAL handler. The default EAL oops handler stores the existing handler and invokes it after decoding, so there may be no explicit use case for this option; it is added to give the application control. It takes a similar approach to telemetry, which is enabled by default to avoid updating all the existing applications.
- Changed oops_print to fprintf, as rte_log is not safe from a fault handler. (Stephen)
- Removed "sig" from signals_db as it is a duplicate. (Stephen)
- Added const to mem32_dump(). (Stephen)
- Added const to oops_signals[]. (Stephen)

v2:
- Fixed the powerpc build. (David Christensen)

It is handy to get detailed OOPS information, like the Linux kernel provides, when a DPDK application crashes, without losing any of the features provided by the coredump infrastructure of the OS.

This patch series introduces the APIs to handle OOPS in DPDK. The following section details the implementation and the API interface to the application.

On rte_eal_init() invocation, if --no-oops is not provided in the EAL command line arguments, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.
The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler installed by the application before invoking the rte_eal_init(). This scheme will also enable the use of the default coredump handler(for gdb etc.) provided by OS if the application does not install any specific signal handler. The second case where the application installs the signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler. Patch split: Patch 1/6: defines the API and stub implementation for Unix systems Patch 2/6: The API implementation Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops. Patch 4/6: x86 specific archinfo like x86 register dump on oops Patch 5/6: arm64 specific archinfo like arm64 register dump on oops Patch 6/6: UT for the new APIs Example command for the build, run, and output logs of an x86-64 linux machine. meson --buildtype debug build ninja -C build echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge -c 0x2 Signal info: ------------ PID: 2439496 Signal number: 11 Fault address: 0x5 Backtrace: ---------- [ 0x55e8b56d5cee]: test_oops_generate()+0x75 [ 0x55e8b5459843]: unit_test_suite_runner()+0x1aa [ 0x55e8b56d605c]: test_oops()+0x13 [ 0x55e8b544bdfc]: cmd_autotest_parsed()+0x55 [ 0x55e8b6063a0d]: cmdline_parse()+0x319 [ 0x55e8b6061dea]: cmdline_valid_buffer()+0x35 [ 0x55e8b6066bd8]: rdline_char_in()+0xc48 [ 0x55e8b606221c]: cmdline_in()+0x62 [ 0x55e8b6062495]: cmdline_interact()+0x56 [ 0x55e8b5459314]: main()+0x65e [ 0x7f54b25d2b25]: __libc_start_main()+0xd5 [ 0x55e8b544bc9e]: _start()+0x2e Arch info: ---------- R8 : 0x0000000000000000 R9 : 0x0000000000000000 R10: 0x00007f54b25b8b48 R11: 0x00007f54b25e7930 R12: 0x00007fffc695e610 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x0000000000000000 RAX: 0x0000000000000005 RBX: 0x0000000000000001 RCX: 0x00007f54b278a943 RDX: 
0x3769043bf13a2594 RBP: 0x00007fffc6958340 RSP: 0x00007fffc6958330 RSI: 0x0000000000000000 RDI: 0x000055e8c4c1e380 RIP: 0x000055e8b56d5cee EFL: 0x0000000000010246 Stack dump: ---------- 0x7fffc6958330: 0x6000000 0x7fffc6958334: 0x0 0x7fffc6958338: 0x30cfeac5 0x7fffc695833c: 0x0 0x7fffc6958340: 0xe08395c6 0x7fffc6958344: 0xff7f0000 0x7fffc6958348: 0x439845b5 0x7fffc695834c: 0xe8550000 0x7fffc6958350: 0x0 0x7fffc6958354: 0xb000000 0x7fffc6958358: 0x20445bb9 0x7fffc695835c: 0xe8550000 0x7fffc6958360: 0x925506b6 0x7fffc6958364: 0x0 0x7fffc6958368: 0x0 0x7fffc695836c: 0x0 Code dump: ---------- 0x55e8b56d5cee: 0xc7000000 0x55e8b56d5cf2: 0xeb12 0x55e8b56d5cf6: 0xfb6054b 0x55e8b56d5cfa: 0x87540f84 0x55e8b56d5cfe: 0xc07407b8 0x55e8b56d5d02: 0x0 0x55e8b56d5d06: 0xeb05b8ff 0x55e8b56d5d0a: 0xffffffc9 0x55e8b56d5d0e: 0xc3554889 0x55e8b56d5d12: 0xe54881ec 0x55e8b56d5d16: 0xc0000000 0x55e8b56d5d1a: 0x89bd4cff 0x55e8b56d5d1e: 0xffff4889 0x55e8b56d5d22: 0xb540ffff Jerin Jacob (6): eal: introduce oops handling API eal: oops handling API implementation eal: support libunwind based backtrace eal/x86: support register dump for oops eal/arm64: support register dump for oops test/oops: support unit test case for oops handling APIs .github/workflows/build.yml | 2 +- .travis.yml | 2 +- app/test/meson.build | 2 + app/test/test_oops.c | 122 +++++++++ config/meson.build | 8 + doc/api/doxy-api-index.md | 3 +- doc/guides/linux_gsg/eal_args.include.rst | 4 + doc/guides/rel_notes/release_21_11.rst | 10 + lib/eal/common/eal_common_options.c | 5 + lib/eal/common/eal_internal_cfg.h | 1 + lib/eal/common/eal_options.h | 2 + lib/eal/common/eal_private.h | 3 + lib/eal/freebsd/eal.c | 8 + lib/eal/include/meson.build | 1 + lib/eal/include/rte_oops.h | 101 ++++++++ lib/eal/linux/eal.c | 7 + lib/eal/unix/eal_oops.c | 293 ++++++++++++++++++++++ lib/eal/unix/meson.build | 1 + lib/eal/version.map | 4 + 19 files changed, 576 insertions(+), 3 deletions(-) create mode 100644 app/test/test_oops.c create mode 
100644 lib/eal/include/rte_oops.h create mode 100644 lib/eal/unix/eal_oops.c -- 2.33.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj @ 2021-09-06 4:17 ` jerinj 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation jerinj ` (5 subsequent siblings) 6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06 4:17 UTC (permalink / raw)
To: dev, Bruce Richardson, Ray Kinsella
Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, if --no-oops is not provided in the EAL command line arguments, the EAL library installs the oops handler for the essential signals. The rte_oops_signals_enabled() API provides the list of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode() and then calls the signal handler that the application installed before invoking rte_eal_init(). This scheme also keeps the default coredump handler provided by the OS (for gdb etc.) usable if the application does not install any specific signal handler.

In the second case, where the application installs its signal handler after the rte_eal_init() invocation, rte_oops_decode() provides the means of decoding the oops message in the application's fault handler.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/api/doxy-api-index.md                 |   3 +-
 doc/guides/linux_gsg/eal_args.include.rst |   4 +
 doc/guides/rel_notes/release_21_11.rst    |  10 +++
 lib/eal/common/eal_common_options.c       |   5 ++
 lib/eal/common/eal_internal_cfg.h         |   1 +
 lib/eal/common/eal_options.h              |   2 +
 lib/eal/common/eal_private.h              |   3 +
 lib/eal/freebsd/eal.c                     |   8 ++
 lib/eal/include/meson.build               |   1 +
 lib/eal/include/rte_oops.h                | 101 ++++++++++++++++++++++
 lib/eal/linux/eal.c                       |   7 ++
 lib/eal/unix/eal_oops.c                   |  36 ++++++++
 lib/eal/unix/meson.build                  |   1 +
 lib/eal/version.map                       |   4 +
 14 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..0d0da35205 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -215,7 +215,8 @@ The public API headers are grouped by topics:
   [log] (@ref rte_log.h),
   [errno] (@ref rte_errno.h),
   [trace] (@ref rte_trace.h),
-  [trace_point] (@ref rte_trace_point.h)
+  [trace_point] (@ref rte_trace_point.h),
+  [oops] (@ref rte_oops.h)
 
 - **misc**:
   [EAL config] (@ref rte_eal.h),
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 96baa4a9b0..8db320bc07 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -226,3 +226,7 @@ Other options
     To disable use of max SIMD bitwidth limit::
 
         --force-max-simd-bitwidth=0
+
+* ``--no-oops``:
+
+    Disable default EAL oops handler.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b573834..ba31a5dbed 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Added APIs for oops handling support.**
+
+  Added support for decoding the oops fault with ``libunwind`` based backtrace,
+  architecture-specific register dump, instruction memory dump, and
+  stack memory dump. EAL installs the default oops handler if the ``--no-oops``
+  EAL command line argument is not provided. The default EAL oops handler stores
+  the existing handler and invokes it after decoding. It also offers the
+  ``rte_oops_decode`` API to integrate the EAL oops decode function where the
+  application does not use the default EAL handler.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..b359e55485 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -107,6 +107,7 @@ eal_long_options[] = {
 	{OPT_TELEMETRY, 0, NULL, OPT_TELEMETRY_NUM },
 	{OPT_NO_TELEMETRY, 0, NULL, OPT_NO_TELEMETRY_NUM },
 	{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+	{OPT_NO_OOPS, 0, NULL, OPT_NO_OOPS_NUM },
 
 	/* legacy options that will be removed in future */
 	{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM },
@@ -1825,6 +1826,9 @@ eal_parse_common_option(int opt, const char *optarg,
 			return -1;
 		}
 		break;
+	case OPT_NO_OOPS_NUM:
+		conf->no_oops = 1;
+		break;
 
 	/* don't know what to do, leave this to caller */
 	default:
@@ -2128,6 +2132,7 @@ eal_common_usage(void)
 	       " --"OPT_TELEMETRY" Enable telemetry support (on by default)\n"
 	       " --"OPT_NO_TELEMETRY" Disable telemetry support\n"
 	       " --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
+	       " --"OPT_NO_OOPS" Disable default oops EAL handler(on by default)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       " --"OPT_HUGE_UNLINK" Unlink hugepage files after init\n"
 	       " --"OPT_NO_HUGE" Use malloc instead of hugetlbfs\n"
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..687aa062ea 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -94,6 +94,7 @@ struct internal_config {
 	unsigned int no_telemetry; /**< true to disable Telemetry */
 	struct simd_bitwidth max_simd_bitwidth;
 	/**< max simd bitwidth path to use */
+	unsigned int no_oops; /**< true to disable oops */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 7b348e707f..b0256d7529 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -93,6 +93,8 @@ enum {
 	OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH "force-max-simd-bitwidth"
 	OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_NO_OOPS "no-oops"
+	OPT_NO_OOPS_NUM,
 
 	/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST "pci-blacklist"
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..c3a490d803 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
  */
 void __rte_thread_uninit(void);
 
+int eal_oops_init(void);
+void eal_oops_fini(void);
+
 /**
  * asprintf(3) replacement for Windows.
  */
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..6a48a7e95c 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -692,6 +692,7 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+
 	thread_id = pthread_self();
 
 	eal_reset_internal_config(internal_conf);
@@ -719,6 +720,11 @@ rte_eal_init(int argc, char **argv)
 	/* FreeBSD always uses legacy memory model */
 	internal_conf->legacy_mem = true;
 
+	if (internal_conf->no_oops == 0 && eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
@@ -973,6 +979,8 @@ rte_eal_cleanup(void)
 	rte_eal_memory_detach();
 	rte_trace_save();
 	eal_trace_fini();
+	if (internal_conf->no_oops == 0)
+		eal_oops_fini();
 	eal_cleanup_config(internal_conf);
 	return 0;
 }
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 88a9eba12f..6c74bdb7b5 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -30,6 +30,7 @@ headers += files(
         'rte_malloc.h',
         'rte_memory.h',
         'rte_memzone.h',
+        'rte_oops.h',
         'rte_pci_dev_feature_defs.h',
         'rte_pci_dev_features.h',
         'rte_per_lcore.h',
diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h
new file mode 100644
index 0000000000..0a76c3d242
--- /dev/null
+++ b/lib/eal/include/rte_oops.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _RTE_OOPS_H_
+#define _RTE_OOPS_H_
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <rte_config.h>
+
+/**
+ * @file
+ *
+ * RTE oops API
+ *
+ * This file provides the oops handling APIs to RTE applications.
+ *
+ * On rte_eal_init() invocation and if *--no-oops* not provided in the EAL
+ * command line argument, then EAL library installs the oops handler for
+ * the essential signals. The rte_oops_signals_enabled() API provides the list
+ * of signals the library installed by the EAL.
+ *
+ * The default EAL oops handler decodes the oops message using rte_oops_decode()
+ * and then calls the signal handler installed by the application before
+ * invoking the rte_eal_init(). This scheme will also enable the use of
+ * the default coredump handler(for gdb etc.) provided by OS if the application
+ * does not install any specific signal handler.
+ *
+ * The second case where the application installs the signal handler after
+ * the rte_eal_init() invocation, rte_oops_decode() provides the means of
+ * decoding the oops message in the application's fault handler.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Maximum number of oops signals enabled in EAL.
+ * @see rte_oops_signals_enabled()
+ */
+#define RTE_OOPS_SIGNALS_MAX 32
+
+/**
+ * Get the list of enabled oops signals installed by EAL.
+ *
+ * @param [out] signals
+ *   A pointer to store the enabled signals.
+ *   Value NULL is allowed. If not NULL, then the size of this array must be
+ *   at least RTE_OOPS_SIGNALS_MAX.
+ *
+ * @return
+ *   Number of enabled oops signals.
+ */
+__rte_experimental
+int rte_oops_signals_enabled(int *signals);
+
+#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD)
+#include <signal.h>
+#include <ucontext.h>
+
+/**
+ * Decode an oops
+ *
+ * This prototype is same as sa_sigaction defined in signal.h.
+ * Application must register signal handler using sigaction() with
+ * sa_flag as SA_SIGINFO flag to get this information from unix OS.
+ *
+ * @param sig
+ *   Signal number
+ * @param info
+ *   Signal info provided by sa_sigaction. Value NULL is allowed.
+ * @param uc
+ *   ucontext_t provided when signal installed with SA_SIGINFO flag.
+ *   Value NULL is allowed.
+ *
+ */
+__rte_experimental
+void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc);
+#else
+
+/**
+ * Decode an oops
+ *
+ * @param sig
+ *   Signal number
+ */
+__rte_experimental
+void rte_oops_decode(int sig);
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_OOPS_H_ */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3577eaeaa4..0ab43c9e74 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1017,6 +1017,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (internal_conf->no_oops == 0 && eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
@@ -1370,6 +1375,8 @@ rte_eal_cleanup(void)
 	rte_eal_memory_detach();
 	rte_trace_save();
 	eal_trace_fini();
+	if (internal_conf->no_oops == 0)
+		eal_oops_fini();
 	eal_cleanup_config(internal_conf);
 	return 0;
 }
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
new file mode 100644
index 0000000000..53b580f733
--- /dev/null
+++ b/lib/eal/unix/eal_oops.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+
+#include <rte_oops.h>
+
+#include "eal_private.h"
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	RTE_SET_USED(sig);
+	RTE_SET_USED(info);
+	RTE_SET_USED(uc);
+
+}
+
+int
+rte_oops_signals_enabled(int *signals)
+{
+	RTE_SET_USED(signals);
+
+	return 0;
+}
+
+int
+eal_oops_init(void)
+{
+	return 0;
+}
+
+void
+eal_oops_fini(void)
+{
+}
diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build
index e3ecd3e956..cdd3320669 100644
--- a/lib/eal/unix/meson.build
+++ b/lib/eal/unix/meson.build
@@ -6,5 +6,6 @@ sources += files(
         'eal_unix_memory.c',
         'eal_unix_timer.c',
         'eal_firmware.c',
+        'eal_oops.c',
         'rte_thread.c',
 )
diff --git a/lib/eal/version.map b/lib/eal/version.map
index beeb986adc..4106beb6ef 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -426,6 +426,10 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
+
+	# added in 21.11
+	rte_oops_signals_enabled; # WINDOWS_NO_EXPORT
+	rte_oops_decode; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
-- 
2.33.0

* [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation
  2021-09-06  4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
@ 2021-09-06  4:17   ` jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile,
	dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc,
	stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Implement the base oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 173 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 166 insertions(+), 7 deletions(-)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 53b580f733..a480437f23 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -2,35 +2,194 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include <inttypes.h>
+#include <signal.h>
+#include <ucontext.h>
+#include <unistd.h>
 
+#include <rte_byteorder.h>
+#include <rte_log.h>
 #include <rte_oops.h>
 
 #include "eal_private.h"
 
-void
-rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+/* It is not safe to call rte_log from signal handler due to the fact the
+ * malloc pool may be corrupted and rte_log uses malloc.
+ */
+#define oops_print(...) fprintf(stderr, __VA_ARGS__)
+
+static const int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL,
+				   SIGABRT, SIGFPE, SIGSYS};
+
+struct oops_signal {
+	bool enabled;
+	struct sigaction sa;
+};
+
+static struct oops_signal signals_db[RTE_DIM(oops_signals)];
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	RTE_SET_USED(context);
+}
+static void
+siginfo_dump(int sig, siginfo_t *info)
+{
+	oops_print("PID: %" PRIdMAX "\n", (intmax_t)getpid());
+
+	if (info == NULL)
+		return;
+	if (sig != info->si_signo)
+		oops_print("Invalid signal info\n");
+
+	oops_print("Signal number: %d\n", info->si_signo);
+	oops_print("Fault address: %p\n", info->si_addr);
+}
+
+static void
+mem32_dump(const void *ptr)
+{
+	const uint32_t *p = ptr;
+	int i;
+
+	for (i = 0; i < 16; i++)
+		oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
+}
+
+static void
+stack_dump_header(void)
+{
+	oops_print("Stack dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+code_dump_header(void)
+{
+	oops_print("Code dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+stack_code_dump(void *stack, void *code)
+{
+	if (stack == NULL || code == NULL)
+		return;
+
+	oops_print("\n");
+	stack_dump_header();
+	mem32_dump(stack);
+	oops_print("\n");
+
+	code_dump_header();
+	mem32_dump(code);
+	oops_print("\n");
+}
+static void
+archinfo_dump(ucontext_t *uc)
 {
-	RTE_SET_USED(sig);
-	RTE_SET_USED(info);
 	RTE_SET_USED(uc);
 
+	stack_code_dump(NULL, NULL);
+}
+
+static void
+default_signal_handler_invoke(int sig)
+{
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (oops_signals[idx] != sig)
+			continue;
+		/* Skip disabled signals */
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(sig, &signals_db[idx].sa, NULL);
+		kill(getpid(), sig);
+	}
+}
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	oops_print("Signal info:\n");
+	oops_print("------------\n");
+	siginfo_dump(sig, info);
+	oops_print("\n");
+
+	oops_print("Backtrace:\n");
+	oops_print("----------\n");
+	back_trace_dump(uc);
+	oops_print("\n");
+
+	oops_print("Arch info:\n");
+	oops_print("----------\n");
+	if (uc)
+		archinfo_dump(uc);
+}
+
+static void
+eal_oops_handler(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uc = ctx;
+
+	rte_oops_decode(sig, info, uc);
+	default_signal_handler_invoke(sig);
 }
 
 int
 rte_oops_signals_enabled(int *signals)
 {
-	RTE_SET_USED(signals);
+	int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
+	unsigned int idx = 0;
 
-	return 0;
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (signals_db[idx].enabled)
+			sig[count++] = oops_signals[idx];
+	}
+	if (signals)
+		memcpy(signals, sig, sizeof(*signals) * count);
+
+	return count;
 }
 
 int
 eal_oops_init(void)
 {
-	return 0;
+	unsigned int idx, rc = 0;
+	struct sigaction sa;
+
+	RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &eal_oops_handler;
+	sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		/* Get existing sigaction */
+		rc = sigaction(oops_signals[idx], NULL, &signals_db[idx].sa);
+		if (rc)
+			continue;
+		/* Replace with oops handler */
+		rc = sigaction(oops_signals[idx], &sa, NULL);
+		if (rc)
+			continue;
+		signals_db[idx].enabled = true;
+	}
+	return rc;
 }
 
 void
 eal_oops_fini(void)
 {
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(oops_signals[idx], &signals_db[idx].sa, NULL);
+	}
 }
-- 
2.33.0

* [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2021-09-06  4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation jerinj
@ 2021-09-06  4:17   ` jerinj
  2022-01-27 20:47     ` Stephen Hemminger
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops jerinj
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC
  To: dev, Aaron Conole, Michael Santana, Bruce Richardson
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Add an optional libunwind library dependency to DPDK for
enhanced backtrace based on ucontext.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 .github/workflows/build.yml |  2 +-
 .travis.yml                 |  2 +-
 config/meson.build          |  8 +++++++
 lib/eal/unix/eal_oops.c     | 45 +++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 151641e6fa..de985776ed 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -93,7 +93,7 @@ jobs:
       run: sudo apt install -y ccache libnuma-dev python3-setuptools
         python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
         libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev
-        libarchive-dev
+        libarchive-dev libunwind-dev
     - name: Install libabigail build dependencies if no cache is available
       if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true'
       run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev
diff --git a/.travis.yml b/.travis.yml
index 4bb5bf629e..cfb8931d3b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ addons:
     packages: &required_packages
       - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip, python3-pyelftools, ninja-build]
       - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev]
-      - [libarchive-dev]
+      - [libarchive-dev, libunwind-dev]
 
 _aarch64_packages: &aarch64_packages
   - *required_packages
diff --git a/config/meson.build b/config/meson.build
index 3b5966ec2f..7f4dd52bc5 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -237,6 +237,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
     dpdk_extra_ldflags += '-latomic'
 endif
 
+# check for libunwind
+unwind_dep = dependency('libunwind', required: false, method: 'pkg-config')
+if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep)
+    dpdk_conf.set('RTE_USE_LIBUNWIND', 1)
+    add_project_link_arguments('-lunwind', language: 'c')
+    dpdk_extra_ldflags += '-lunwind'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a480437f23..9c2d9d99d9 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -28,11 +28,56 @@ struct oops_signal {
 
 static struct oops_signal signals_db[RTE_DIM(oops_signals)];
 
+#if defined(RTE_USE_LIBUNWIND)
+
+#define BACKTRACE_DEPTH 256
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	unw_cursor_t cursor;
+	unw_word_t ip, off;
+	int rc, level = 0;
+	char name[256];
+
+	if (context == NULL)
+		return;
+
+	rc = unw_init_local(&cursor, (unw_context_t *)context);
+	if (rc < 0)
+		goto fail;
+
+	for (;;) {
+		rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+		if (rc < 0)
+			goto fail;
+		rc = unw_get_proc_name(&cursor, name, sizeof(name), &off);
+		if (rc == 0)
+			oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip,
+				   name, (uint64_t)off);
+		else
+			oops_print("[%16p]: <unknown>\n", (void *)ip);
+		rc = unw_step(&cursor);
+		if (rc <= 0 || ++level >= BACKTRACE_DEPTH)
+			break;
+	}
+	return;
+fail:
+	oops_print("libunwind call failed %s\n", unw_strerror(rc));
+}
+
+#else
+
 static void
 back_trace_dump(ucontext_t *context)
 {
 	RTE_SET_USED(context);
 }
+
+#endif
+
 static void
 siginfo_dump(int sig, siginfo_t *info)
 {
-- 
2.33.0

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
@ 2022-01-27 20:47     ` Stephen Hemminger
  2022-01-28  4:33       ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2022-01-27 20:47 UTC
  To: jerinj
  Cc: dev, Aaron Conole, Michael Santana, Bruce Richardson, thomas,
	david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam,
	konstantin.ananyev, ruifeng.wang, drc

On Mon, 6 Sep 2021 09:47:29 +0530
<jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
>
> adding optional libunwind library dependency to DPDK for
> enhanced backtrace based on ucontext.
>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>

Was looking for better backtrace and noticed that there is libbacktrace
on github (BSD licensed). It provides more information like file and line number.
Maybe DPDK should integrate it?

PS: existing rte_dump_stack() is not safe from signal handlers.
https://bugs.dpdk.org/show_bug.cgi?id=929

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-27 20:47     ` Stephen Hemminger
@ 2022-01-28  4:33       ` Jerin Jacob
  2022-01-28  8:41         ` Thomas Monjalon
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2022-01-28  4:33 UTC
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Aaron Conole, Michael Santana,
	Bruce Richardson, Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China), David Christensen

On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 6 Sep 2021 09:47:29 +0530
> <jerinj@marvell.com> wrote:
>
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > adding optional libunwind library dependency to DPDK for
> > enhanced backtrace based on ucontext.
> >
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
>
>
> Was looking for better backtrace and noticed that there is libbacktrace
> on github (BSD licensed). It provides more information like file and line number.
> Maybe DPDK should integrate it?

TB already decided to NOT pursue that path.

>
>
> PS: existing rte_dump_stack() is not safe from signal handlers.
> https://bugs.dpdk.org/show_bug.cgi?id=929

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28  4:33       ` Jerin Jacob
@ 2022-01-28  8:41         ` Thomas Monjalon
  2022-01-28 14:27           ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2022-01-28  8:41 UTC
  To: Stephen Hemminger, Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, Aaron Conole, Michael Santana,
	Bruce Richardson, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China), David Christensen

28/01/2022 05:33, Jerin Jacob:
> On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 6 Sep 2021 09:47:29 +0530
> > <jerinj@marvell.com> wrote:
> >
> > > From: Jerin Jacob <jerinj@marvell.com>
> > >
> > > adding optional libunwind library dependency to DPDK for
> > > enhanced backtrace based on ucontext.
> > >
> > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> >
> >
> > Was looking for better backtrace and noticed that there is libbacktrace
> > on github (BSD licensed). It provides more information like file and line number.
> > Maybe DPDK should integrate it?
>
> TB already decided to NOT pursue that path.

I don't remember why.
Was it because of adding a dependency in makefile build system?
Adding optional dependencies is easier now with Meson.

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28  8:41         ` Thomas Monjalon
@ 2022-01-28 14:27           ` Jerin Jacob
  2022-01-28 17:05             ` Stephen Hemminger
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2022-01-28 14:27 UTC
  To: Thomas Monjalon
  Cc: Stephen Hemminger, Jerin Jacob, dpdk-dev, Aaron Conole,
	Michael Santana, Bruce Richardson, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China), David Christensen

On Fri, Jan 28, 2022 at 2:11 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 28/01/2022 05:33, Jerin Jacob:
> > On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Mon, 6 Sep 2021 09:47:29 +0530
> > > <jerinj@marvell.com> wrote:
> > >
> > > > From: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > adding optional libunwind library dependency to DPDK for
> > > > enhanced backtrace based on ucontext.
> > > >
> > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > >
> > >
> > > Was looking for better backtrace and noticed that there is libbacktrace
> > > on github (BSD licensed). It provides more information like file and line number.
> > > Maybe DPDK should integrate it?
> >
> > TB already decided to NOT pursue that path.
>
> I don't remember why.

Feature overlap with systemd features.

> Was it because of adding a dependency in makefile build system?
> Adding optional dependencies is easier now with Meson.
>
>

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28 14:27           ` Jerin Jacob
@ 2022-01-28 17:05             ` Stephen Hemminger
  0 siblings, 0 replies; 45+ messages in thread
From: Stephen Hemminger @ 2022-01-28 17:05 UTC
  To: Jerin Jacob
  Cc: Thomas Monjalon, Jerin Jacob, dpdk-dev, Aaron Conole,
	Michael Santana, Bruce Richardson, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China), David Christensen

On Fri, 28 Jan 2022 19:57:40 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Fri, Jan 28, 2022 at 2:11 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 28/01/2022 05:33, Jerin Jacob:
> > > On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:
> > > >
> > > > On Mon, 6 Sep 2021 09:47:29 +0530
> > > > <jerinj@marvell.com> wrote:
> > > >
> > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > adding optional libunwind library dependency to DPDK for
> > > > > enhanced backtrace based on ucontext.
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > >
> > > > Was looking for better backtrace and noticed that there is libbacktrace
> > > > on github (BSD licensed). It provides more information like file and line number.
> > > > Maybe DPDK should integrate it?
> > >
> > > TB already decided to NOT pursue that path.
> >
> > I don't remember why.
>
> Feature overlap with systemd features.
>
> > Was it because of adding a dependency in makefile build system?
> > Adding optional dependencies is easier now with Meson.
> >
> >

Okay, thanks. I may look at the current signal unsafety bug of the current code.

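On the signal-safety point raised in this sub-thread: fprintf(), which oops_print() in patch 2/6 expands to, is also not on the POSIX list of async-signal-safe functions. One possible direction, shown purely as a sketch and not as what the series implements, is to format into a stack buffer and emit it with write(2). sig_safe_hex64() and sig_safe_print_reg() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format v as "0x" followed by 16 hex digits into buf and NUL-terminate.
 * Returns the number of characters written (18), or 0 if buf is too small.
 * Uses no libc formatting routines.
 */
static size_t
sig_safe_hex64(char *buf, size_t len, uint64_t v)
{
	static const char digits[] = "0123456789abcdef";
	size_t n = 0;
	int shift;

	if (len < 19)
		return 0;

	buf[n++] = '0';
	buf[n++] = 'x';
	for (shift = 60; shift >= 0; shift -= 4)
		buf[n++] = digits[(v >> shift) & 0xf];
	buf[n] = '\0';

	return n;
}

/* Emit "<label>0x<16 hex digits>\n" on fd with a single write(2) call.
 * Labels longer than 44 characters are truncated.
 */
static void
sig_safe_print_reg(int fd, const char *label, uint64_t v)
{
	char line[64];
	size_t n = 0;
	ssize_t rc;

	while (n < sizeof(line) - 20 && label[n] != '\0') {
		line[n] = label[n];
		n++;
	}
	n += sig_safe_hex64(line + n, sizeof(line) - n, v);
	line[n++] = '\n';

	rc = write(fd, line, n);
	(void)rc; /* nothing sensible to do on failure inside a handler */
}
```

For example, sig_safe_print_reg(STDERR_FILENO, "RAX: ", value) would produce one register line without going through stdio buffers.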
* [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops
  2021-09-06  4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                     ` (2 preceding siblings ...)
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
@ 2021-09-06  4:17   ` jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 5/6] eal/arm64: " jerinj
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile,
	dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc,
	stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the x86 arch state register in oops handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 9c2d9d99d9..a9c22cbe70 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -131,6 +131,38 @@ stack_code_dump(void *stack, void *code)
 	mem32_dump(code);
 	oops_print("\n");
 }
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX)
+static void
+archinfo_dump(ucontext_t *uc)
+{
+
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	oops_print("R8 : 0x%.16llx ", mc->gregs[REG_R8]);
+	oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]);
+	oops_print("R10: 0x%.16llx ", mc->gregs[REG_R10]);
+	oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]);
+	oops_print("R12: 0x%.16llx ", mc->gregs[REG_R12]);
+	oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]);
+	oops_print("R14: 0x%.16llx ", mc->gregs[REG_R14]);
+	oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]);
+	oops_print("RAX: 0x%.16llx ", mc->gregs[REG_RAX]);
+	oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]);
+	oops_print("RCX: 0x%.16llx ", mc->gregs[REG_RCX]);
+	oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]);
+	oops_print("RBP: 0x%.16llx ", mc->gregs[REG_RBP]);
+	oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]);
+	oops_print("RSI: 0x%.16llx ", mc->gregs[REG_RSI]);
+	oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]);
+	oops_print("RIP: 0x%.16llx ", mc->gregs[REG_RIP]);
+	oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]);
+
+	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
+}
+
+#else
+
 static void
 archinfo_dump(ucontext_t *uc)
 {
@@ -139,6 +171,8 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump(NULL, NULL);
 }
 
+#endif
+
 static void
 default_signal_handler_invoke(int sig)
 {
-- 
2.33.0

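The register dump ends by handing the stack pointer and program counter to stack_code_dump(), whose mem32_dump() prints each word through rte_be_to_cpu_32(). The effect is that every 32-bit word is shown in memory byte order on either endianness, so the bytes 06 00 00 00 on a little-endian x86 stack are printed as 0x6000000, as in the first stack word of the cover-letter log. A standalone equivalent is sketched below; load_be32() is a hypothetical name, not something the patch defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the 32-bit word at p with its bytes taken in memory order: the
 * first byte in memory becomes the most significant byte of the result.
 * This matches what rte_be_to_cpu_32(*(const uint32_t *)p) yields on both
 * little-endian and big-endian hosts, without requiring p to be aligned.
 */
static uint32_t
load_be32(const void *p)
{
	const unsigned char *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Reading the dump therefore means reading bytes left to right as they sit in memory, not as native integers.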
* [dpdk-dev] [PATCH v3 5/6] eal/arm64: support register dump for oops
  2021-09-06  4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                     ` (3 preceding siblings ...)
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops jerinj
@ 2021-09-06  4:17   ` jerinj
  2021-09-06  4:17   ` [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs jerinj
  2021-09-21 17:30   ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile,
	dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc,
	stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the arm64 arch state register in oops handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a9c22cbe70..6793497bee 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -161,6 +161,25 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
 }
 
+#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
+
+static void
+archinfo_dump(ucontext_t *uc)
+{
+	mcontext_t *mc = &uc->uc_mcontext;
+	int i;
+
+	oops_print("PC : 0x%.16llx ", mc->pc);
+	oops_print("SP : 0x%.16llx\n", mc->sp);
+	for (i = 0; i < 31; i++)
+		oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
+			   i & 0x1 ? "\n" : " ");
+
+	oops_print("PSTATE: 0x%.16llx\n", mc->pstate);
+
+	stack_code_dump((void *)mc->sp, (void *)mc->pc);
+}
+
 #else
 
 static void
-- 
2.33.0

* [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj ` (4 preceding siblings ...) 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 5/6] eal/arm64: " jerinj @ 2021-09-06 4:17 ` jerinj 2021-09-21 17:30 ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon 6 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-09-06 4:17 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Added unit test cases for all the oops handling APIs. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- app/test/meson.build | 2 + app/test/test_oops.c | 122 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 124 insertions(+) create mode 100644 app/test/test_oops.c diff --git a/app/test/meson.build b/app/test/meson.build index a7611686ad..1e471ab351 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -97,6 +97,7 @@ test_sources = files( 'test_metrics.c', 'test_mcslock.c', 'test_mp_secondary.c', + 'test_oops.c', 'test_per_lcore.c', 'test_pflock.c', 'test_pmd_perf.c', @@ -236,6 +237,7 @@ fast_tests = [ ['memzone_autotest', false], ['meter_autotest', true], ['multiprocess_autotest', false], + ['oops_autotest', true], ['per_lcore_autotest', true], ['pflock_autotest', true], ['prefetch_autotest', true], diff --git a/app/test/test_oops.c b/app/test/test_oops.c new file mode 100644 index 0000000000..288761822c --- /dev/null +++ b/app/test/test_oops.c @@ -0,0 +1,122 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell + */ + +#include <setjmp.h> +#include <signal.h> + +#include <rte_config.h> +#include <rte_oops.h> + +#include "test.h" + +static jmp_buf pc; +static bool detected_segfault; + +static void +segv_handler(int sig, siginfo_t *info, void *ctx) +{ + detected_segfault 
= true; + rte_oops_decode(sig, info, (ucontext_t *)ctx); + longjmp(pc, 1); +} + +/* OS-specific way to install the segfault signal handler */ +static int +segv_handler_install(void) +{ + struct sigaction sa; + + sigemptyset(&sa.sa_mask); + sa.sa_sigaction = &segv_handler; + sa.sa_flags = SA_SIGINFO; + + return sigaction(SIGSEGV, &sa, NULL); +} + +static int +test_oops_generate(void) +{ + int rc; + + rc = segv_handler_install(); + TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc); + + detected_segfault = false; + rc = setjmp(pc); /* Save the execution state */ + if (rc == 0) { + /* Generate a segfault */ + *(volatile int *)0x05 = 0; + } else { /* longjmp from segv_handler */ + if (detected_segfault) + return TEST_SUCCESS; + + } + return TEST_FAILED; +} + +static int +test_signal_handler_installed(int count, int *signals) +{ + int i, rc, verified = 0; + struct sigaction sa; + + for (i = 0; i < count; i++) { + rc = sigaction(signals[i], NULL, &sa); + if (rc) { + printf("Failed to get sigaction for %d\n", signals[i]); + continue; + } + if (sa.sa_handler != SIG_DFL) + verified++; + } + TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count, + verified); + return TEST_SUCCESS; +} + +static int +test_oops_signals_enabled(void) +{ + int *signals = NULL; + int i, rc; + + rc = rte_oops_signals_enabled(signals); + if (rc == 0) + return TEST_SUCCESS; + + signals = malloc(sizeof(int) * rc); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + free(signals); + + signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + + for (i = 0; i < rc; i++) + TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i, + signals[i]); + + rc = test_signal_handler_installed(rc, signals); + free(signals); + + return rc; +} + +static struct unit_test_suite oops_tests = { + .suite_name = "oops autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { +
TEST_CASE(test_oops_signals_enabled), + TEST_CASE(test_oops_generate), + TEST_CASES_END()}}; + +static int +test_oops(void) +{ + return unit_test_suite_runner(&oops_tests); +} + +REGISTER_TEST_COMMAND(oops_autotest, test_oops); -- 2.33.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
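Note on the recovery path in test_oops_generate(): the handler leaves via longjmp(), and POSIX does not specify whether setjmp()/longjmp() save and restore the signal mask. SIGSEGV is blocked while its handler runs (SA_NODEFER is not set), so a second fault after the jump may never reach the handler. Below is a minimal sketch of the same probe using sigsetjmp()/siglongjmp(), assuming a POSIX system. The names probe_write(), probe_handler(), probe_hit and probe_addr are hypothetical and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* All names below are hypothetical, for illustration only */
static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_hit; /* set when the handler ran */
static void *volatile probe_addr;       /* si_addr reported for the fault */

static void
probe_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	probe_hit = 1;
	probe_addr = info->si_addr;
	/* Jump back; the mask saved by sigsetjmp(env, 1) is restored */
	siglongjmp(probe_env, 1);
}

/* Return 1 if writing to addr raised SIGSEGV, 0 if not, -1 on setup error */
static int
probe_write(volatile int *addr)
{
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = probe_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	probe_hit = 0;
	probe_addr = NULL;
	if (sigsetjmp(probe_env, 1) == 0)
		*addr = 0; /* may fault; control then resumes at sigsetjmp() */

	sigaction(SIGSEGV, &old, NULL); /* put the previous handler back */
	return probe_hit;
}
```

Because the mask is restored on every jump, calling probe_write() on a bad address twice in a row reports the fault both times. On the x86-64 Linux run shown in the cover letter, the reported fault address for a write to 0x5 was 0x5.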
* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj ` (5 preceding siblings ...) 2021-09-06 4:17 ` [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs jerinj @ 2021-09-21 17:30 ` Thomas Monjalon 2021-09-21 17:54 ` Jerin Jacob 6 siblings, 1 reply; 45+ messages in thread From: Thomas Monjalon @ 2021-09-21 17:30 UTC (permalink / raw) To: Jerin Jacob Cc: dev, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen, olivier.matz, ferruh.yigit, andrew.rybchenko, ajit.khaparde, mb 06/09/2021 06:17, jerinj@marvell.com: > It is handy to get detailed OOPS information like Linux kernel > when DPDK application crashes without losing any of the features > provided by coredump infrastructure by the OS. > > This patch series introduces the APIs to handle OOPS in DPDK. I don't understand how it is related to DPDK. It looks something to be handled freely by the application without DPDK forcing anything. What is the benefit for other DPDK features? Which problem is it solving? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-21 17:30 ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
@ 2021-09-21 17:54   ` Jerin Jacob
  2021-09-22  7:34     ` Thomas Monjalon
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-09-21 17:54 UTC
To: Thomas Monjalon
Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde, Morten Brørup

On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 06/09/2021 06:17, jerinj@marvell.com:
> > It is handy to get detailed OOPS information like Linux kernel
> > when DPDK application crashes without losing any of the features
> > provided by coredump infrastructure by the OS.
> >
> > This patch series introduces the APIs to handle OOPS in DPDK.
>
> I don't understand how it is related to DPDK.

It abstracts the execution environment/architecture details (see
"Arch info" in the log [1]) so that the fault handler can capture
additional information about a fault in a DPDK application for
debugging, just as the kernel prints its OOPS on a fault.

> It looks something to be handled freely by the application
> without DPDK forcing anything.

This does NOT force the application to use the DPDK OOPS handler;
instead, if registered, the default handler is used.

Even if the default handler is registered, it invokes the application's
handler if the application registered a fault handler. So there is no
difference in behavior.

> What is the benefit for other DPDK features?

Could you clarify this question a bit more?

> Which problem is it solving?

Better debug trace on fault for DPDK applications, instead of faulting
with no information.

[1] Backtrace: ---------- [ 0x55e8b56d5cee]: test_oops_generate()+0x75 [ 0x55e8b5459843]: unit_test_suite_runner()+0x1aa [ 0x55e8b56d605c]: test_oops()+0x13 [ 0x55e8b544bdfc]: cmd_autotest_parsed()+0x55 [ 0x55e8b6063a0d]: cmdline_parse()+0x319 [ 0x55e8b6061dea]: cmdline_valid_buffer()+0x35 [ 0x55e8b6066bd8]: rdline_char_in()+0xc48 [ 0x55e8b606221c]: cmdline_in()+0x62 [ 0x55e8b6062495]: cmdline_interact()+0x56 [ 0x55e8b5459314]: main()+0x65e [ 0x7f54b25d2b25]: __libc_start_main()+0xd5 [ 0x55e8b544bc9e]: _start()+0x2e Arch info: ---------- R8 : 0x0000000000000000 R9 : 0x0000000000000000 R10: 0x00007f54b25b8b48 R11: 0x00007f54b25e7930 R12: 0x00007fffc695e610 R13: 0x0000000000000000 R14: 0x0000000000000000 R15: 0x0000000000000000 RAX: 0x0000000000000005 RBX: 0x0000000000000001 RCX: 0x00007f54b278a943 RDX: 0x3769043bf13a2594 RBP: 0x00007fffc6958340 RSP: 0x00007fffc6958330 RSI: 0x0000000000000000 RDI: 0x000055e8c4c1e380 RIP: 0x000055e8b56d5cee EFL: 0x0000000000010246 Stack dump: ---------- 0x7fffc6958330: 0x6000000 0x7fffc6958334: 0x0 0x7fffc6958338: 0x30cfeac5 0x7fffc695833c: 0x0 0x7fffc6958340: 0xe08395c6 0x7fffc6958344: 0xff7f0000 0x7fffc6958348: 0x439845b5 0x7fffc695834c: 0xe8550000 0x7fffc6958350: 0x0 0x7fffc6958354: 0xb000000 0x7fffc6958358: 0x20445bb9 0x7fffc695835c: 0xe8550000 0x7fffc6958360: 0x925506b6 0x7fffc6958364: 0x0 0x7fffc6958368: 0x0 0x7fffc695836c: 0x0 Code dump: ---------- 0x55e8b56d5cee: 0xc7000000 0x55e8b56d5cf2: 0xeb12 0x55e8b56d5cf6: 0xfb6054b 0x55e8b56d5cfa: 0x87540f84 0x55e8b56d5cfe: 0xc07407b8 0x55e8b56d5d02: 0x0 0x55e8b56d5d06: 0xeb05b8ff 0x55e8b56d5d0a: 0xffffffc9 0x55e8b56d5d0e: 0xc3554889 0x55e8b56d5d12: 0xe54881ec 0x55e8b56d5d16: 0xc0000000 0x55e8b56d5d1a: 0x89bd4cff 0x55e8b56d5d1e: 0xffff4889 0x55e8b56d5d22: 0xb540ffff > ^ permalink raw reply [flat|nested] 45+ messages in thread
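The chaining described in this reply (default handler first, then whatever the application had installed) can be sketched with plain POSIX calls. This is only an illustration under stated assumptions, not the EAL code: lib_handler() stands in for the EAL handler, app_handler() for an application handler installed before rte_eal_init(), and SIGUSR1 is assumed for the demo so that nothing actually crashes. The patch re-raises with kill(getpid(), sig); raise() is used here for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* All names below are hypothetical, for illustration only */
static struct sigaction saved_sa;       /* action present before install */
static volatile sig_atomic_t lib_calls; /* times the "library" handler ran */
static volatile sig_atomic_t app_calls; /* times the "application" one ran */

/* Stand-in for a handler the application installed earlier */
static void
app_handler(int sig)
{
	(void)sig;
	app_calls++;
}

/* Stand-in for the library handler: report first, then hand over */
static void
lib_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	lib_calls++;                     /* a real handler would decode here */
	sigaction(sig, &saved_sa, NULL); /* restore the earlier action */
	raise(sig);                      /* stays pending until we return */
}

/* Save the current action for sig and install lib_handler in its place */
static int
lib_handler_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = lib_handler;
	sa.sa_flags = SA_SIGINFO;
	/* One call both stores the old action and installs the new one */
	return sigaction(sig, &sa, &saved_sa);
}
```

If the saved action is SIG_DFL, the re-raised signal takes the OS default path, which for the fault signals is where the coredump comes from. If the application had its own handler, that handler runs after the library one.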
* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-09-21 17:54 ` Jerin Jacob @ 2021-09-22 7:34 ` Thomas Monjalon 2021-09-22 8:03 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Thomas Monjalon @ 2021-09-22 7:34 UTC (permalink / raw) To: Jerin Jacob Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde, Morten Brørup 21/09/2021 19:54, Jerin Jacob: > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > > > 06/09/2021 06:17, jerinj@marvell.com: > > > It is handy to get detailed OOPS information like Linux kernel > > > when DPDK application crashes without losing any of the features > > > provided by coredump infrastructure by the OS. > > > > > > This patch series introduces the APIs to handle OOPS in DPDK. > > > > I don't understand how it is related to DPDK. > > It abstracts the execution environment/architecture(See Arch Info in > log)[1] details to capture > details on fault handlers to enable additional details on fault from > DPDK application for > additional debugging information. Just like Kernel prints its OOPS on fault. Not sure it is a good direction to achieve the same features as a kernel. In recent years, the idea was to make DPDK a focused library. > > It looks something to be handled freely by the application > > without DPDK forcing anything. > > This NOT enforcing application to use DPDK OOPS handler, instead, if > registered then > it uses the default handler. > > Even if the default handler is registered it invokes the application > handler if the application registers > the fault handler. So there is not difference in behavior. OK > > What is the benefit for other DPDK features? > > Could you clarify this question a bit more? 
I mean is it used by other parts of DPDK, or just a standalone feature?

> > Which problem is it solving?
>
> Better debug trace on fault for DPDK application. Instead of faulting
> with no information.

It does not look to be in the scope of DPDK, or I miss something.

* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-09-22 7:34 ` Thomas Monjalon @ 2021-09-22 8:03 ` Jerin Jacob 2021-09-22 8:33 ` Thomas Monjalon 0 siblings, 1 reply; 45+ messages in thread From: Jerin Jacob @ 2021-09-22 8:03 UTC (permalink / raw) To: Thomas Monjalon Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde, Morten Brørup On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > 21/09/2021 19:54, Jerin Jacob: > > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > > > > > 06/09/2021 06:17, jerinj@marvell.com: > > > > It is handy to get detailed OOPS information like Linux kernel > > > > when DPDK application crashes without losing any of the features > > > > provided by coredump infrastructure by the OS. > > > > > > > > This patch series introduces the APIs to handle OOPS in DPDK. > > > > > > I don't understand how it is related to DPDK. > > > > It abstracts the execution environment/architecture(See Arch Info in > > log)[1] details to capture > > details on fault handlers to enable additional details on fault from > > DPDK application for > > additional debugging information. Just like Kernel prints its OOPS on fault. > > Not sure it is a good direction to achieve the same features as a kernel. I just gave an example, that kernel has this feature and DPDK does not have it. And it is good for DPDK applications. Any specific point where you think this feature is not good for DPDK in-tree and out of tree applications? > In recent years, the idea was to make DPDK a focused library. Not sure how this feature is not deviating from that. See below, on libunwind library usage. 
> > > > It looks something to be handled freely by the application > > > without DPDK forcing anything. > > > > This NOT enforcing application to use DPDK OOPS handler, instead, if > > registered then > > it uses the default handler. > > > > Even if the default handler is registered it invokes the application > > handler if the application registers > > the fault handler. So there is not difference in behavior. > > OK > > > > What is the benefit for other DPDK features? > > > > Could you clarify this question a bit more? > > I mean is it used by other parts of DPDK, or just a standalone feature? Standalone feature in EAL. It can get a crash dump from any internal library if it segfaults. Default handler can be extended if we need more information specific to DPDK libraries if need (For example BPF etc) > > > > Which problem is it solving? > > > > Better debug trace on fault for DPDK application. Instead of faulting > > with no information. > > It does not look to be in the scope of DPDK, or I miss something. I think it is, like we have APIs for creating control threads in EAL. Also, This feature is dependent on libunwind as an optional dependency. So we are not duplicating any other library effort just that integrating all together including arch specific bits in EAL to have a feature for better DPDK application usage. > > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-09-22 8:03 ` Jerin Jacob @ 2021-09-22 8:33 ` Thomas Monjalon 2021-09-22 8:49 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: Thomas Monjalon @ 2021-09-22 8:33 UTC (permalink / raw) To: Jerin Jacob Cc: dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde, Morten Brørup, Jerin Jacob, techboard 22/09/2021 10:03, Jerin Jacob: > On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > 21/09/2021 19:54, Jerin Jacob: > > > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > > > 06/09/2021 06:17, jerinj@marvell.com: > > > > > It is handy to get detailed OOPS information like Linux kernel > > > > > when DPDK application crashes without losing any of the features > > > > > provided by coredump infrastructure by the OS. > > > > > > > > > > This patch series introduces the APIs to handle OOPS in DPDK. > > > > > > > > I don't understand how it is related to DPDK. > > > > > > It abstracts the execution environment/architecture(See Arch Info in > > > log)[1] details to capture > > > details on fault handlers to enable additional details on fault from > > > DPDK application for > > > additional debugging information. Just like Kernel prints its OOPS on fault. > > > > Not sure it is a good direction to achieve the same features as a kernel. > > I just gave an example, that kernel has this feature and DPDK does not have it. > And it is good for DPDK applications. > > Any specific point where you think this feature is not good for DPDK > in-tree and out of tree applications? No specific. Just a fear we make life more complex for some users, because there are always bugs and unplanned side effects. 
> > In recent years, the idea was to make DPDK a focused library. > > Not sure how this feature is not deviating from that. See below, on > libunwind library usage. > > > > > > > It looks something to be handled freely by the application > > > > without DPDK forcing anything. > > > > > > This NOT enforcing application to use DPDK OOPS handler, instead, if > > > registered then > > > it uses the default handler. > > > > > > Even if the default handler is registered it invokes the application > > > handler if the application registers > > > the fault handler. So there is not difference in behavior. > > > > OK > > > > > > What is the benefit for other DPDK features? > > > > > > Could you clarify this question a bit more? > > > > I mean is it used by other parts of DPDK, or just a standalone feature? > > Standalone feature in EAL. It can get a crash dump from any internal > library if it segfaults. > Default handler can be extended if we need more information specific > to DPDK libraries if need > (For example BPF etc) > > > > > > > Which problem is it solving? > > > > > > Better debug trace on fault for DPDK application. Instead of faulting > > > with no information. > > > > It does not look to be in the scope of DPDK, or I miss something. > > I think it is, like we have APIs for creating control threads in EAL. > > Also, This feature is dependent on libunwind as an optional dependency. > So we are not duplicating any other library effort just that integrating > all together including arch specific bits in EAL to have a feature for > better DPDK application usage. That's a difficult decision. We need more opinions. We may also discuss it in the techboard meeting today. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling 2021-09-22 8:33 ` Thomas Monjalon @ 2021-09-22 8:49 ` Jerin Jacob 0 siblings, 0 replies; 45+ messages in thread From: Jerin Jacob @ 2021-09-22 8:49 UTC (permalink / raw) To: Thomas Monjalon Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit, Andrew Rybchenko, Ajit Khaparde, Morten Brørup, techboard On Wed, Sep 22, 2021 at 2:03 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > 22/09/2021 10:03, Jerin Jacob: > > On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > > 21/09/2021 19:54, Jerin Jacob: > > > > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote: > > > > > 06/09/2021 06:17, jerinj@marvell.com: > > > > > > It is handy to get detailed OOPS information like Linux kernel > > > > > > when DPDK application crashes without losing any of the features > > > > > > provided by coredump infrastructure by the OS. > > > > > > > > > > > > This patch series introduces the APIs to handle OOPS in DPDK. > > > > > > > > > > I don't understand how it is related to DPDK. > > > > > > > > It abstracts the execution environment/architecture(See Arch Info in > > > > log)[1] details to capture > > > > details on fault handlers to enable additional details on fault from > > > > DPDK application for > > > > additional debugging information. Just like Kernel prints its OOPS on fault. > > > > > > Not sure it is a good direction to achieve the same features as a kernel. > > > > I just gave an example, that kernel has this feature and DPDK does not have it. > > And it is good for DPDK applications. > > > > Any specific point where you think this feature is not good for DPDK > > in-tree and out of tree applications? > > No specific. 
Just a fear we make life more complex for some users, > because there are always bugs and unplanned side effects. OK. That's more of a non technical thing. I have provided an EAL switch to disable this feature like telemetry has a disable option as EAL argument. It can be used for this purpose. > > > > In recent years, the idea was to make DPDK a focused library. > > > > Not sure how this feature is not deviating from that. See below, on > > libunwind library usage. > > > > > > > > > > It looks something to be handled freely by the application > > > > > without DPDK forcing anything. > > > > > > > > This NOT enforcing application to use DPDK OOPS handler, instead, if > > > > registered then > > > > it uses the default handler. > > > > > > > > Even if the default handler is registered it invokes the application > > > > handler if the application registers > > > > the fault handler. So there is not difference in behavior. > > > > > > OK > > > > > > > > What is the benefit for other DPDK features? > > > > > > > > Could you clarify this question a bit more? > > > > > > I mean is it used by other parts of DPDK, or just a standalone feature? > > > > Standalone feature in EAL. It can get a crash dump from any internal > > library if it segfaults. > > Default handler can be extended if we need more information specific > > to DPDK libraries if need > > (For example BPF etc) > > > > > > > > > > Which problem is it solving? > > > > > > > > Better debug trace on fault for DPDK application. Instead of faulting > > > > with no information. > > > > > > It does not look to be in the scope of DPDK, or I miss something. > > > > I think it is, like we have APIs for creating control threads in EAL. > > > > Also, This feature is dependent on libunwind as an optional dependency. > > So we are not duplicating any other library effort just that integrating > > all together including arch specific bits in EAL to have a feature for > > better DPDK application usage. 
>
> That's a difficult decision. We need more opinions.

Sure.

> We may also discuss it in the techboard meeting today.

Sure.

* [dpdk-dev] 2/6] eal: oops handling API implementation 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj 2021-07-30 8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj @ 2021-07-30 8:49 ` jerinj 2021-08-02 22:46 ` David Christensen 2021-07-30 8:49 ` [dpdk-dev] 3/6] eal: support libunwind based backtrace jerinj ` (3 subsequent siblings) 5 siblings, 1 reply; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Implement the base oops handling APIs. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 175 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 168 insertions(+), 7 deletions(-) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index 53b580f733..1120c8ad8c 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -2,35 +2,196 @@ * Copyright(C) 2021 Marvell. */ +#include <inttypes.h> +#include <signal.h> +#include <ucontext.h> +#include <unistd.h> +#include <rte_byteorder.h> +#include <rte_log.h> #include <rte_oops.h> #include "eal_private.h" -void -rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +#define oops_print(...) 
rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__) + +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS}; + +struct oops_signal { + int sig; + bool enabled; + struct sigaction sa; +}; + +static struct oops_signal signals_db[RTE_DIM(oops_signals)]; + +static void +back_trace_dump(ucontext_t *context) +{ + RTE_SET_USED(context); + + rte_dump_stack(); +} +static void +siginfo_dump(int sig, siginfo_t *info) +{ + oops_print("PID: %" PRIdMAX "\n", (intmax_t)getpid()); + + if (info == NULL) + return; + if (sig != info->si_signo) + oops_print("Invalid signal info\n"); + + oops_print("Signal number: %d\n", info->si_signo); + oops_print("Fault address: %p\n", info->si_addr); +} + +static void +mem32_dump(void *ptr) +{ + uint32_t *p = ptr; + int i; + + for (i = 0; i < 16; i++) + oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i])); +} + +static void +stack_dump_header(void) +{ + oops_print("Stack dump:\n"); + oops_print("----------\n"); +} + +static void +code_dump_header(void) +{ + oops_print("Code dump:\n"); + oops_print("----------\n"); +} + +static void +stack_code_dump(void *stack, void *code) +{ + if (stack == NULL || code == NULL) + return; + + oops_print("\n"); + stack_dump_header(); + mem32_dump(stack); + oops_print("\n"); + + code_dump_header(); + mem32_dump(code); + oops_print("\n"); +} +static void +archinfo_dump(ucontext_t *uc) { - RTE_SET_USED(sig); - RTE_SET_USED(info); RTE_SET_USED(uc); + stack_code_dump(NULL, NULL); +} + +static void +default_signal_handler_invoke(int sig) +{ + unsigned int idx; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + /* Skip disabled signals */ + if (signals_db[idx].sig != sig) + continue; + if (!signals_db[idx].enabled) + continue; + /* Replace with stored handler */ + sigaction(sig, &signals_db[idx].sa, NULL); + kill(getpid(), sig); + } +} + +void +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc) +{ + oops_print("Signal info:\n"); + oops_print("------------\n"); + siginfo_dump(sig, 
info); + oops_print("\n"); + + oops_print("Backtrace:\n"); + oops_print("----------\n"); + back_trace_dump(uc); + oops_print("\n"); + + oops_print("Arch info:\n"); + oops_print("----------\n"); + if (uc) + archinfo_dump(uc); +} + +static void +eal_oops_handler(int sig, siginfo_t *info, void *ctx) +{ + ucontext_t *uc = ctx; + + rte_oops_decode(sig, info, uc); + default_signal_handler_invoke(sig); } int rte_oops_signals_enabled(int *signals) { - RTE_SET_USED(signals); + int count = 0, sig[RTE_OOPS_SIGNALS_MAX]; + unsigned int idx = 0; - return 0; + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + if (signals_db[idx].enabled) { + sig[count] = signals_db[idx].sig; + count++; + } + } + if (signals) + memcpy(signals, sig, sizeof(*signals) * count); + + return count; } int eal_oops_init(void) { - return 0; + unsigned int idx, rc = 0; + struct sigaction sa; + + RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX); + + sigemptyset(&sa.sa_mask); + sa.sa_sigaction = &eal_oops_handler; + sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + signals_db[idx].sig = oops_signals[idx]; + /* Get existing sigaction */ + rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa); + if (rc) + continue; + /* Replace with oops handler */ + rc = sigaction(signals_db[idx].sig, &sa, NULL); + if (rc) + continue; + signals_db[idx].enabled = true; + } + return rc; } void eal_oops_fini(void) { + unsigned int idx; + + for (idx = 0; idx < RTE_DIM(oops_signals); idx++) { + if (!signals_db[idx].enabled) + continue; + /* Replace with stored handler */ + sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL); + } } -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
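A note on reading the "Stack dump" and "Code dump" output of this patch: mem32_dump() passes each word through rte_be_to_cpu_32(), which swaps the bytes on a little-endian host, so the hex digits printed for each word follow memory order rather than the native integer value. Below is a portable sketch of the same presentation without DPDK. The helper names word_in_memory_order() and words_dump() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble 4 bytes into a value whose hex digits follow memory order */
static uint32_t
word_in_memory_order(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Print n 32-bit words starting at ptr, one "address: value" per line */
static void
words_dump(const void *ptr, size_t n)
{
	const unsigned char *p = ptr;
	size_t i;

	for (i = 0; i < n; i++)
		printf("%p: 0x%x\n", (const void *)(p + 4 * i),
		       (unsigned int)word_in_memory_order(p + 4 * i));
}
```

With the bytes e0 83 95 c6 ff 7f 00 00 in memory, this yields 0xe08395c6 and 0xff7f0000. Those are the two words printed at the RBP address in the x86-64 log from the cover letter, and read as one little-endian 64-bit value they are 0x00007fffc69583e0, a stack address. Reading byte by byte also avoids any alignment assumption on the dumped pointer.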
* Re: [dpdk-dev] 2/6] eal: oops handling API implementation
  2021-07-30  8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj
@ 2021-08-02 22:46   ` David Christensen
  0 siblings, 0 replies; 45+ messages in thread
From: David Christensen @ 2021-08-02 22:46 UTC
To: jerinj, dev
Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin

On 7/30/21 1:49 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
>
> Implement the base oops handling APIs.
>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>

Building on POWER generates the following error:

ninja: Entering directory `build'
[1/244] Compiling C object 'lib/76b5a35@@rte_eal@sta/eal_unix_eal_oops.c.o'.
../lib/eal/unix/eal_oops.c: In function ‘back_trace_dump’:
../lib/eal/unix/eal_oops.c:33:2: warning: implicit declaration of function ‘rte_dump_stack’; did you mean ‘rte_bus_scan’? [-Wimplicit-function-declaration]
  rte_dump_stack();
  ^~~~~~~~~~~~~~
  rte_bus_scan
../lib/eal/unix/eal_oops.c:33:2: warning: nested extern declaration of ‘rte_dump_stack’ [-Wnested-externs]
[19/19] Linking target app/test/dpdk-test.

You can fix the issue by adding <rte_debug.h> to eal_oops.c. Must be a
hidden include dependency in the x86/ARM code.

Dave

* [dpdk-dev] 3/6] eal: support libunwind based backtrace 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj 2021-07-30 8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj 2021-07-30 8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj @ 2021-07-30 8:49 ` jerinj 2021-07-30 8:49 ` [dpdk-dev] 4/6] eal/x86: support register dump for oops jerinj ` (2 subsequent siblings) 5 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev, Aaron Conole, Michael Santana, Bruce Richardson Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Add an optional libunwind library dependency to DPDK for an enhanced backtrace based on ucontext. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- .github/workflows/build.yml | 2 +- .travis.yml | 2 +- config/meson.build | 8 +++++++ lib/eal/unix/eal_oops.c | 47 +++++++++++++++++++++++++++++++++++++ 4 files changed, 57 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 7dac20ddeb..caaca207a6 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -93,7 +93,7 @@ jobs: run: sudo apt install -y ccache libnuma-dev python3-setuptools python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev - libarchive-dev + libarchive-dev libunwind-dev - name: Install libabigail build dependencies if no cache is available if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true' run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev diff --git a/.travis.yml b/.travis.yml index 23067d9e3c..e72b156014 100644 --- a/.travis.yml +++ b/.travis.yml @@ -16,7 +16,7 @@ addons: packages: &required_packages - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip,
python3-pyelftools, ninja-build] - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev] - - [libarchive-dev] + - [libarchive-dev, libunwind-dev] _aarch64_packages: &aarch64_packages - *required_packages diff --git a/config/meson.build b/config/meson.build index e80421003b..26a85dab6b 100644 --- a/config/meson.build +++ b/config/meson.build @@ -236,6 +236,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false dpdk_extra_ldflags += '-latomic' endif +# check for libunwind +unwind_dep = dependency('libunwind', required: false, method: 'pkg-config') +if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep) + dpdk_conf.set('RTE_USE_LIBUNWIND', 1) + add_project_link_arguments('-lunwind', language: 'c') + dpdk_extra_ldflags += '-lunwind' +endif + # add -include rte_config to cflags add_project_arguments('-include', 'rte_config.h', language: 'c') diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index 1120c8ad8c..118b236f35 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -25,6 +25,50 @@ struct oops_signal { static struct oops_signal signals_db[RTE_DIM(oops_signals)]; +#if defined(RTE_USE_LIBUNWIND) + +#define BACKTRACE_DEPTH 256 +#define UNW_LOCAL_ONLY +#include <libunwind.h> + +static void +back_trace_dump(ucontext_t *context) +{ + unw_cursor_t cursor; + unw_word_t ip, off; + int rc, level = 0; + char name[256]; + + if (context == NULL) { + rte_dump_stack(); + return; + } + + rc = unw_init_local(&cursor, (unw_context_t *)context); + if (rc < 0) + goto fail; + + for (;;) { + rc = unw_get_reg(&cursor, UNW_REG_IP, &ip); + if (rc < 0) + goto fail; + rc = unw_get_proc_name(&cursor, name, sizeof(name), &off); + if (rc == 0) + oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip, + name, (uint64_t)off); + else + oops_print("[%16p]: <unknown>\n", (void *)ip); + rc = unw_step(&cursor); + if (rc <= 0 || ++level >= BACKTRACE_DEPTH) + break; + } + return; +fail: + 
oops_print("libunwind call failed %s\n", unw_strerror(rc)); +} + +#else + static void back_trace_dump(ucontext_t *context) { @@ -32,6 +76,9 @@ back_trace_dump(ucontext_t *context) rte_dump_stack(); } + +#endif + static void siginfo_dump(int sig, siginfo_t *info) { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] 4/6] eal/x86: support register dump for oops 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj ` (2 preceding siblings ...) 2021-07-30 8:49 ` [dpdk-dev] 3/6] eal: support libunwind based backtrace jerinj @ 2021-07-30 8:49 ` jerinj 2021-07-30 8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj 2021-07-30 8:49 ` [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs jerinj 5 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Dump the x86 arch state register in oops handling routine. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index 118b236f35..da71481ade 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -132,6 +132,38 @@ stack_code_dump(void *stack, void *code) mem32_dump(code); oops_print("\n"); } + +#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX) +static void +archinfo_dump(ucontext_t *uc) +{ + + mcontext_t *mc = &uc->uc_mcontext; + + oops_print("R8 : 0x%.16llx ", mc->gregs[REG_R8]); + oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]); + oops_print("R10: 0x%.16llx ", mc->gregs[REG_R10]); + oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]); + oops_print("R12: 0x%.16llx ", mc->gregs[REG_R12]); + oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]); + oops_print("R14: 0x%.16llx ", mc->gregs[REG_R14]); + oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]); + oops_print("RAX: 0x%.16llx ", mc->gregs[REG_RAX]); + oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]); + oops_print("RCX: 0x%.16llx ", mc->gregs[REG_RCX]); + oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]); + oops_print("RBP: 0x%.16llx ", 
mc->gregs[REG_RBP]); + oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]); + oops_print("RSI: 0x%.16llx ", mc->gregs[REG_RSI]); + oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]); + oops_print("RIP: 0x%.16llx ", mc->gregs[REG_RIP]); + oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]); + + stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); +} + +#else + static void archinfo_dump(ucontext_t *uc) { @@ -140,6 +172,8 @@ archinfo_dump(ucontext_t *uc) stack_code_dump(NULL, NULL); } +#endif + static void default_signal_handler_invoke(int sig) { -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
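For readers who want to try the register access outside DPDK, here is a minimal standalone sketch of the mechanism this patch relies on: a handler installed with SA_SIGINFO receives a ucontext_t, and on x86-64 Linux with glibc the general-purpose registers are indexed through uc_mcontext.gregs[]. The names fault_ctx, ctx_handler and capture_context are hypothetical and not part of DPDK or of this series. The sketch raises SIGUSR1 rather than crashing, so the handler can simply return, and it falls back to a "not supported" result on other platforms.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes the REG_* indices only when this is set */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical container for the two values the patch hands to stack_code_dump(). */
struct fault_ctx {
	int supported;         /* 1 when the register read path was compiled in */
	unsigned long long ip; /* instruction pointer at signal delivery */
	unsigned long long sp; /* stack pointer at signal delivery */
};

static struct fault_ctx captured;

static void
ctx_handler(int sig, siginfo_t *info, void *ctx)
{
	ucontext_t *uc = ctx;

	(void)sig;
	(void)info;
#if defined(__x86_64__) && defined(__linux__) && defined(REG_RIP)
	captured.supported = 1;
	captured.ip = (unsigned long long)uc->uc_mcontext.gregs[REG_RIP];
	captured.sp = (unsigned long long)uc->uc_mcontext.gregs[REG_RSP];
#else
	(void)uc; /* other platforms: nothing to read in this sketch */
	captured.supported = 0;
#endif
}

/* Install the handler, deliver SIGUSR1 to ourselves, restore the old handler. */
static int
capture_context(struct fault_ctx *out)
{
	struct sigaction sa, old;

	assert(out != NULL);
	memset(&sa, 0, sizeof(sa));
	memset(&captured, 0, sizeof(captured));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = ctx_handler;
	sa.sa_flags = SA_SIGINFO; /* required to receive the ucontext argument */
	if (sigaction(SIGUSR1, &sa, &old) != 0)
		return -1;
	raise(SIGUSR1); /* the handler has returned by the time raise() returns */
	sigaction(SIGUSR1, &old, NULL);
	*out = captured;
	return 0;
}
```

Calling capture_context() from main() and printing ip and sp with "0x%.16llx" would give the same two values that the patch feeds to stack_code_dump(), under the assumptions stated above.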
* [dpdk-dev] 5/6] eal/arm64: support register dump for oops 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj ` (3 preceding siblings ...) 2021-07-30 8:49 ` [dpdk-dev] 4/6] eal/x86: support register dump for oops jerinj @ 2021-07-30 8:49 ` jerinj 2021-08-02 22:49 ` David Christensen 2021-07-30 8:49 ` [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs jerinj 5 siblings, 1 reply; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Dump the arm64 arch state register in oops handling routine. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c index da71481ade..7469610d96 100644 --- a/lib/eal/unix/eal_oops.c +++ b/lib/eal/unix/eal_oops.c @@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc) stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); } +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX) + +static void +archinfo_dump(ucontext_t *uc) +{ + mcontext_t *mc = &uc->uc_mcontext; + int i; + + oops_print("PC : 0x%.16llx", mc->pc); + oops_print("SP : 0x%.16llx\n", mc->sp); + for (i = 0; i < 31; i++) + oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i], + i & 0x1 ? "\n" : " "); + + oops_print("PSTATE: 0x%.16llx\n", mc->pstate); + + stack_code_dump((void *)mc->sp, (void *)mc->pc); +} + #else static void -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] 5/6] eal/arm64: support register dump for oops 2021-07-30 8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj @ 2021-08-02 22:49 ` David Christensen 2021-08-16 16:24 ` Jerin Jacob 0 siblings, 1 reply; 45+ messages in thread From: David Christensen @ 2021-08-02 22:49 UTC (permalink / raw) To: jerinj, dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin On 7/30/21 1:49 AM, jerinj@marvell.com wrote: > From: Jerin Jacob <jerinj@marvell.com> > > Dump the arm64 arch state register in oops > handling routine. > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- > lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c > index da71481ade..7469610d96 100644 > --- a/lib/eal/unix/eal_oops.c > +++ b/lib/eal/unix/eal_oops.c > @@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc) > stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); > } > > +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX) > + > +static void > +archinfo_dump(ucontext_t *uc) > +{ > + mcontext_t *mc = &uc->uc_mcontext; > + int i; > + > + oops_print("PC : 0x%.16llx", mc->pc); > + oops_print("SP : 0x%.16llx\n", mc->sp); > + for (i = 0; i < 31; i++) ~~~ Maybe <= instead of < ?? 31 is a strange number of registers and the line feed doesn't seem to line things up for PSTATEn below. > + oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i], > + i & 0x1 ? "\n" : " "); Dave ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] 5/6] eal/arm64: support register dump for oops 2021-08-02 22:49 ` David Christensen @ 2021-08-16 16:24 ` Jerin Jacob 0 siblings, 0 replies; 45+ messages in thread From: Jerin Jacob @ 2021-08-16 16:24 UTC (permalink / raw) To: David Christensen Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, David Marchand, Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV), Pallavi Kadam, Ananyev, Konstantin, Ruifeng Wang (Arm Technology China), Jan Viktorin On Tue, Aug 3, 2021 at 4:20 AM David Christensen <drc@linux.vnet.ibm.com> wrote: > > > > On 7/30/21 1:49 AM, jerinj@marvell.com wrote: > > From: Jerin Jacob <jerinj@marvell.com> > > > > Dump the arm64 arch state register in oops > > handling routine. > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > --- > > lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++ > > 1 file changed, 19 insertions(+) > > > > diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c > > index da71481ade..7469610d96 100644 > > --- a/lib/eal/unix/eal_oops.c > > +++ b/lib/eal/unix/eal_oops.c > > @@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc) > > stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]); > > } > > > > +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX) > > + > > +static void > > +archinfo_dump(ucontext_t *uc) > > +{ > > + mcontext_t *mc = &uc->uc_mcontext; > > + int i; > > + > > + oops_print("PC : 0x%.16llx", mc->pc); > > + oops_print("SP : 0x%.16llx\n", mc->sp); > > + for (i = 0; i < 31; i++) > ~~~ > Maybe <= instead of < ?? 31 is a strange number of registers and the > line feed doesn't seem to line things up for PSTATEn below. Based on the spec https://elixir.bootlin.com/linux/v4.5/source/arch/arm64/include/uapi/asm/sigcontext.h the range is 0 to 30, since r31 is SP, which is already covered by struct sigcontext::sp. > > > + oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i], > > + i & 0x1 ? 
"\n" : " "); > > Dave ^ permalink raw reply [flat|nested] 45+ messages in thread
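To make the two points in this exchange concrete, here is a small portable sketch that reproduces the formatting rule of the loop into a buffer, so it can be inspected on any machine. It assumes the loop stays exactly as posted (31 entries, X0 to X30, newline after odd indices). format_xregs and count_newlines are hypothetical helper names, not DPDK functions, and the register values are dummies.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_XREGS 31 /* X0..X30; SP, PC and PSTATE are separate sigcontext fields */

/* Render the X registers with the same pairing rule as the loop in the patch. */
static size_t
format_xregs(char *buf, size_t len, const unsigned long long regs[NUM_XREGS])
{
	size_t off = 0;
	int i, n;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < NUM_XREGS; i++) {
		n = snprintf(buf + off, len - off, "X%.2d: 0x%.16llx%s", i,
			     regs[i], i & 0x1 ? "\n" : " ");
		if (n < 0 || (size_t)n >= len - off)
			break; /* would not fit: stop instead of overrunning */
		off += (size_t)n;
	}
	return off;
}

/* Count the line feeds in the rendered text. */
static int
count_newlines(const char *s)
{
	int count = 0;

	for (; *s != '\0'; s++)
		if (*s == '\n')
			count++;
	return count;
}
```

With 31 entries the last index is 30, which is even, so the rendered text ends in a space rather than a line feed and whatever is printed next (PSTATE in the patch) lands on the same line as X30. Printing a "\n" after the loop, or choosing the separator from `i == NUM_XREGS - 1`, would be one way to line things up; this is a suggestion, not something the thread settled on.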
* [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs 2021-07-30 8:49 [dpdk-dev] 0/6] support oops handling jerinj ` (4 preceding siblings ...) 2021-07-30 8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj @ 2021-07-30 8:49 ` jerinj 5 siblings, 0 replies; 45+ messages in thread From: jerinj @ 2021-07-30 8:49 UTC (permalink / raw) To: dev Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc, Jerin Jacob From: Jerin Jacob <jerinj@marvell.com> Add unit test cases for all the oops handling APIs. Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- app/test/meson.build | 2 + app/test/test_oops.c | 121 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 app/test/test_oops.c diff --git a/app/test/meson.build b/app/test/meson.build index a7611686ad..1e471ab351 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -97,6 +97,7 @@ test_sources = files( 'test_metrics.c', 'test_mcslock.c', 'test_mp_secondary.c', + 'test_oops.c', 'test_per_lcore.c', 'test_pflock.c', 'test_pmd_perf.c', @@ -236,6 +237,7 @@ fast_tests = [ ['memzone_autotest', false], ['meter_autotest', true], ['multiprocess_autotest', false], + ['oops_autotest', true], ['per_lcore_autotest', true], ['pflock_autotest', true], ['prefetch_autotest', true], diff --git a/app/test/test_oops.c b/app/test/test_oops.c new file mode 100644 index 0000000000..60a7f259c7 --- /dev/null +++ b/app/test/test_oops.c @@ -0,0 +1,121 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell + */ + +#include <setjmp.h> +#include <signal.h> + +#include <rte_config.h> +#include <rte_oops.h> + +#include "test.h" + +static jmp_buf pc; +static bool detected_segfault; + +static void +segv_handler(int sig, siginfo_t *info, void *ctx) +{ + detected_segfault = true; + rte_oops_decode(sig, info, (ucontext_t *)ctx); + longjmp(pc, 1); +} + +/* OS-specific way to install the
SIGSEGV handler */ +static int +segv_handler_install(void) +{ + struct sigaction sa; + + sigemptyset(&sa.sa_mask); + sa.sa_sigaction = &segv_handler; + sa.sa_flags = SA_SIGINFO; + + return sigaction(SIGSEGV, &sa, NULL); +} + +static int +test_oops_generate(void) +{ + int rc; + + rc = segv_handler_install(); + TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc); + + detected_segfault = false; + rc = setjmp(pc); /* Save the execution state */ + if (rc == 0) { + /* Generate a segfault */ + *(volatile int *)0x05 = 0; + } else { /* longjmp from segv_handler */ + if (detected_segfault) + return TEST_SUCCESS; + + } + return TEST_FAILED; +} + +static int +test_signal_handler_installed(int count, int *signals) +{ + int i, rc, verified = 0; + struct sigaction sa; + + for (i = 0; i < count; i++) { + rc = sigaction(signals[i], NULL, &sa); + if (rc) { + printf("Failed to get sigaction for %d\n", signals[i]); + continue; + } + if (sa.sa_handler != SIG_DFL) + verified++; + } + TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count, + verified); + return TEST_SUCCESS; +} + +static int +test_oops_signals_enabled(void) +{ + int *signals = NULL; + int i, rc; + + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + + signals = malloc(sizeof(int) * rc); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + free(signals); + + signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX); + rc = rte_oops_signals_enabled(signals); + TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc); + + for (i = 0; i < rc; i++) + TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i, + signals[i]); + + rc = test_signal_handler_installed(rc, signals); + free(signals); + + return rc; +} + +static struct unit_test_suite oops_tests = { + .suite_name = "oops autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + TEST_CASE(test_oops_signals_enabled), + TEST_CASE(test_oops_generate), + TEST_CASES_END()}}; + +static int
+test_oops(void) +{ + return unit_test_suite_runner(&oops_tests); +} + +REGISTER_TEST_COMMAND(oops_autotest, test_oops); -- 2.32.0 ^ permalink raw reply [flat|nested] 45+ messages in thread
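The fault-and-recover pattern used by test_oops_generate() can be tried without DPDK. The sketch below is a variant, not the code from the patch: it swaps setjmp()/longjmp() for sigsetjmp()/siglongjmp(). POSIX leaves it unspecified whether longjmp() restores the signal mask, and SIGSEGV is blocked while its handler runs (no SA_NODEFER), so jumping out with plain longjmp() can leave SIGSEGV blocked for the rest of the process. Saving the mask with sigsetjmp(env, 1) avoids that. recover_handler and run_with_fault_recovery are hypothetical names, and the sketch omits the rte_oops_decode() call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* sigaction(), sigsetjmp() under strict -std= */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t fault_seen;

static void
recover_handler(int sig)
{
	(void)sig;
	fault_seen = 1;
	siglongjmp(recover_point, 1); /* also restores the saved signal mask */
}

/* Returns 1 if the deliberate fault was caught, 0 if not, -1 on setup error. */
static int
run_with_fault_recovery(void)
{
	struct sigaction sa, old;
	volatile int recovered = 0;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = recover_handler;
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	fault_seen = 0;
	if (sigsetjmp(recover_point, 1) == 0)
		*(volatile int *)0x05 = 0; /* same deliberate fault as the test */
	else
		recovered = fault_seen;    /* reached via siglongjmp() */

	rc = sigaction(SIGSEGV, &old, NULL); /* put the previous handler back */
	assert(rc == 0);
	(void)rc;
	return recovered;
}
```

Calling run_with_fault_recovery() twice in a row is a quick way to check the mask handling: the second fault is only delivered to the handler if SIGSEGV was unblocked again after the first recovery.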