DPDK patches and discussions
* [dpdk-dev] 0/6] support oops handling
@ 2021-07-30  8:49 jerinj
  2021-07-30  8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj
                   ` (5 more replies)
  0 siblings, 6 replies; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

When a DPDK application crashes, it is handy to get detailed oops
information, as the Linux kernel provides, without losing any of the
features of the coredump infrastructure provided by the OS.

This patch series introduces APIs to handle oops in DPDK.

The following section describes the implementation and the API exposed to
applications.

On rte_eal_init() invocation, the EAL library installs the oops handler for
the essential signals. The rte_oops_signals_enabled() API returns the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode()
and then calls the signal handler that the application installed before
invoking rte_eal_init(). If the application did not install a handler, the
default OS handler runs instead, so the usual coredump (for gdb etc.) is
still produced.

In the second case, where the application installs its signal handler after
the rte_eal_init() invocation, the application's fault handler can call
rte_oops_decode() to decode the oops message.


Patch split:

Patch 1/6: define the API and add a stub implementation for Unix systems
Patch 2/6: implement the API
Patch 3/6: add an optional libunwind dependency to DPDK for a better backtrace in oops
Patch 4/6: x86-specific arch info, such as the x86 register dump on oops
Patch 5/6: arm64-specific arch info, such as the arm64 register dump on oops
Patch 6/6: unit tests for the new APIs
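
On the libunwind dependency in patch 3/6: without it, the backtrace falls
back to rte_dump_stack(). As a rough standalone analogue of that fallback
(an assumption for illustration, not the EAL code, and glibc-specific), the
sketch below uses backtrace() and backtrace_symbols_fd() from <execinfo.h>.
The name dump_current_stack and the depth of 64 are hypothetical.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64  /* arbitrary depth chosen for this sketch */

/* Print the current call stack to stderr and return the frame count. */
static int
dump_current_stack(void)
{
	void *frames[MAX_FRAMES];
	int n = backtrace(frames, MAX_FRAMES);

	/* Writes one line per frame straight to the file descriptor. */
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
	return n;
}
```

A walk of this kind starts at the frame that calls it, which inside a signal
handler is the handler itself. The libunwind variant is instead initialised
from the ucontext passed to the handler, so it starts at the faulting
context.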


Example commands to build and run on an x86-64 Linux machine, followed by the output log:

meson --buildtype debug build
ninja -C build

echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge  -c 0x2

Signal info:
------------
PID:           2439496
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000  R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48  R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610  R13: 0x0000000000000000
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x0000000000000001
RCX: 0x00007f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340  RSP: 0x00007fffc6958330
RSI: 0x0000000000000000  RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee  EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml  |   2 +-
 .travis.yml                  |   2 +-
 app/test/meson.build         |   2 +
 app/test/test_oops.c         | 121 ++++++++++++++
 config/meson.build           |   8 +
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 +
 lib/eal/freebsd/eal.c        |   6 +
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 ++++++++++++
 lib/eal/linux/eal.c          |   6 +
 lib/eal/unix/eal_oops.c      | 297 +++++++++++++++++++++++++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 +
 14 files changed, 553 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

-- 
2.32.0



* [dpdk-dev] 1/6] eal: introduce oops handling API
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
@ 2021-07-30  8:49 ` jerinj
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
  2021-07-30  8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev, Bruce Richardson, Ray Kinsella
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification
and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API returns the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using
rte_oops_decode() and then calls the signal handler that the
application installed before invoking rte_eal_init().
If the application did not install a handler, the default OS
handler runs instead, so the usual coredump (for gdb etc.)
is still produced.

In the second case, where the application installs its signal
handler after the rte_eal_init() invocation, the application's
fault handler can call rte_oops_decode() to decode the oops
message.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 ++
 lib/eal/freebsd/eal.c        |   6 +++
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 +++++++++++++++++++++++++++++++++++
 lib/eal/linux/eal.c          |   6 +++
 lib/eal/unix/eal_oops.c      |  36 +++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 ++
 9 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..0d0da35205 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -215,7 +215,8 @@ The public API headers are grouped by topics:
   [log]                (@ref rte_log.h),
   [errno]              (@ref rte_errno.h),
   [trace]              (@ref rte_trace.h),
-  [trace_point]        (@ref rte_trace_point.h)
+  [trace_point]        (@ref rte_trace_point.h),
+  [oops]               (@ref rte_oops.h)
 
 - **misc**:
   [EAL config]         (@ref rte_eal.h),
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..c3a490d803 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
  */
 void __rte_thread_uninit(void);
 
+int eal_oops_init(void);
+void eal_oops_fini(void);
+
 /**
  * asprintf(3) replacement for Windows.
  */
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..3c098708c6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -692,6 +692,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	thread_id = pthread_self();
 
 	eal_reset_internal_config(internal_conf);
@@ -974,6 +979,7 @@ rte_eal_cleanup(void)
 	rte_trace_save();
 	eal_trace_fini();
 	eal_cleanup_config(internal_conf);
+	eal_oops_fini();
 	return 0;
 }
 
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 88a9eba12f..6c74bdb7b5 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -30,6 +30,7 @@ headers += files(
         'rte_malloc.h',
         'rte_memory.h',
         'rte_memzone.h',
+        'rte_oops.h',
         'rte_pci_dev_feature_defs.h',
         'rte_pci_dev_features.h',
         'rte_per_lcore.h',
diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h
new file mode 100644
index 0000000000..ff82c409ec
--- /dev/null
+++ b/lib/eal/include/rte_oops.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _RTE_OOPS_H_
+#define _RTE_OOPS_H_
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <rte_config.h>
+
+/**
+ * @file
+ *
+ * RTE oops API
+ *
+ * This file provides the oops handling APIs to RTE applications.
+ *
+ * On rte_eal_init() invocation, the EAL library installs the oops handler for
+ * the essential signals. The rte_oops_signals_enabled() API returns the list
+ * of signals for which the EAL installed the handler.
+ *
+ * The default EAL oops handler decodes the oops message using rte_oops_decode()
+ * and then calls the signal handler that the application installed before
+ * invoking rte_eal_init(). If the application did not install a handler,
+ * the default OS handler runs instead, so the usual coredump (for gdb etc.)
+ * is still produced.
+ *
+ * In the second case, where the application installs its signal handler after
+ * the rte_eal_init() invocation, the application's fault handler can call
+ * rte_oops_decode() to decode the oops message.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Maximum number of oops signals enabled in EAL.
+ * @see rte_oops_signals_enabled()
+ */
+#define RTE_OOPS_SIGNALS_MAX 32
+
+/**
+ * Get the list of enabled oops signals installed by EAL.
+ *
+ * @param [out] signals
+ *   Array in which to store the enabled signals.
+ *   NULL is allowed. If not NULL, the array must hold
+ *   at least RTE_OOPS_SIGNALS_MAX entries.
+ *
+ * @return
+ *   Number of enabled oops signals.
+ */
+__rte_experimental
+int rte_oops_signals_enabled(int *signals);
+
+#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD)
+#include <signal.h>
+#include <ucontext.h>
+
+/**
+ * Decode an oops
+ *
+ * This prototype is the same as sa_sigaction defined in signal.h.
+ * The application must register its signal handler using sigaction() with
+ * SA_SIGINFO set in sa_flags to get this information from a Unix OS.
+ *
+ * @param sig
+ *   Signal number
+ * @param info
+ *   Signal info provided by sa_sigaction. Value NULL is allowed.
+ * @param uc
+ *   ucontext_t provided when the handler is installed with the SA_SIGINFO flag.
+ *   Value NULL is allowed.
+ *
+ */
+__rte_experimental
+void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc);
+#else
+
+/**
+ * Decode an oops
+ *
+ * @param sig
+ *   Signal number
+ */
+__rte_experimental
+void rte_oops_decode(int sig);
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_OOPS_H_ */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3577eaeaa4..3438a96b75 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -991,6 +991,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	p = strrchr(argv[0], '/');
 	strlcpy(logid, p ? p + 1 : argv[0], sizeof(logid));
 	thread_id = pthread_self();
@@ -1371,6 +1376,7 @@ rte_eal_cleanup(void)
 	rte_trace_save();
 	eal_trace_fini();
 	eal_cleanup_config(internal_conf);
+	eal_oops_fini();
 	return 0;
 }
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
new file mode 100644
index 0000000000..53b580f733
--- /dev/null
+++ b/lib/eal/unix/eal_oops.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+
+#include <rte_oops.h>
+
+#include "eal_private.h"
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	RTE_SET_USED(sig);
+	RTE_SET_USED(info);
+	RTE_SET_USED(uc);
+
+}
+
+int
+rte_oops_signals_enabled(int *signals)
+{
+	RTE_SET_USED(signals);
+
+	return 0;
+}
+
+int
+eal_oops_init(void)
+{
+	return 0;
+}
+
+void
+eal_oops_fini(void)
+{
+}
diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build
index e3ecd3e956..cdd3320669 100644
--- a/lib/eal/unix/meson.build
+++ b/lib/eal/unix/meson.build
@@ -6,5 +6,6 @@ sources += files(
         'eal_unix_memory.c',
         'eal_unix_timer.c',
         'eal_firmware.c',
+        'eal_oops.c',
         'rte_thread.c',
 )
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 887012d02a..f2841d09fd 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -426,6 +426,10 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
+
+	# added in 21.11
+	rte_oops_signals_enabled; # WINDOWS_NO_EXPORT
+	rte_oops_decode; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
-- 
2.32.0



* [dpdk-dev] 2/6] eal: oops handling API implementation
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
  2021-07-30  8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj
@ 2021-07-30  8:49 ` jerinj
  2021-08-02 22:46   ` David Christensen
  2021-07-30  8:49 ` [dpdk-dev] 3/6] eal: support libunwind based backtrace jerinj
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Implement the base oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 175 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 168 insertions(+), 7 deletions(-)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 53b580f733..1120c8ad8c 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -2,35 +2,196 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include <inttypes.h>
+#include <signal.h>
+#include <ucontext.h>
+#include <unistd.h>
 
+#include <rte_byteorder.h>
+#include <rte_log.h>
 #include <rte_oops.h>
 
 #include "eal_private.h"
 
-void
-rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__)
+
+static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS};
+
+struct oops_signal {
+	int sig;
+	bool enabled;
+	struct sigaction sa;
+};
+
+static struct oops_signal signals_db[RTE_DIM(oops_signals)];
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	RTE_SET_USED(context);
+
+	rte_dump_stack();
+}
+static void
+siginfo_dump(int sig, siginfo_t *info)
+{
+	oops_print("PID:           %" PRIdMAX "\n", (intmax_t)getpid());
+
+	if (info == NULL)
+		return;
+	if (sig != info->si_signo)
+		oops_print("Invalid signal info\n");
+
+	oops_print("Signal number: %d\n", info->si_signo);
+	oops_print("Fault address: %p\n", info->si_addr);
+}
+
+static void
+mem32_dump(void *ptr)
+{
+	uint32_t *p = ptr;
+	int i;
+
+	for (i = 0; i < 16; i++)
+		oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
+}
+
+static void
+stack_dump_header(void)
+{
+	oops_print("Stack dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+code_dump_header(void)
+{
+	oops_print("Code dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+stack_code_dump(void *stack, void *code)
+{
+	if (stack == NULL || code == NULL)
+		return;
+
+	oops_print("\n");
+	stack_dump_header();
+	mem32_dump(stack);
+	oops_print("\n");
+
+	code_dump_header();
+	mem32_dump(code);
+	oops_print("\n");
+}
+static void
+archinfo_dump(ucontext_t *uc)
 {
-	RTE_SET_USED(sig);
-	RTE_SET_USED(info);
 	RTE_SET_USED(uc);
 
+	stack_code_dump(NULL, NULL);
+}
+
+static void
+default_signal_handler_invoke(int sig)
+{
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		/* Skip disabled signals */
+		if (signals_db[idx].sig != sig)
+			continue;
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(sig, &signals_db[idx].sa, NULL);
+		kill(getpid(), sig);
+	}
+}
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	oops_print("Signal info:\n");
+	oops_print("------------\n");
+	siginfo_dump(sig, info);
+	oops_print("\n");
+
+	oops_print("Backtrace:\n");
+	oops_print("----------\n");
+	back_trace_dump(uc);
+	oops_print("\n");
+
+	oops_print("Arch info:\n");
+	oops_print("----------\n");
+	if (uc)
+		archinfo_dump(uc);
+}
+
+static void
+eal_oops_handler(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uc = ctx;
+
+	rte_oops_decode(sig, info, uc);
+	default_signal_handler_invoke(sig);
 }
 
 int
 rte_oops_signals_enabled(int *signals)
 {
-	RTE_SET_USED(signals);
+	int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
+	unsigned int idx = 0;
 
-	return 0;
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (signals_db[idx].enabled) {
+			sig[count] = signals_db[idx].sig;
+			count++;
+		}
+	}
+	if (signals)
+		memcpy(signals, sig, sizeof(*signals) * count);
+
+	return count;
 }
 
 int
 eal_oops_init(void)
 {
-	return 0;
+	unsigned int idx, rc = 0;
+	struct sigaction sa;
+
+	RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &eal_oops_handler;
+	sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		signals_db[idx].sig = oops_signals[idx];
+		/* Get existing sigaction */
+		rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa);
+		if (rc)
+			continue;
+		/* Replace with oops handler */
+		rc = sigaction(signals_db[idx].sig, &sa, NULL);
+		if (rc)
+			continue;
+		signals_db[idx].enabled = true;
+	}
+	return rc;
 }
 
 void
 eal_oops_fini(void)
 {
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL);
+	}
 }
-- 
2.32.0



* [dpdk-dev] 3/6] eal: support libunwind based backtrace
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
  2021-07-30  8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj
  2021-07-30  8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj
@ 2021-07-30  8:49 ` jerinj
  2021-07-30  8:49 ` [dpdk-dev] 4/6] eal/x86: support register dump for oops jerinj
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev, Aaron Conole, Michael Santana, Bruce Richardson
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Add an optional libunwind library dependency to DPDK for
an enhanced backtrace based on ucontext.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 .github/workflows/build.yml |  2 +-
 .travis.yml                 |  2 +-
 config/meson.build          |  8 +++++++
 lib/eal/unix/eal_oops.c     | 47 +++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 7dac20ddeb..caaca207a6 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -93,7 +93,7 @@ jobs:
       run: sudo apt install -y ccache libnuma-dev python3-setuptools
         python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
         libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev
-        libarchive-dev
+        libarchive-dev libunwind-dev
     - name: Install libabigail build dependencies if no cache is available
       if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true'
       run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev
diff --git a/.travis.yml b/.travis.yml
index 23067d9e3c..e72b156014 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ addons:
     packages: &required_packages
       - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip, python3-pyelftools, ninja-build]
       - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev]
-      - [libarchive-dev]
+      - [libarchive-dev, libunwind-dev]
 
 _aarch64_packages: &aarch64_packages
   - *required_packages
diff --git a/config/meson.build b/config/meson.build
index e80421003b..26a85dab6b 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -236,6 +236,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
     dpdk_extra_ldflags += '-latomic'
 endif
 
+# check for libunwind
+unwind_dep = dependency('libunwind', required: false, method: 'pkg-config')
+if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep)
+    dpdk_conf.set('RTE_USE_LIBUNWIND', 1)
+    add_project_link_arguments('-lunwind', language: 'c')
+    dpdk_extra_ldflags += '-lunwind'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 1120c8ad8c..118b236f35 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -25,6 +25,50 @@ struct oops_signal {
 
 static struct oops_signal signals_db[RTE_DIM(oops_signals)];
 
+#if defined(RTE_USE_LIBUNWIND)
+
+#define BACKTRACE_DEPTH 256
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	unw_cursor_t cursor;
+	unw_word_t ip, off;
+	int rc, level = 0;
+	char name[256];
+
+	if (context == NULL) {
+		rte_dump_stack();
+		return;
+	}
+
+	rc = unw_init_local(&cursor, (unw_context_t *)context);
+	if (rc < 0)
+		goto fail;
+
+	for (;;) {
+		rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+		if (rc < 0)
+			goto fail;
+		rc = unw_get_proc_name(&cursor, name, sizeof(name), &off);
+		if (rc == 0)
+			oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip,
+				   name, (uint64_t)off);
+		else
+			oops_print("[%16p]: <unknown>\n", (void *)ip);
+		rc = unw_step(&cursor);
+		if (rc <= 0 || ++level >= BACKTRACE_DEPTH)
+			break;
+	}
+	return;
+fail:
+	oops_print("libunwind call failed %s\n", unw_strerror(rc));
+}
+
+#else
+
 static void
 back_trace_dump(ucontext_t *context)
 {
@@ -32,6 +76,9 @@ back_trace_dump(ucontext_t *context)
 
 	rte_dump_stack();
 }
+
+#endif
+
 static void
 siginfo_dump(int sig, siginfo_t *info)
 {
-- 
2.32.0



* [dpdk-dev] 4/6] eal/x86: support register dump for oops
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
                   ` (2 preceding siblings ...)
  2021-07-30  8:49 ` [dpdk-dev] 3/6] eal: support libunwind based backtrace jerinj
@ 2021-07-30  8:49 ` jerinj
  2021-07-30  8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj
  2021-07-30  8:49 ` [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs jerinj
  5 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the x86 architectural register state in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 118b236f35..da71481ade 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -132,6 +132,38 @@ stack_code_dump(void *stack, void *code)
 	mem32_dump(code);
 	oops_print("\n");
 }
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX)
+static void
+archinfo_dump(ucontext_t *uc)
+{
+
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	oops_print("R8 : 0x%.16llx  ", mc->gregs[REG_R8]);
+	oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]);
+	oops_print("R10: 0x%.16llx  ", mc->gregs[REG_R10]);
+	oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]);
+	oops_print("R12: 0x%.16llx  ", mc->gregs[REG_R12]);
+	oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]);
+	oops_print("R14: 0x%.16llx  ", mc->gregs[REG_R14]);
+	oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]);
+	oops_print("RAX: 0x%.16llx  ", mc->gregs[REG_RAX]);
+	oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]);
+	oops_print("RCX: 0x%.16llx  ", mc->gregs[REG_RCX]);
+	oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]);
+	oops_print("RBP: 0x%.16llx  ", mc->gregs[REG_RBP]);
+	oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]);
+	oops_print("RSI: 0x%.16llx  ", mc->gregs[REG_RSI]);
+	oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]);
+	oops_print("RIP: 0x%.16llx  ", mc->gregs[REG_RIP]);
+	oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]);
+
+	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
+}
+
+#else
+
 static void
 archinfo_dump(ucontext_t *uc)
 {
@@ -140,6 +172,8 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump(NULL, NULL);
 }
 
+#endif
+
 static void
 default_signal_handler_invoke(int sig)
 {
-- 
2.32.0



* [dpdk-dev] 5/6] eal/arm64: support register dump for oops
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
                   ` (3 preceding siblings ...)
  2021-07-30  8:49 ` [dpdk-dev] 4/6] eal/x86: support register dump for oops jerinj
@ 2021-07-30  8:49 ` jerinj
  2021-08-02 22:49   ` David Christensen
  2021-07-30  8:49 ` [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs jerinj
  5 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the arm64 architectural register state in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index da71481ade..7469610d96 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
 }
 
+#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
+
+static void
+archinfo_dump(ucontext_t *uc)
+{
+	mcontext_t *mc = &uc->uc_mcontext;
+	int i;
+
+	oops_print("PC : 0x%.16llx  ", mc->pc);
+	oops_print("SP : 0x%.16llx\n", mc->sp);
+	for (i = 0; i < 31; i++)
+		oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
+			   i & 0x1 ? "\n" : " ");
+
+	oops_print("PSTATE: 0x%.16llx\n", mc->pstate);
+
+	stack_code_dump((void *)mc->sp, (void *)mc->pc);
+}
+
 #else
 
 static void
-- 
2.32.0



* [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs
  2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
                   ` (4 preceding siblings ...)
  2021-07-30  8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj
@ 2021-07-30  8:49 ` jerinj
  5 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-07-30  8:49 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Add unit test cases for all the oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 app/test/meson.build |   2 +
 app/test/test_oops.c | 121 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+)
 create mode 100644 app/test/test_oops.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..1e471ab351 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
         'test_metrics.c',
         'test_mcslock.c',
         'test_mp_secondary.c',
+        'test_oops.c',
         'test_per_lcore.c',
         'test_pflock.c',
         'test_pmd_perf.c',
@@ -236,6 +237,7 @@ fast_tests = [
         ['memzone_autotest', false],
         ['meter_autotest', true],
         ['multiprocess_autotest', false],
+        ['oops_autotest', true],
         ['per_lcore_autotest', true],
         ['pflock_autotest', true],
         ['prefetch_autotest', true],
diff --git a/app/test/test_oops.c b/app/test/test_oops.c
new file mode 100644
index 0000000000..60a7f259c7
--- /dev/null
+++ b/app/test/test_oops.c
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell
+ */
+
+#include <setjmp.h>
+#include <signal.h>
+
+#include <rte_config.h>
+#include <rte_oops.h>
+
+#include "test.h"
+
+static jmp_buf pc;
+static bool detected_segfault;
+
+static void
+segv_handler(int sig, siginfo_t *info, void *ctx)
+{
+	detected_segfault = true;
+	rte_oops_decode(sig, info, (ucontext_t *)ctx);
+	longjmp(pc, 1);
+}
+
+/* OS-specific way to install the segfault signal handler */
+static int
+segv_handler_install(void)
+{
+	struct sigaction sa;
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &segv_handler;
+	sa.sa_flags = SA_SIGINFO;
+
+	return sigaction(SIGSEGV, &sa, NULL);
+}
+
+static int
+test_oops_generate(void)
+{
+	int rc;
+
+	rc = segv_handler_install();
+	TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	detected_segfault = false;
+	rc = setjmp(pc); /* Save the execution state */
+	if (rc == 0) {
+		/* Generate a segfault */
+		*(volatile int *)0x05 = 0;
+	} else { /* longjmp from segv_handler */
+		if (detected_segfault)
+			return TEST_SUCCESS;
+
+	}
+	return TEST_FAILED;
+}
+
+static int
+test_signal_handler_installed(int count, int *signals)
+{
+	int i, rc, verified = 0;
+	struct sigaction sa;
+
+	for (i = 0; i < count; i++) {
+		rc = sigaction(signals[i], NULL, &sa);
+		if (rc) {
+			printf("Failed to get sigaction for %d\n", signals[i]);
+			continue;
+		}
+		if (sa.sa_handler != SIG_DFL)
+			verified++;
+	}
+	TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count,
+			  verified);
+	return TEST_SUCCESS;
+}
+
+static int
+test_oops_signals_enabled(void)
+{
+	int *signals = NULL;
+	int i, rc;
+
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	signals = malloc(sizeof(int) * rc);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+	free(signals);
+
+	signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	for (i = 0; i < rc; i++)
+		TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i,
+				      signals[i]);
+
+	rc = test_signal_handler_installed(rc, signals);
+	free(signals);
+
+	return rc;
+}
+
+static struct unit_test_suite oops_tests = {
+	.suite_name = "oops autotest",
+	.setup = NULL,
+	.teardown = NULL,
+	.unit_test_cases = {
+			    TEST_CASE(test_oops_signals_enabled),
+			    TEST_CASE(test_oops_generate),
+			    TEST_CASES_END()}};
+
+static int
+test_oops(void)
+{
+	return unit_test_suite_runner(&oops_tests);
+}
+
+REGISTER_TEST_COMMAND(oops_autotest, test_oops);
-- 
2.32.0



* Re: [dpdk-dev] 2/6] eal: oops handling API implementation
  2021-07-30  8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj
@ 2021-08-02 22:46   ` David Christensen
  0 siblings, 0 replies; 45+ messages in thread
From: David Christensen @ 2021-08-02 22:46 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin



On 7/30/21 1:49 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Implement the base oops handling APIs.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>

Building on POWER generates the following warnings:

ninja: Entering directory `build'
[1/244] Compiling C object 'lib/76b5a35@@rte_eal@sta/eal_unix_eal_oops.c.o'.
../lib/eal/unix/eal_oops.c: In function ‘back_trace_dump’:
../lib/eal/unix/eal_oops.c:33:2: warning: implicit declaration of 
function ‘rte_dump_stack’; did you mean ‘rte_bus_scan’? 
[-Wimplicit-function-declaration]
   rte_dump_stack();
   ^~~~~~~~~~~~~~
   rte_bus_scan
../lib/eal/unix/eal_oops.c:33:2: warning: nested extern declaration of 
‘rte_dump_stack’ [-Wnested-externs]
[19/19] Linking target app/test/dpdk-test.

You can fix the issue by including <rte_debug.h> in eal_oops.c.  There must
be a hidden include dependency in the x86/ARM code.

Dave


* Re: [dpdk-dev] 5/6] eal/arm64: support register dump for oops
  2021-07-30  8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj
@ 2021-08-02 22:49   ` David Christensen
  2021-08-16 16:24     ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: David Christensen @ 2021-08-02 22:49 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin



On 7/30/21 1:49 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Dump the arm64 arch state register in oops
> handling routine.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>   lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
> index da71481ade..7469610d96 100644
> --- a/lib/eal/unix/eal_oops.c
> +++ b/lib/eal/unix/eal_oops.c
> @@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc)
>   	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
>   }
> 
> +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
> +
> +static void
> +archinfo_dump(ucontext_t *uc)
> +{
> +	mcontext_t *mc = &uc->uc_mcontext;
> +	int i;
> +
> +	oops_print("PC : 0x%.16llx", mc->pc);
> +	oops_print("SP : 0x%.16llx\n", mc->sp);
> +	for (i = 0; i < 31; i++)
                      ~~~
Maybe <= instead of < ??  31 is a strange number of registers and the 
line feed doesn't seem to line things up for PSTATEn below.

> +		oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
> +			   i & 0x1 ? "\n" : " ");

Dave


* Re: [dpdk-dev] 5/6] eal/arm64: support register dump for oops
  2021-08-02 22:49   ` David Christensen
@ 2021-08-16 16:24     ` Jerin Jacob
  0 siblings, 0 replies; 45+ messages in thread
From: Jerin Jacob @ 2021-08-16 16:24 UTC (permalink / raw)
  To: David Christensen
  Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, David Marchand,
	Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin

On Tue, Aug 3, 2021 at 4:20 AM David Christensen <drc@linux.vnet.ibm.com> wrote:
>
>
>
> On 7/30/21 1:49 AM, jerinj@marvell.com wrote:
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Dump the arm64 arch state register in oops
> > handling routine.
> >
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > ---
> >   lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
> >   1 file changed, 19 insertions(+)
> >
> > diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
> > index da71481ade..7469610d96 100644
> > --- a/lib/eal/unix/eal_oops.c
> > +++ b/lib/eal/unix/eal_oops.c
> > @@ -162,6 +162,25 @@ archinfo_dump(ucontext_t *uc)
> >       stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
> >   }
> >
> > +#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
> > +
> > +static void
> > +archinfo_dump(ucontext_t *uc)
> > +{
> > +     mcontext_t *mc = &uc->uc_mcontext;
> > +     int i;
> > +
> > +     oops_print("PC : 0x%.16llx", mc->pc);
> > +     oops_print("SP : 0x%.16llx\n", mc->sp);
> > +     for (i = 0; i < 31; i++)
>                       ~~~
> Maybe <= instead of < ??  31 is a strange number of registers and the
> line feed doesn't seem to line things up for PSTATEn below.

Based on the definition at https://elixir.bootlin.com/linux/v4.5/source/arch/arm64/include/uapi/asm/sigcontext.h
the range is 0 to 30; r31 is SP, which is already available as struct sigcontext::sp.
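
As a rough sketch of the loop bound and the pairing (this is not the patch
code; fmt_regs() and the buffer handling are made up for illustration, and
snprintf() stands in for oops_print()): X0..X30 is 31 registers, so with two
registers per line the last entry, X30, is left without a line feed unless
one is added explicitly.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_XREGS 31 /* X0..X30; SP is reported separately */

/*
 * Hypothetical helper: format the registers two per line into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int
fmt_regs(char *buf, size_t len, const uint64_t *regs)
{
	size_t off = 0;
	int i, n;

	for (i = 0; i < NUM_XREGS; i++) {
		n = snprintf(buf + off, len - off, "X%.2d: 0x%.16llx%s", i,
			     (unsigned long long)regs[i],
			     (i & 0x1) ? "\n" : " ");
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	/* 31 is odd, so the X30 entry ends with a space, not a line feed. */
	if (NUM_XREGS & 0x1)
		buf[off - 1] = '\n';
	return (int)off;
}
```

Replacing the trailing space keeps the next line of output (PSTATE in the
patch) starting in the first column.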


>
> > +             oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
> > +                        i & 0x1 ? "\n" : " ");
>
> Dave


* [dpdk-dev] [PATCH v2 0/6] support oops handling
  2021-07-30  8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj
@ 2021-08-17  3:27   ` jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj
                       ` (6 more replies)
  0 siblings, 7 replies; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

v2:
- Fix powerpc build (David Christensen)

It is handy to get detailed OOPS information, as the Linux kernel provides,
when a DPDK application crashes, without losing any of the features
provided by the coredump infrastructure of the OS.

This patch series introduces the APIs to handle OOPS in DPDK.

The following section details the implementation and the API interface
exposed to the application.

On rte_eal_init() invocation, the EAL library installs the oops handler for
the essential signals. The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode()
and then calls the signal handler that the application installed
before invoking rte_eal_init(). This scheme also enables the use of
the default coredump handler (for gdb etc.) provided by the OS
if the application does not install any specific signal handler.

In the second case, where the application installs the signal handler after
the rte_eal_init() invocation, rte_oops_decode() provides the means of
decoding the oops message in the application's fault handler.
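
The chaining described above can be sketched with plain POSIX calls. This is
only an illustration under stated assumptions, not the EAL code: oops_init(),
oops_handler(), fake_oops_decode() and app_handler_install() are hypothetical
names, fake_oops_decode() merely stands in for rte_oops_decode(), and a single
saved sigaction is kept instead of a per-signal table.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static struct sigaction saved_sa;        /* disposition found at init time */
static volatile sig_atomic_t decoded;    /* set by the stand-in decoder */
static volatile sig_atomic_t app_called; /* set by the application handler */

/* Application handler, installed before the (hypothetical) init call. */
static void
app_handler(int sig)
{
	(void)sig;
	app_called = 1;
}

/* Stand-in for rte_oops_decode(); the real one prints the oops report. */
static void
fake_oops_decode(int sig, siginfo_t *info, void *uc)
{
	(void)sig;
	(void)info;
	(void)uc;
	decoded = 1;
}

static void
oops_handler(int sig, siginfo_t *info, void *uc)
{
	fake_oops_decode(sig, info, uc);
	/*
	 * Put the earlier disposition back and re-send the signal. It stays
	 * pending while this handler runs and is delivered to the restored
	 * handler (or the OS default action) afterwards.
	 */
	sigaction(sig, &saved_sa, NULL);
	kill(getpid(), sig);
}

/* Hypothetical equivalent of what the EAL init does for one signal. */
static int
oops_init(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = oops_handler;
	sa.sa_flags = SA_SIGINFO;

	/* Remember whatever was installed before, then replace it. */
	if (sigaction(sig, NULL, &saved_sa) != 0)
		return -1;
	return sigaction(sig, &sa, NULL);
}

/* The application installs its handler before calling oops_init(). */
static int
app_handler_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = app_handler;
	return sigaction(sig, &sa, NULL);
}
```

If the application never installs a handler, saved_sa holds SIG_DFL, so the
re-sent signal takes the default action and the coredump is still produced.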


Patch split:

Patch 1/6: defines the API and stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops.
Patch 4/6: x86 specific archinfo like x86 register dump on oops
Patch 5/6: arm64 specific archinfo like arm64 register dump on oops
Patch 6/6: UT for the new APIs


Example commands to build and run, with the output log from an x86-64 Linux machine:
  

meson --buildtype debug build
ninja -C build

echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge  -c 0x2

Signal info:
------------
PID:           2439496
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000  R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48  R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610  R13: 0x0000000000000000
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x0000000000000001
RCX: 0x00007f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340  RSP: 0x00007fffc6958330
RSI: 0x0000000000000000  RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee  EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff
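
A note on reading the stack and code dumps above: in patch 2/6, mem32_dump()
passes each 32-bit word through rte_be_to_cpu_32() before printing it with
"0x%x", so on a little-endian machine the digits of each word appear in memory
byte order, with leading zeros dropped. Below is a minimal sketch of that
conversion in plain C; load_be32() and dump_words() are stand-in names for
illustration, not DPDK functions.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in for rte_be_to_cpu_32() applied to a word read from memory:
 * assemble four bytes so that the lowest address is most significant.
 */
static uint32_t
load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Print n 32-bit words starting at p in the same "%p: 0x%x" layout. */
static void
dump_words(const uint8_t *p, int n)
{
	int i;

	for (i = 0; i < n; i++)
		printf("%p: 0x%x\n", (const void *)(p + 4 * i),
		       (unsigned int)load_be32(p + 4 * i));
}
```

Read this way, the first two code dump words above, 0xc7000000 and 0xeb12,
correspond to the bytes c7 00 00 00 00 00 eb 12 at the faulting address.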

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml  |   2 +-
 .travis.yml                  |   2 +-
 app/test/meson.build         |   2 +
 app/test/test_oops.c         | 121 ++++++++++++++
 config/meson.build           |   8 +
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 +
 lib/eal/freebsd/eal.c        |   6 +
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 ++++++++++++
 lib/eal/linux/eal.c          |   6 +
 lib/eal/unix/eal_oops.c      | 298 +++++++++++++++++++++++++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 +
 14 files changed, 554 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

-- 
2.32.0



* [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-08-17  3:53       ` Stephen Hemminger
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev, Bruce Richardson, Ray Kinsella
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification
and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using
rte_oops_decode() and then calls the signal handler that the
application installed before invoking rte_eal_init().
This scheme also enables the use of the default coredump
handler (for gdb etc.) provided by the OS if the application does
not install any specific signal handler.

In the second case, where the application installs the signal
handler after the rte_eal_init() invocation, rte_oops_decode()
provides the means of decoding the oops message in
the application's fault handler.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/api/doxy-api-index.md    |   3 +-
 lib/eal/common/eal_private.h |   3 ++
 lib/eal/freebsd/eal.c        |   6 +++
 lib/eal/include/meson.build  |   1 +
 lib/eal/include/rte_oops.h   | 100 +++++++++++++++++++++++++++++++++++
 lib/eal/linux/eal.c          |   6 +++
 lib/eal/unix/eal_oops.c      |  36 +++++++++++++
 lib/eal/unix/meson.build     |   1 +
 lib/eal/version.map          |   4 ++
 9 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..0d0da35205 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -215,7 +215,8 @@ The public API headers are grouped by topics:
   [log]                (@ref rte_log.h),
   [errno]              (@ref rte_errno.h),
   [trace]              (@ref rte_trace.h),
-  [trace_point]        (@ref rte_trace_point.h)
+  [trace_point]        (@ref rte_trace_point.h),
+  [oops]               (@ref rte_oops.h)
 
 - **misc**:
   [EAL config]         (@ref rte_eal.h),
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..c3a490d803 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
  */
 void __rte_thread_uninit(void);
 
+int eal_oops_init(void);
+void eal_oops_fini(void);
+
 /**
  * asprintf(3) replacement for Windows.
  */
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..3c098708c6 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -692,6 +692,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	thread_id = pthread_self();
 
 	eal_reset_internal_config(internal_conf);
@@ -974,6 +979,7 @@ rte_eal_cleanup(void)
 	rte_trace_save();
 	eal_trace_fini();
 	eal_cleanup_config(internal_conf);
+	eal_oops_fini();
 	return 0;
 }
 
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 88a9eba12f..6c74bdb7b5 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -30,6 +30,7 @@ headers += files(
         'rte_malloc.h',
         'rte_memory.h',
         'rte_memzone.h',
+        'rte_oops.h',
         'rte_pci_dev_feature_defs.h',
         'rte_pci_dev_features.h',
         'rte_per_lcore.h',
diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h
new file mode 100644
index 0000000000..ff82c409ec
--- /dev/null
+++ b/lib/eal/include/rte_oops.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _RTE_OOPS_H_
+#define _RTE_OOPS_H_
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <rte_config.h>
+
+/**
+ * @file
+ *
+ * RTE oops API
+ *
+ * This file provides the oops handling APIs to RTE applications.
+ *
+ * On rte_eal_init() invocation, the EAL library installs the oops handler for
+ * the essential signals. The rte_oops_signals_enabled() API provides the list
+ * of signals for which the EAL installed the handler.
+ *
+ * The default EAL oops handler decodes the oops message using rte_oops_decode()
+ * and then calls the signal handler that the application installed before
+ * invoking rte_eal_init(). This scheme also enables the use of the default
+ * coredump handler (for gdb etc.) provided by the OS if the application
+ * does not install any specific signal handler.
+ *
+ * In the second case, where the application installs the signal handler after
+ * the rte_eal_init() invocation, rte_oops_decode() provides the means of
+ * decoding the oops message in the application's fault handler.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Maximum number of oops signals enabled in EAL.
+ * @see rte_oops_signals_enabled()
+ */
+#define RTE_OOPS_SIGNALS_MAX 32
+
+/**
+ * Get the list of enabled oops signals installed by EAL.
+ *
+ * @param [out] signals
+ *   A pointer to store the enabled signals.
+ *   Value NULL is allowed. If not NULL, then the size of this array must be
+ *   at least RTE_OOPS_SIGNALS_MAX.
+ *
+ * @return
+ *   Number of enabled oops signals.
+ */
+__rte_experimental
+int rte_oops_signals_enabled(int *signals);
+
+#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD)
+#include <signal.h>
+#include <ucontext.h>
+
+/**
+ * Decode an oops
+ *
+ * This prototype is the same as sa_sigaction defined in signal.h.
+ * The application must register the signal handler using sigaction() with
+ * SA_SIGINFO set in sa_flags to get this information from a Unix OS.
+ *
+ * @param sig
+ *   Signal number
+ * @param info
+ *   Signal info provided by sa_sigaction. Value NULL is allowed.
+ * @param uc
+ *   ucontext_t provided when the handler is installed with SA_SIGINFO flag.
+ *   Value NULL is allowed.
+ *
+ */
+__rte_experimental
+void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc);
+#else
+
+/**
+ * Decode an oops
+ *
+ * @param sig
+ *   Signal number
+ */
+__rte_experimental
+void rte_oops_decode(int sig);
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_OOPS_H_ */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3577eaeaa4..3438a96b75 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -991,6 +991,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	p = strrchr(argv[0], '/');
 	strlcpy(logid, p ? p + 1 : argv[0], sizeof(logid));
 	thread_id = pthread_self();
@@ -1371,6 +1376,7 @@ rte_eal_cleanup(void)
 	rte_trace_save();
 	eal_trace_fini();
 	eal_cleanup_config(internal_conf);
+	eal_oops_fini();
 	return 0;
 }
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
new file mode 100644
index 0000000000..53b580f733
--- /dev/null
+++ b/lib/eal/unix/eal_oops.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+
+#include <rte_oops.h>
+
+#include "eal_private.h"
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	RTE_SET_USED(sig);
+	RTE_SET_USED(info);
+	RTE_SET_USED(uc);
+
+}
+
+int
+rte_oops_signals_enabled(int *signals)
+{
+	RTE_SET_USED(signals);
+
+	return 0;
+}
+
+int
+eal_oops_init(void)
+{
+	return 0;
+}
+
+void
+eal_oops_fini(void)
+{
+}
diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build
index e3ecd3e956..cdd3320669 100644
--- a/lib/eal/unix/meson.build
+++ b/lib/eal/unix/meson.build
@@ -6,5 +6,6 @@ sources += files(
         'eal_unix_memory.c',
         'eal_unix_timer.c',
         'eal_firmware.c',
+        'eal_oops.c',
         'rte_thread.c',
 )
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 887012d02a..f2841d09fd 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -426,6 +426,10 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
+
+	# added in 21.11
+	rte_oops_signals_enabled; # WINDOWS_NO_EXPORT
+	rte_oops_decode; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
-- 
2.32.0



* [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-08-17  3:52       ` Stephen Hemminger
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace jerinj
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Implement the base oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 176 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 169 insertions(+), 7 deletions(-)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 53b580f733..7b12cfd5f5 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -2,35 +2,197 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include <inttypes.h>
+#include <signal.h>
+#include <ucontext.h>
+#include <unistd.h>
 
+#include <rte_byteorder.h>
+#include <rte_debug.h>
+#include <rte_log.h>
 #include <rte_oops.h>
 
 #include "eal_private.h"
 
-void
-rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__)
+
+static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS};
+
+struct oops_signal {
+	int sig;
+	bool enabled;
+	struct sigaction sa;
+};
+
+static struct oops_signal signals_db[RTE_DIM(oops_signals)];
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	RTE_SET_USED(context);
+
+	rte_dump_stack();
+}
+static void
+siginfo_dump(int sig, siginfo_t *info)
+{
+	oops_print("PID:           %" PRIdMAX "\n", (intmax_t)getpid());
+
+	if (info == NULL)
+		return;
+	if (sig != info->si_signo)
+		oops_print("Invalid signal info\n");
+
+	oops_print("Signal number: %d\n", info->si_signo);
+	oops_print("Fault address: %p\n", info->si_addr);
+}
+
+static void
+mem32_dump(void *ptr)
+{
+	uint32_t *p = ptr;
+	int i;
+
+	for (i = 0; i < 16; i++)
+		oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
+}
+
+static void
+stack_dump_header(void)
+{
+	oops_print("Stack dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+code_dump_header(void)
+{
+	oops_print("Code dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+stack_code_dump(void *stack, void *code)
+{
+	if (stack == NULL || code == NULL)
+		return;
+
+	oops_print("\n");
+	stack_dump_header();
+	mem32_dump(stack);
+	oops_print("\n");
+
+	code_dump_header();
+	mem32_dump(code);
+	oops_print("\n");
+}
+static void
+archinfo_dump(ucontext_t *uc)
 {
-	RTE_SET_USED(sig);
-	RTE_SET_USED(info);
 	RTE_SET_USED(uc);
 
+	stack_code_dump(NULL, NULL);
+}
+
+static void
+default_signal_handler_invoke(int sig)
+{
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		/* Skip disabled signals */
+		if (signals_db[idx].sig != sig)
+			continue;
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(sig, &signals_db[idx].sa, NULL);
+		kill(getpid(), sig);
+	}
+}
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	oops_print("Signal info:\n");
+	oops_print("------------\n");
+	siginfo_dump(sig, info);
+	oops_print("\n");
+
+	oops_print("Backtrace:\n");
+	oops_print("----------\n");
+	back_trace_dump(uc);
+	oops_print("\n");
+
+	oops_print("Arch info:\n");
+	oops_print("----------\n");
+	if (uc)
+		archinfo_dump(uc);
+}
+
+static void
+eal_oops_handler(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uc = ctx;
+
+	rte_oops_decode(sig, info, uc);
+	default_signal_handler_invoke(sig);
 }
 
 int
 rte_oops_signals_enabled(int *signals)
 {
-	RTE_SET_USED(signals);
+	int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
+	unsigned int idx = 0;
 
-	return 0;
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (signals_db[idx].enabled) {
+			sig[count] = signals_db[idx].sig;
+			count++;
+		}
+	}
+	if (signals)
+		memcpy(signals, sig, sizeof(*signals) * count);
+
+	return count;
 }
 
 int
 eal_oops_init(void)
 {
-	return 0;
+	unsigned int idx, rc = 0;
+	struct sigaction sa;
+
+	RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &eal_oops_handler;
+	sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		signals_db[idx].sig = oops_signals[idx];
+		/* Get existing sigaction */
+		rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa);
+		if (rc)
+			continue;
+		/* Replace with oops handler */
+		rc = sigaction(signals_db[idx].sig, &sa, NULL);
+		if (rc)
+			continue;
+		signals_db[idx].enabled = true;
+	}
+	return rc;
 }
 
 void
 eal_oops_fini(void)
 {
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL);
+	}
 }
-- 
2.32.0



* [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops jerinj
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev, Aaron Conole, Michael Santana, Bruce Richardson
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, viktorin, drc,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Add an optional libunwind library dependency to DPDK for an
enhanced backtrace based on ucontext.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 .github/workflows/build.yml |  2 +-
 .travis.yml                 |  2 +-
 config/meson.build          |  8 +++++++
 lib/eal/unix/eal_oops.c     | 47 +++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 7dac20ddeb..caaca207a6 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -93,7 +93,7 @@ jobs:
       run: sudo apt install -y ccache libnuma-dev python3-setuptools
         python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
         libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev
-        libarchive-dev
+        libarchive-dev libunwind-dev
     - name: Install libabigail build dependencies if no cache is available
       if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true'
       run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev
diff --git a/.travis.yml b/.travis.yml
index 23067d9e3c..e72b156014 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ addons:
     packages: &required_packages
       - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip, python3-pyelftools, ninja-build]
       - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev]
-      - [libarchive-dev]
+      - [libarchive-dev, libunwind-dev]
 
 _aarch64_packages: &aarch64_packages
   - *required_packages
diff --git a/config/meson.build b/config/meson.build
index e80421003b..26a85dab6b 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -236,6 +236,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
     dpdk_extra_ldflags += '-latomic'
 endif
 
+# check for libunwind
+unwind_dep = dependency('libunwind', required: false, method: 'pkg-config')
+if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep)
+    dpdk_conf.set('RTE_USE_LIBUNWIND', 1)
+    add_project_link_arguments('-lunwind', language: 'c')
+    dpdk_extra_ldflags += '-lunwind'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 7b12cfd5f5..a7f00ecd4e 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -26,6 +26,50 @@ struct oops_signal {
 
 static struct oops_signal signals_db[RTE_DIM(oops_signals)];
 
+#if defined(RTE_USE_LIBUNWIND)
+
+#define BACKTRACE_DEPTH 256
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	unw_cursor_t cursor;
+	unw_word_t ip, off;
+	int rc, level = 0;
+	char name[256];
+
+	if (context == NULL) {
+		rte_dump_stack();
+		return;
+	}
+
+	rc = unw_init_local(&cursor, (unw_context_t *)context);
+	if (rc < 0)
+		goto fail;
+
+	for (;;) {
+		rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+		if (rc < 0)
+			goto fail;
+		rc = unw_get_proc_name(&cursor, name, sizeof(name), &off);
+		if (rc == 0)
+			oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip,
+				   name, (uint64_t)off);
+		else
+			oops_print("[%16p]: <unknown>\n", (void *)ip);
+		rc = unw_step(&cursor);
+		if (rc <= 0 || ++level >= BACKTRACE_DEPTH)
+			break;
+	}
+	return;
+fail:
+	oops_print("libunwind call failed %s\n", unw_strerror(rc));
+}
+
+#else
+
 static void
 back_trace_dump(ucontext_t *context)
 {
@@ -33,6 +77,9 @@ back_trace_dump(ucontext_t *context)
 
 	rte_dump_stack();
 }
+
+#endif
+
 static void
 siginfo_dump(int sig, siginfo_t *info)
 {
-- 
2.32.0



* [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
                       ` (2 preceding siblings ...)
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 5/6] eal/arm64: " jerinj
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the x86 arch state registers in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a7f00ecd4e..a0f9526d96 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -133,6 +133,38 @@ stack_code_dump(void *stack, void *code)
 	mem32_dump(code);
 	oops_print("\n");
 }
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX)
+static void
+archinfo_dump(ucontext_t *uc)
+{
+
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	oops_print("R8 : 0x%.16llx  ", mc->gregs[REG_R8]);
+	oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]);
+	oops_print("R10: 0x%.16llx  ", mc->gregs[REG_R10]);
+	oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]);
+	oops_print("R12: 0x%.16llx  ", mc->gregs[REG_R12]);
+	oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]);
+	oops_print("R14: 0x%.16llx  ", mc->gregs[REG_R14]);
+	oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]);
+	oops_print("RAX: 0x%.16llx  ", mc->gregs[REG_RAX]);
+	oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]);
+	oops_print("RCX: 0x%.16llx  ", mc->gregs[REG_RCX]);
+	oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]);
+	oops_print("RBP: 0x%.16llx  ", mc->gregs[REG_RBP]);
+	oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]);
+	oops_print("RSI: 0x%.16llx  ", mc->gregs[REG_RSI]);
+	oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]);
+	oops_print("RIP: 0x%.16llx  ", mc->gregs[REG_RIP]);
+	oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]);
+
+	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
+}
+
+#else
+
 static void
 archinfo_dump(ucontext_t *uc)
 {
@@ -141,6 +173,8 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump(NULL, NULL);
 }
 
+#endif
+
 static void
 default_signal_handler_invoke(int sig)
 {
-- 
2.32.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v2 5/6] eal/arm64: support register dump for oops
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
                       ` (3 preceding siblings ...)
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs jerinj
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the arm64 architectural register state in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a0f9526d96..9c783f936a 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -163,6 +163,25 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
 }
 
+#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
+
+static void
+archinfo_dump(ucontext_t *uc)
+{
+	mcontext_t *mc = &uc->uc_mcontext;
+	int i;
+
+	oops_print("PC : 0x%.16llx ", mc->pc);
+	oops_print("SP : 0x%.16llx\n", mc->sp);
+	for (i = 0; i < 31; i++)
+		oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
+			   i & 0x1 ? "\n" : " ");
+
+	oops_print("PSTATE: 0x%.16llx\n", mc->pstate);
+
+	stack_code_dump((void *)mc->sp, (void *)mc->pc);
+}
+
 #else
 
 static void
-- 
2.32.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
                       ` (4 preceding siblings ...)
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 5/6] eal/arm64: " jerinj
@ 2021-08-17  3:27     ` jerinj
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-08-17  3:27 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Added unit test cases for all the oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 app/test/meson.build |   2 +
 app/test/test_oops.c | 121 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 123 insertions(+)
 create mode 100644 app/test/test_oops.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..1e471ab351 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
         'test_metrics.c',
         'test_mcslock.c',
         'test_mp_secondary.c',
+        'test_oops.c',
         'test_per_lcore.c',
         'test_pflock.c',
         'test_pmd_perf.c',
@@ -236,6 +237,7 @@ fast_tests = [
         ['memzone_autotest', false],
         ['meter_autotest', true],
         ['multiprocess_autotest', false],
+        ['oops_autotest', true],
         ['per_lcore_autotest', true],
         ['pflock_autotest', true],
         ['prefetch_autotest', true],
diff --git a/app/test/test_oops.c b/app/test/test_oops.c
new file mode 100644
index 0000000000..60a7f259c7
--- /dev/null
+++ b/app/test/test_oops.c
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell
+ */
+
+#include <setjmp.h>
+#include <signal.h>
+
+#include <rte_config.h>
+#include <rte_oops.h>
+
+#include "test.h"
+
+static jmp_buf pc;
+static bool detected_segfault;
+
+static void
+segv_handler(int sig, siginfo_t *info, void *ctx)
+{
+	detected_segfault = true;
+	rte_oops_decode(sig, info, (ucontext_t *)ctx);
+	longjmp(pc, 1);
+}
+
+/* OS-specific way to install the segfault signal handler */
+static int
+segv_handler_install(void)
+{
+	struct sigaction sa;
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &segv_handler;
+	sa.sa_flags = SA_SIGINFO;
+
+	return sigaction(SIGSEGV, &sa, NULL);
+}
+
+static int
+test_oops_generate(void)
+{
+	int rc;
+
+	rc = segv_handler_install();
+	TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	detected_segfault = false;
+	rc = setjmp(pc); /* Save the execution state */
+	if (rc == 0) {
+		/* Generate a segfault */
+		*(volatile int *)0x05 = 0;
+	} else { /* longjmp from segv_handler */
+		if (detected_segfault)
+			return TEST_SUCCESS;
+
+	}
+	return TEST_FAILED;
+}
+
+static int
+test_signal_handler_installed(int count, int *signals)
+{
+	int i, rc, verified = 0;
+	struct sigaction sa;
+
+	for (i = 0; i < count; i++) {
+		rc = sigaction(signals[i], NULL, &sa);
+		if (rc) {
+			printf("Failed to get sigaction for %d\n", signals[i]);
+			continue;
+		}
+		if (sa.sa_handler != SIG_DFL)
+			verified++;
+	}
+	TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count,
+			  verified);
+	return TEST_SUCCESS;
+}
+
+static int
+test_oops_signals_enabled(void)
+{
+	int *signals = NULL;
+	int i, rc;
+
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	signals = malloc(sizeof(int) * rc);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+	free(signals);
+
+	signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	for (i = 0; i < rc; i++)
+		TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i,
+				      signals[i]);
+
+	rc = test_signal_handler_installed(rc, signals);
+	free(signals);
+
+	return rc;
+}
+
+static struct unit_test_suite oops_tests = {
+	.suite_name = "oops autotest",
+	.setup = NULL,
+	.teardown = NULL,
+	.unit_test_cases = {
+			    TEST_CASE(test_oops_signals_enabled),
+			    TEST_CASE(test_oops_generate),
+			    TEST_CASES_END()}};
+
+static int
+test_oops(void)
+{
+	return unit_test_suite_runner(&oops_tests);
+}
+
+REGISTER_TEST_COMMAND(oops_autotest, test_oops);
-- 
2.32.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj
@ 2021-08-17  3:52       ` Stephen Hemminger
  2021-08-17 10:24         ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2021-08-17  3:52 UTC (permalink / raw)
  To: jerinj
  Cc: dev, thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, viktorin, drc

On Tue, 17 Aug 2021 08:57:19 +0530
<jerinj@marvell.com> wrote:

> +#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__)

Calling rte_log from a signal handler is problematic: the malloc pool
may be corrupted, and rte_log can call functions that use malloc.

Even rte_dump_stack() is unsafe to call from these signal handlers.
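
A minimal sketch of what async-signal-safe output could look like, assuming
the oops text goes straight to a file descriptor with write(2) instead of
through rte_log. The helper names (oops_fmt_hex64, oops_write,
oops_print_reg) are hypothetical and are not part of the patch; the
formatting is done by hand because the printf family is not on the POSIX
async-signal-safe list.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: format v as "0x" plus 16 hex digits, NUL-terminated.
 * Returns the string length (18), or 0 if buf is smaller than 19 bytes.
 */
static size_t
oops_fmt_hex64(char *buf, size_t len, uint64_t v)
{
	static const char digits[] = "0123456789abcdef";
	int i;

	if (len < 19)
		return 0;
	buf[0] = '0';
	buf[1] = 'x';
	for (i = 0; i < 16; i++)
		buf[2 + i] = digits[(v >> (60 - 4 * i)) & 0xf];
	buf[18] = '\0';
	return 18;
}

/* Hypothetical helper: write a NUL-terminated string using only write(2). */
static void
oops_write(int fd, const char *s)
{
	size_t len = 0;

	while (s[len] != '\0')
		len++;
	while (len > 0) {
		ssize_t n = write(fd, s, len);

		if (n <= 0)
			return; /* give up on error; nothing safe to do here */
		s += n;
		len -= (size_t)n;
	}
}

/* Hypothetical helper: print "NAME: 0x<16 hex digits>\n" without the heap. */
static void
oops_print_reg(int fd, const char *name, uint64_t v)
{
	char hex[19];

	oops_write(fd, name);
	oops_write(fd, ": ");
	if (oops_fmt_hex64(hex, sizeof(hex), v) != 0)
		oops_write(fd, hex);
	oops_write(fd, "\n");
}
```

With helpers like these, a register line could be emitted as
oops_print_reg(STDERR_FILENO, "RIP", value) without touching the heap or
any stdio buffers.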

> +
> +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS};

Should be constant.

> +
> +struct oops_signal {
> +	int sig;

Redundant, you defined the oops_signals above.

> +	bool enabled;

Redundant, you can just compare with action.

> +	struct sigaction sa;
> +};
> +
> +static struct oops_signal signals_db[RTE_DIM(oops_signals)];
> +
> +static void
> +back_trace_dump(ucontext_t *context)
> +{
> +	RTE_SET_USED(context);
> +
> +	rte_dump_stack();
> +}

rte_dump_stack() is not safe in a signal handler.

I recommend backtrace_symbols_fd() instead.

Better yet, use libunwind.
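
A sketch of the backtrace_symbols_fd() route, assuming glibc's
<execinfo.h>. The wrapper name oops_backtrace_fd and the frame limit are
hypothetical. Unlike backtrace_symbols(), backtrace_symbols_fd() writes to
the descriptor without calling malloc; backtrace() itself may still load
libgcc dynamically on first use, so calling it once during initialization
is a common precaution.

```c
#include <execinfo.h>
#include <unistd.h>

#define OOPS_BT_MAX 64 /* hypothetical upper bound on captured frames */

/* Hypothetical helper: dump the current call stack to fd.
 * Returns the number of frames captured.
 */
static int
oops_backtrace_fd(int fd)
{
	void *frames[OOPS_BT_MAX];
	int n;

	n = backtrace(frames, OOPS_BT_MAX);
	if (n > 0)
		backtrace_symbols_fd(frames, n, fd); /* no malloc, writes directly */
	return n;
}
```

Symbol names only resolve for functions exported in the dynamic symbol
table, so the output is usually less detailed than what a libunwind-based
unwinder can produce.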

> +static void
> +siginfo_dump(int sig, siginfo_t *info)
> +{
> +	oops_print("PID:           %" PRIdMAX "\n", (intmax_t)getpid());
> +
> +	if (info == NULL)
> +		return;
> +	if (sig != info->si_signo)
> +		oops_print("Invalid signal info\n");
> +
> +	oops_print("Signal number: %d\n", info->si_signo);
> +	oops_print("Fault address: %p\n", info->si_addr);
> +}
> +
> +static void
> +mem32_dump(void *ptr)

Should be const

> +{
> +	uint32_t *p = ptr;
> +	int i;
> +
> +	for (i = 0; i < 16; i++)
> +		oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
> +}

Why reinvent hexdump?

> +
> +static void
> +stack_dump_header(void)
> +{
> +	oops_print("Stack dump:\n");
> +	oops_print("----------\n");
> +}
> +
> +static void
> +code_dump_header(void)
> +{
> +	oops_print("Code dump:\n");
> +	oops_print("----------\n");
> +}
> +
> +static void
> +stack_code_dump(void *stack, void *code)
> +{
> +	if (stack == NULL || code == NULL)
> +		return;
> +
> +	oops_print("\n");
> +	stack_dump_header();
> +	mem32_dump(stack);
> +	oops_print("\n");
> +
> +	code_dump_header();
> +	mem32_dump(code);
> +	oops_print("\n");
> +}
> +static void
> +archinfo_dump(ucontext_t *uc)
>  {
> -	RTE_SET_USED(sig);
> -	RTE_SET_USED(info);
>  	RTE_SET_USED(uc);
>  
> +	stack_code_dump(NULL, NULL);
> +}
> +
> +static void
> +default_signal_handler_invoke(int sig)
> +{
> +	unsigned int idx;
> +
> +	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
> +		/* Skip disabled signals */
> +		if (signals_db[idx].sig != sig)
> +			continue;
> +		if (!signals_db[idx].enabled)
> +			continue;
> +		/* Replace with stored handler */
> +		sigaction(sig, &signals_db[idx].sa, NULL);
> +		kill(getpid(), sig);

If you use SA_RESETHAND, you don't need this stuff.

> +	}
> +}
> +
> +void
> +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
> +{
> +	oops_print("Signal info:\n");
> +	oops_print("------------\n");
> +	siginfo_dump(sig, info);
> +	oops_print("\n");
> +
> +	oops_print("Backtrace:\n");
> +	oops_print("----------\n");
> +	back_trace_dump(uc);
> +	oops_print("\n");
> +
> +	oops_print("Arch info:\n");
> +	oops_print("----------\n");
> +	if (uc)
> +		archinfo_dump(uc);
> +}
> +
> +static void
> +eal_oops_handler(int sig, siginfo_t *info, void *ctx)
> +{
> +	ucontext_t *uc = ctx;
> +
> +	rte_oops_decode(sig, info, uc);
> +	default_signal_handler_invoke(sig);

If you use SA_RESETHAND, then just do raise(sig) here.
>  }
>  
>  int
>  rte_oops_signals_enabled(int *signals)

Why is this necessary and exported?

>  {
> -	RTE_SET_USED(signals);
> +	int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
> +	unsigned int idx = 0;
>  
> -	return 0;
> +	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
> +		if (signals_db[idx].enabled) {
> +			sig[count] = signals_db[idx].sig;
> +			count++;
> +		}
> +	}
> +	if (signals)
> +		memcpy(signals, sig, sizeof(*signals) * count);
> +
> +	return count;
>  }
>  
>  int
>  eal_oops_init(void)
>  {
> -	return 0;
> +	unsigned int idx, rc = 0;
> +	struct sigaction sa;
> +
> +	RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
> +
> +	sigemptyset(&sa.sa_mask);
> +	sa.sa_sigaction = &eal_oops_handler;
> +	sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
> +
> +	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
> +		signals_db[idx].sig = oops_signals[idx];
> +		/* Get exiting sigaction */
> +		rc = sigaction(signals_db[idx].sig, NULL, &signals_db[idx].sa);
> +		if (rc)
> +			continue;
> +		/* Replace with oops handler */
> +		rc = sigaction(signals_db[idx].sig, &sa, NULL);
> +		if (rc)
> +			continue;
> +		signals_db[idx].enabled = true;
> +	}
> +	return rc;
>  }
>  
>  void
>  eal_oops_fini(void)
>  {
> +	unsigned int idx;
> +
> +	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
> +		if (!signals_db[idx].enabled)
> +			continue;
> +		/* Replace with stored handler */
> +		sigaction(signals_db[idx].sig, &signals_db[idx].sa, NULL);
> +	}
>  }
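
For reference, a minimal sketch of the SA_RESETHAND suggestion made above,
assuming a POSIX system. All names (oneshot_handler, oneshot_install,
oneshot_demo) are hypothetical. The handler runs once, the kernel resets the
disposition to SIG_DFL on entry, and raise(sig) then lets the default action
(and any coredump) happen for the original signal. Note that this resets to
SIG_DFL, not to a handler the application may have installed earlier.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void
oneshot_handler(int sig, siginfo_t *info, void *ctx)
{
	static const char msg[] = "oops: fault decoded, re-raising\n";
	ssize_t rc;

	(void)info;
	(void)ctx;
	rc = write(STDERR_FILENO, msg, sizeof(msg) - 1); /* async-signal-safe */
	(void)rc;
	/* Disposition is already SIG_DFL because of SA_RESETHAND */
	raise(sig);
}

static int
oneshot_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = oneshot_handler;
	sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
	return sigaction(sig, &sa, NULL);
}

/* Run the scenario in a child process so the caller survives.
 * Returns the signal that terminated the child, or -1.
 */
static int
oneshot_demo(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (oneshot_install(sig) == 0)
			raise(sig);
		_exit(0); /* only reached if the signal did not terminate us */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The parent observes the child as killed by the original signal, which is
what external coredump tooling would also see.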


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj
@ 2021-08-17  3:53       ` Stephen Hemminger
  2021-08-17  7:38         ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2021-08-17  3:53 UTC (permalink / raw)
  To: jerinj
  Cc: dev, Bruce Richardson, Ray Kinsella, thomas, david.marchand,
	dmitry.kozliuk, navasile, dmitrym, pallavi.kadam,
	konstantin.ananyev, ruifeng.wang, viktorin, drc

On Tue, 17 Aug 2021 08:57:18 +0530
<jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
> 
> Introducing oops handling API with following specification
> and enable stub implementation for Linux and FreeBSD.
> 
> On rte_eal_init() invocation, the EAL library installs the
> oops handler for the essential signals.
> The rte_oops_signals_enabled() API provides the list
> of signals the library installed by the EAL.

This is a big change, and many applications already handle these
signals themselves. Therefore adding this needs to be opt-in
and not enabled by default.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17  3:53       ` Stephen Hemminger
@ 2021-08-17  7:38         ` Jerin Jacob
  2021-08-17 15:09           ` Stephen Hemminger
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-08-17  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 08:57:18 +0530
> <jerinj@marvell.com> wrote:
>
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Introducing oops handling API with following specification
> > and enable stub implementation for Linux and FreeBSD.
> >
> > On rte_eal_init() invocation, the EAL library installs the
> > oops handler for the essential signals.
> > The rte_oops_signals_enabled() API provides the list
> > of signals the library installed by the EAL.
>
> This is a big change, and many applications already handle these
> signals themselves. Therefore adding this needs to be opt-in
> and not enabled by default.

This design was chosen so that applications do not each have to register
the signal handler explicitly, while still co-existing with
application-specific signal handlers. (It is mentioned in the commit log;
I will describe it here for more clarity.)

Case 1:
a) The application installs its signal handler prior to rte_eal_init().
b) The implementation stores the application-specific signal handler and
replaces it with the EAL oops handler.
c) When the application/DPDK gets a segfault, the default EAL oops
handler gets invoked.
d) After it dumps the EAL-specific message, it calls the
application-specific signal handler installed in step a) by the
application. This avoids breaking any contract with the application,
i.e. the behavior is the same as the current EAL.
That is the reason for not using SA_RESETHAND (which would invoke SIG_DFL
after the EAL oops handler instead of the application-specific handler).

Case 2:
a) The application installs its signal handler after rte_eal_init().
b) The EAL handler gets replaced with the application handler; the
application can then call rte_oops_decode() to decode the oops.

rte_oops_signals_enabled() and rte_oops_decode() are provided to cater to
the above use cases.

Here we are not breaking any contract with the application.
Do you have concerns about this design?
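
The Case 1 flow above can be sketched as follows. This is an illustration
under stated assumptions, not the patch code: the names (app_handler,
oops_chain_handler, oops_chain_install, saved_sa) are hypothetical, a single
signal is handled, and raise() is used where the patch calls
kill(getpid(), sig).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static struct sigaction saved_sa;        /* disposition found at install time */
static volatile sig_atomic_t oops_seen;  /* set by the oops handler */
static volatile sig_atomic_t app_seen;   /* set by the application handler */

/* Hypothetical application handler, registered before the oops handler */
static void
app_handler(int sig)
{
	(void)sig;
	app_seen = 1;
}

static void
oops_chain_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)info;
	(void)ctx;
	oops_seen = 1; /* the oops decode step would go here */
	/* Put the saved disposition back, then re-raise. The signal is
	 * blocked while this handler runs, so it is delivered to the
	 * restored handler once this handler returns.
	 */
	sigaction(sig, &saved_sa, NULL);
	raise(sig);
}

static int
app_handler_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = app_handler;
	return sigaction(sig, &sa, NULL);
}

static int
oops_chain_install(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = oops_chain_handler;
	sa.sa_flags = SA_SIGINFO;
	/* The third argument receives the previous disposition */
	return sigaction(sig, &sa, &saved_sa);
}
```

If no application handler was registered beforehand, saved_sa holds SIG_DFL
and the re-raised signal takes the default action, which is how the OS
coredump path stays intact.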

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation
  2021-08-17  3:52       ` Stephen Hemminger
@ 2021-08-17 10:24         ` Jerin Jacob
  0 siblings, 0 replies; 45+ messages in thread
From: Jerin Jacob @ 2021-08-17 10:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, David Marchand,
	Richardson, Bruce, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, Aug 17, 2021 at 9:22 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 08:57:19 +0530
> <jerinj@marvell.com> wrote:
>
> > +#define oops_print(...) rte_log(RTE_LOG_ERR, RTE_LOGTYPE_EAL, __VA_ARGS__)
>
> It is problematic to call rte_log from a signal handler.
> The malloc pool maybe corrupted and rte_log can call functions that
> use malloc.

OK. What should be used instead, fprintf(stderr, ...)?

>
> Even rte_dump_stack() is unsafe from these signals.

OK

>
> > +
> > +static int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGABRT, SIGFPE, SIGSYS};
>
> Should be constant.

Ack

>
> > +
> > +struct oops_signal {
> > +     int sig;
>
> Redundant, you defined the oops_signals above.

Ack.

>
> > +     bool enabled;
>
> Redundant, you can just compare with action.

Anyway, we need a database to hold the sigactions, and this makes
rte_oops_signals_enabled() clean to implement.
Also, a handler != SIG_DFL does not mean the signal is enabled.

>
> > +     struct sigaction sa;
> > +};
> > +
> > +static struct oops_signal signals_db[RTE_DIM(oops_signals)];
> > +
> > +static void
> > +back_trace_dump(ucontext_t *context)
> > +{
> > +     RTE_SET_USED(context);
> > +
> > +     rte_dump_stack();
> > +}
>
> rte_dump_stack() is not safe in signal handler:
>
> Recommend backtrace_symbols_fd ??
>
> Better yet use libunwind

libunwind is an optional dependency. As you can see in the next patch,
back_trace_dump() is implemented with libunwind-based stack unwinding
when the dependency is met.


>
> > +static void
> > +siginfo_dump(int sig, siginfo_t *info)
> > +{
> > +     oops_print("PID:           %" PRIdMAX "\n", (intmax_t)getpid());
> > +
> > +     if (info == NULL)
> > +             return;
> > +     if (sig != info->si_signo)
> > +             oops_print("Invalid signal info\n");
> > +
> > +     oops_print("Signal number: %d\n", info->si_signo);
> > +     oops_print("Fault address: %p\n", info->si_addr);
> > +}
> > +
> > +static void
> > +mem32_dump(void *ptr)
>
> Should be const

Ack.

>
> > +{
> > +     uint32_t *p = ptr;
> > +     int i;
> > +
> > +     for (i = 0; i < 16; i++)
> > +             oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
> > +}
>
> Why reinvent hexdump?

Makes sense. I can change to hexdump, but it will use rte_log. Shouldn't
we use an fprintf(stderr, ...) variant?

>
> > +
> > +static void
> > +stack_dump_header(void)
> > +{
> > +     oops_print("Stack dump:\n");
> > +     oops_print("----------\n");
> > +}
> > +
> > +static void
> > +code_dump_header(void)
> > +{
> > +     oops_print("Code dump:\n");
> > +     oops_print("----------\n");
> > +}
> > +
> > +static void
> > +stack_code_dump(void *stack, void *code)
> > +{
> > +     if (stack == NULL || code == NULL)
> > +             return;
> > +
> > +     oops_print("\n");
> > +     stack_dump_header();
> > +     mem32_dump(stack);
> > +     oops_print("\n");
> > +
> > +     code_dump_header();
> > +     mem32_dump(code);
> > +     oops_print("\n");
> > +}
> > +static void
> > +archinfo_dump(ucontext_t *uc)
> >  {
> > -     RTE_SET_USED(sig);
> > -     RTE_SET_USED(info);
> >       RTE_SET_USED(uc);
> >
> > +     stack_code_dump(NULL, NULL);
> > +}
> > +
> > +static void
> > +default_signal_handler_invoke(int sig)
> > +{
> > +     unsigned int idx;
> > +
> > +     for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
> > +             /* Skip disabled signals */
> > +             if (signals_db[idx].sig != sig)
> > +                     continue;
> > +             if (!signals_db[idx].enabled)
> > +                     continue;
> > +             /* Replace with stored handler */
> > +             sigaction(sig, &signals_db[idx].sa, NULL);
> > +             kill(getpid(), sig);
>
> If you use SA_RESETHAND, you don't need this stuff.

As mentioned in the reply to the 1/6 email, this is NOT the case where
the SIG_DFL handler is called from the EAL oops handler. Instead, it
calls the signal handler that was registered prior to rte_eal_init(),
which is stored in the local database.



>
> > +     }
> > +}
> > +
> > +void
> > +rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
> > +{
> > +     oops_print("Signal info:\n");
> > +     oops_print("------------\n");
> > +     siginfo_dump(sig, info);
> > +     oops_print("\n");
> > +
> > +     oops_print("Backtrace:\n");
> > +     oops_print("----------\n");
> > +     back_trace_dump(uc);
> > +     oops_print("\n");
> > +
> > +     oops_print("Arch info:\n");
> > +     oops_print("----------\n");
> > +     if (uc)
> > +             archinfo_dump(uc);
> > +}
> > +
> > +static void
> > +eal_oops_handler(int sig, siginfo_t *info, void *ctx)
> > +{
> > +     ucontext_t *uc = ctx;
> > +
> > +     rte_oops_decode(sig, info, uc);
> > +     default_signal_handler_invoke(sig);
>
> If you use SA_RESETHAND, then just doing raise(sig) here.
> >  }
> >
> >  int
> >  rte_oops_signals_enabled(int *signals)
>
> Why is this necessary and exported?

Explained in the reply to the 1/6 email.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17  7:38         ` Jerin Jacob
@ 2021-08-17 15:09           ` Stephen Hemminger
  2021-08-17 15:27             ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2021-08-17 15:09 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, 17 Aug 2021 13:08:46 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue, 17 Aug 2021 08:57:18 +0530
> > <jerinj@marvell.com> wrote:
> >  
> > > From: Jerin Jacob <jerinj@marvell.com>
> > >
> > > Introducing oops handling API with following specification
> > > and enable stub implementation for Linux and FreeBSD.
> > >
> > > On rte_eal_init() invocation, the EAL library installs the
> > > oops handler for the essential signals.
> > > The rte_oops_signals_enabled() API provides the list
> > > of signals the library installed by the EAL.  
> >
> > This is a big change, and many applications already handle these
> > signals themselves. Therefore adding this needs to be opt-in
> > and not enabled by default.  
> 
> In order to avoid every application explicitly register this
> sighandler and to cater to the
> co-existing application-specific signal-hander usage.
> The following design has been chosen. (It is mentioned in the commit log,
> I will describe here for more clarity)
> 
> Case 1:
> a) The application installs the signal handler prior to rte_eal_init().
> b) Implementation stores the application-specific signal and replace a
> signal handler as oops eal handler
> c) when application/DPDK get the segfault, the default EAL oops
> handler gets invoked
> d) Then it dumps the EAL specific message, it calls the
> application-specific signal handler
> installed in step 1 by application. This avoids breaking any contract
> with the application.
> i.e Behavior is the same current EAL now.
> That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> eal oops handler instead
> application-specific handler)
> 
> Case 2:
> a) The application install the signal handler after rte_eal_init(),
> b) EAL hander get replaced with application handle then the application can call
> rte_oops_decode() to decode.
> 
> In order to cater the above use case, rte_oops_signals_enabled() and
> rte_oops_decode()
> provided.
> 
> Here we are not breaking any contract with the application.
> Do you have concerns about this design?

Our application runs as a service, and it is important not to do any
backtrace in production. We rely on other infrastructure to process
coredumps.

This should be enabled by a command-line argument.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17 15:09           ` Stephen Hemminger
@ 2021-08-17 15:27             ` Jerin Jacob
  2021-08-17 15:52               ` Stephen Hemminger
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-08-17 15:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 13:08:46 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > <jerinj@marvell.com> wrote:
> > >
> > > > From: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > Introducing oops handling API with following specification
> > > > and enable stub implementation for Linux and FreeBSD.
> > > >
> > > > On rte_eal_init() invocation, the EAL library installs the
> > > > oops handler for the essential signals.
> > > > The rte_oops_signals_enabled() API provides the list
> > > > of signals the library installed by the EAL.
> > >
> > > This is a big change, and many applications already handle these
> > > signals themselves. Therefore adding this needs to be opt-in
> > > and not enabled by default.
> >
> > In order to avoid every application explicitly register this
> > sighandler and to cater to the
> > co-existing application-specific signal-hander usage.
> > The following design has been chosen. (It is mentioned in the commit log,
> > I will describe here for more clarity)
> >
> > Case 1:
> > a) The application installs the signal handler prior to rte_eal_init().
> > b) Implementation stores the application-specific signal and replace a
> > signal handler as oops eal handler
> > c) when application/DPDK get the segfault, the default EAL oops
> > handler gets invoked
> > d) Then it dumps the EAL specific message, it calls the
> > application-specific signal handler
> > installed in step 1 by application. This avoids breaking any contract
> > with the application.
> > i.e Behavior is the same current EAL now.
> > That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> > eal oops handler instead
> > application-specific handler)
> >
> > Case 2:
> > a) The application install the signal handler after rte_eal_init(),
> > b) EAL hander get replaced with application handle then the application can call
> > rte_oops_decode() to decode.
> >
> > In order to cater the above use case, rte_oops_signals_enabled() and
> > rte_oops_decode()
> > provided.
> >
> > Here we are not breaking any contract with the application.
> > Do you have concerns about this design?
>
> In our application as a service it is important not to do any backtrace
> in production. We rely on other infrastructure to process coredumps.

Other infrastructure will keep working. For example, with the standard
coredump infrastructure on Linux, the current implementation works as
follows:
- The EAL handler dumps the DPDK oops, kernel-style, on stderr.
- The implementation invokes SIG_DFL from the EAL oops handler.
- That step creates the coredump or hands off to whatever other
infrastructure you are using for coredumps.

>
> This should be controlled enabled by a command line argument.

If other coredump infrastructure keeps working as-is, why is an
enable/disable option required in EAL?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17 15:27             ` Jerin Jacob
@ 2021-08-17 15:52               ` Stephen Hemminger
  2021-08-18  9:37                 ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2021-08-17 15:52 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, 17 Aug 2021 20:57:50 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue, 17 Aug 2021 13:08:46 +0530
> > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >  
> > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:  
> > > >
> > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > <jerinj@marvell.com> wrote:
> > > >  
> > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > Introducing oops handling API with following specification
> > > > > and enable stub implementation for Linux and FreeBSD.
> > > > >
> > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > oops handler for the essential signals.
> > > > > The rte_oops_signals_enabled() API provides the list
> > > > > of signals the library installed by the EAL.  
> > > >
> > > > This is a big change, and many applications already handle these
> > > > signals themselves. Therefore adding this needs to be opt-in
> > > > and not enabled by default.  
> > >
> > > In order to avoid every application explicitly register this
> > > sighandler and to cater to the
> > > co-existing application-specific signal-hander usage.
> > > The following design has been chosen. (It is mentioned in the commit log,
> > > I will describe here for more clarity)
> > >
> > > Case 1:
> > > a) The application installs the signal handler prior to rte_eal_init().
> > > b) Implementation stores the application-specific signal and replace a
> > > signal handler as oops eal handler
> > > c) when application/DPDK get the segfault, the default EAL oops
> > > handler gets invoked
> > > d) Then it dumps the EAL specific message, it calls the
> > > application-specific signal handler
> > > installed in step 1 by application. This avoids breaking any contract
> > > with the application.
> > > i.e Behavior is the same current EAL now.
> > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> > > eal oops handler instead
> > > application-specific handler)
> > >
> > > Case 2:
> > > a) The application install the signal handler after rte_eal_init(),
> > > b) EAL hander get replaced with application handle then the application can call
> > > rte_oops_decode() to decode.
> > >
> > > In order to cater the above use case, rte_oops_signals_enabled() and
> > > rte_oops_decode()
> > > provided.
> > >
> > > Here we are not breaking any contract with the application.
> > > Do you have concerns about this design?  
> >
> > In our application as a service it is important not to do any backtrace
> > in production. We rely on other infrastructure to process coredumps.  
> 
> Other infrastructure will work. For example, If we are using standard coredump
> using linux infra. In Current implementation,
> - EAL handler dump the DPDK OOPS like kernel on stderr
> - Implementation calls SIG_DFL in eal oops handler
> - The above step creates the coredump or re-directs any other
> infrastructure you are using for coredump.
> 
> >
> > This should be controlled enabled by a command line argument.  
> 
> If we allow other infrastructure coredump to work as-is, why
> enable/disable required from eal?

The addition of DPDK OOPS adds additional steps which make all
faults be identified as the oops code.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-17 15:52               ` Stephen Hemminger
@ 2021-08-18  9:37                 ` Jerin Jacob
  2021-08-18 16:46                   ` Stephen Hemminger
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-08-18  9:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 17 Aug 2021 20:57:50 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > >
> > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > <stephen@networkplumber.org> wrote:
> > > > >
> > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > <jerinj@marvell.com> wrote:
> > > > >
> > > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > > >
> > > > > > Introducing oops handling API with following specification
> > > > > > and enable stub implementation for Linux and FreeBSD.
> > > > > >
> > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > oops handler for the essential signals.
> > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > of signals the library installed by the EAL.
> > > > >
> > > > > This is a big change, and many applications already handle these
> > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > and not enabled by default.
> > > >
> > > > In order to avoid every application explicitly register this
> > > > sighandler and to cater to the
> > > > co-existing application-specific signal-hander usage.
> > > > The following design has been chosen. (It is mentioned in the commit log,
> > > > I will describe here for more clarity)
> > > >
> > > > Case 1:
> > > > a) The application installs the signal handler prior to rte_eal_init().
> > > > b) Implementation stores the application-specific signal and replace a
> > > > signal handler as oops eal handler
> > > > c) when application/DPDK get the segfault, the default EAL oops
> > > > handler gets invoked
> > > > d) Then it dumps the EAL specific message, it calls the
> > > > application-specific signal handler
> > > > installed in step 1 by application. This avoids breaking any contract
> > > > with the application.
> > > > i.e Behavior is the same current EAL now.
> > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> > > > eal oops handler instead
> > > > application-specific handler)
> > > >
> > > > Case 2:
> > > > a) The application install the signal handler after rte_eal_init(),
> > > > b) EAL hander get replaced with application handle then the application can call
> > > > rte_oops_decode() to decode.
> > > >
> > > > In order to cater the above use case, rte_oops_signals_enabled() and
> > > > rte_oops_decode()
> > > > provided.
> > > >
> > > > Here we are not breaking any contract with the application.
> > > > Do you have concerns about this design?
> > >
> > > In our application as a service it is important not to do any backtrace
> > > in production. We rely on other infrastructure to process coredumps.
> >
> > Other infrastructure will work. For example, If we are using standard coredump
> > using linux infra. In Current implementation,
> > - EAL handler dump the DPDK OOPS like kernel on stderr
> > - Implementation calls SIG_DFL in eal oops handler
> > - The above step creates the coredump or re-directs any other
> > infrastructure you are using for coredump.
> >
> > >
> > > This should be controlled enabled by a command line argument.
> >
> > If we allow other infrastructure coredump to work as-is, why
> > enable/disable required from eal?
>
> The addition of DPDK OOPS adds additional steps which make all
> faults be identified as the oops code.

Since we are using SA_ONSTACK it is not losing the original segfault
info.
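
For readers unfamiliar with the flag: SA_ONSTACK makes the handler run on
an alternate stack registered with sigaltstack(), so the handler's own
frames are not pushed onto the stack of the faulting thread. The sketch
below is only an illustration of that mechanism, not code from this
series; install_on_alt_stack(), fault_handler() and ALT_STACK_SIZE are
hypothetical names.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from the patch series. */
#define ALT_STACK_SIZE (64 * 1024)

static void *alt_stack_base;
static volatile sig_atomic_t handler_ran_on_alt_stack;

static void
fault_handler(int sig, siginfo_t *info, void *uctx)
{
	char marker; /* its address tells us which stack the handler uses */
	uintptr_t p = (uintptr_t)&marker;
	uintptr_t lo = (uintptr_t)alt_stack_base;

	(void)sig;
	(void)info;
	(void)uctx;
	handler_ran_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Register an alternate stack and install the handler on it.
 * Returns 0 on success, -1 on failure. */
static int
install_on_alt_stack(int signum)
{
	stack_t ss;
	struct sigaction sa;

	alt_stack_base = malloc(ALT_STACK_SIZE);
	if (alt_stack_base == NULL)
		return -1;

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = alt_stack_base;
	ss.ss_size = ALT_STACK_SIZE;
	ss.ss_flags = 0;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, NULL);
}
```

Calling install_on_alt_stack(SIGUSR1) and then raise(SIGUSR1) is a
harmless way to observe that the handler runs on the alternate stack.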

I verified this as follows. Please find the steps below.

0) Enable coredump infra in Linux using coredumpctl or so
1) Apply this series
2) Apply the following patch to create a segfault from within the library.
This tests that the segfault is caught by EAL and forwarded to the default
Linux signal handler.

[main]dell[dpdk.org] $ git diff
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3438a96b75..b935c32c98 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)

        eal_mcfg_complete();

+       /* Generate a segfault */
+       *(volatile int *)0x05 = 0;
        return fctret;

 }
3) Build
meson --buildtype debug build
ninja -C build

4) Run
$ ./build/app/test/dpdk-test --no-huge  -c 0x2

Please find the oops dump [1] and the gdb core dump backtrace [2] below.
The gdb core dump trace preserves the original segfault cause and trace.

Any other concerns?


[1]
[main]dell[dpdk.org] $ ./build/app/test/dpdk-test --no-huge  -c 0x2
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can
be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: WARNING: Main core has no memory on local socket!
Signal info:
------------
PID:           2666512
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[  0x5582acd1e08a]: rte_eal_init()+0xe18
[  0x5582ac086f4e]: main()+0x298
[  0x7f0facf1fb25]: __libc_start_main()+0xd5
[  0x5582ac079c9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000002  R9 : 0x00007ffe9273c590
R10: 0x0000000000000000  R11: 0x0000000000000246
R12: 0x00005582bc3ce7a0  R13: 0x00000000000000ca
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x00005582bc3c75c8
RCX: 0x00007ffe9273c530  RDX: 0x0000000000000000
RBP: 0x00007ffe9273c820  RSP: 0x00007ffe9273c690
RSI: 0x0000000000000008  RDI: 0x00000000000000ca
RIP: 0x00005582acd1e08a  EFL: 0x0000000000010246


[2]

Core was generated by `./build/app/test/dpdk-test --no-huge -c 0x2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
1342            *(volatile int *)0x05 = 0;
[Current thread is 1 (Thread 0x7f0faca83c00 (LWP 2666512))]
(gdb) bt
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
#1  0x00005582ac086f4e in main (argc=4, argv=0x7ffe9273cec8) at
../app/test/test.c:146




>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-18  9:37                 ` Jerin Jacob
@ 2021-08-18 16:46                   ` Stephen Hemminger
  2021-08-18 18:04                     ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2021-08-18 16:46 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Wed, 18 Aug 2021 15:07:25 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue, 17 Aug 2021 20:57:50 +0530
> > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >  
> > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:  
> > > >
> > > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > > >  
> > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > > <stephen@networkplumber.org> wrote:  
> > > > > >
> > > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > > <jerinj@marvell.com> wrote:
> > > > > >  
> > > > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > > > >
> > > > > > > Introducing oops handling API with following specification
> > > > > > > and enable stub implementation for Linux and FreeBSD.
> > > > > > >
> > > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > > oops handler for the essential signals.
> > > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > > of signals the library installed by the EAL.  
> > > > > >
> > > > > > This is a big change, and many applications already handle these
> > > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > > and not enabled by default.  
> > > > >
> > > > > In order to avoid every application explicitly register this
> > > > > sighandler and to cater to the
> > > > > co-existing application-specific signal-hander usage.
> > > > > The following design has been chosen. (It is mentioned in the commit log,
> > > > > I will describe here for more clarity)
> > > > >
> > > > > Case 1:
> > > > > a) The application installs the signal handler prior to rte_eal_init().
> > > > > b) Implementation stores the application-specific signal and replace a
> > > > > signal handler as oops eal handler
> > > > > c) when application/DPDK get the segfault, the default EAL oops
> > > > > handler gets invoked
> > > > > d) Then it dumps the EAL specific message, it calls the
> > > > > application-specific signal handler
> > > > > installed in step 1 by application. This avoids breaking any contract
> > > > > with the application.
> > > > > i.e Behavior is the same current EAL now.
> > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> > > > > eal oops handler instead
> > > > > application-specific handler)
> > > > >
> > > > > Case 2:
> > > > > a) The application install the signal handler after rte_eal_init(),
> > > > > b) EAL hander get replaced with application handle then the application can call
> > > > > rte_oops_decode() to decode.
> > > > >
> > > > > In order to cater the above use case, rte_oops_signals_enabled() and
> > > > > rte_oops_decode()
> > > > > provided.
> > > > >
> > > > > Here we are not breaking any contract with the application.
> > > > > Do you have concerns about this design?  
> > > >
> > > > In our application as a service it is important not to do any backtrace
> > > > in production. We rely on other infrastructure to process coredumps.  
> > >
> > > Other infrastructure will work. For example, If we are using standard coredump
> > > using linux infra. In Current implementation,
> > > - EAL handler dump the DPDK OOPS like kernel on stderr
> > > - Implementation calls SIG_DFL in eal oops handler
> > > - The above step creates the coredump or re-directs any other
> > > infrastructure you are using for coredump.
> > >  
> > > >
> > > > This should be controlled enabled by a command line argument.  
> > >
> > > If we allow other infrastructure coredump to work as-is, why
> > > enable/disable required from eal?  
> >
> > The addition of DPDK OOPS adds additional steps which make all
> > faults be identified as the oops code.  
> 
> Since we are using SA_ONSTACK it is not losing the original segfault
> info.
> 
> I verified like this, Please find below the steps.
> 
> 0) Enable coredump infra in Linux using coredumpctl or so
> 1) Apply this series
> 2) Apply for the following patch to create a segfault from the library.
> This will test, segfault caught by eal and forward to default Linux singal
> handler.
> 
> [main]dell[dpdk.org] $ git diff
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 3438a96b75..b935c32c98 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)
> 
>         eal_mcfg_complete();
> 
> +       /* Generate a segfault */
> +       *(volatile int *)0x05 = 0;
>         return fctret;
> 
>  }
> 3)Build
> meson --buildtype debug build
> ninja -C build
> 
> 4) Run
> $ ./build/app/test/dpdk-test --no-huge  -c 0x2
> 
> Please find oops dump[1] and gdb core dump backtrace[2].
> Gdb core dump trace preserves the original segfault cause and trace.
> 
> Any other concerns?

Your new oops handling duplicates existing code in our application
(and I know others that do this as well). The problem is that an
application may do this before calling rte_eal_init and your new
code will break that.

Therefore my recommendation is that the new oops handling needs
to be not a built in feature of EAL.




^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
  2021-08-18 16:46                   ` Stephen Hemminger
@ 2021-08-18 18:04                     ` Jerin Jacob
  0 siblings, 0 replies; 45+ messages in thread
From: Jerin Jacob @ 2021-08-18 18:04 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
	Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	Jan Viktorin, David Christensen

On Wed, Aug 18, 2021 at 10:16 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Wed, 18 Aug 2021 15:07:25 +0530
> Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> > On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Tue, 17 Aug 2021 20:57:50 +0530
> > > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > >
> > > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > > > <stephen@networkplumber.org> wrote:
> > > > >
> > > > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > > > Jerin Jacob <jerinjacobk@gmail.com> wrote:
> > > > >
> > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > > > <stephen@networkplumber.org> wrote:
> > > > > > >
> > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > > > <jerinj@marvell.com> wrote:
> > > > > > >
> > > > > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > > > > >
> > > > > > > > Introducing oops handling API with following specification
> > > > > > > > and enable stub implementation for Linux and FreeBSD.
> > > > > > > >
> > > > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > > > oops handler for the essential signals.
> > > > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > > > of signals the library installed by the EAL.
> > > > > > >
> > > > > > > This is a big change, and many applications already handle these
> > > > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > > > and not enabled by default.
> > > > > >
> > > > > > In order to avoid every application explicitly register this
> > > > > > sighandler and to cater to the
> > > > > > co-existing application-specific signal-hander usage.
> > > > > > The following design has been chosen. (It is mentioned in the commit log,
> > > > > > I will describe here for more clarity)
> > > > > >
> > > > > > Case 1:
> > > > > > a) The application installs the signal handler prior to rte_eal_init().
> > > > > > b) Implementation stores the application-specific signal and replace a
> > > > > > signal handler as oops eal handler
> > > > > > c) when application/DPDK get the segfault, the default EAL oops
> > > > > > handler gets invoked
> > > > > > d) Then it dumps the EAL specific message, it calls the
> > > > > > application-specific signal handler
> > > > > > installed in step 1 by application. This avoids breaking any contract
> > > > > > with the application.
> > > > > > i.e Behavior is the same current EAL now.
> > > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after
> > > > > > eal oops handler instead
> > > > > > application-specific handler)
> > > > > >
> > > > > > Case 2:
> > > > > > a) The application install the signal handler after rte_eal_init(),
> > > > > > b) EAL hander get replaced with application handle then the application can call
> > > > > > rte_oops_decode() to decode.
> > > > > >
> > > > > > In order to cater the above use case, rte_oops_signals_enabled() and
> > > > > > rte_oops_decode()
> > > > > > provided.
> > > > > >
> > > > > > Here we are not breaking any contract with the application.
> > > > > > Do you have concerns about this design?
> > > > >
> > > > > In our application as a service it is important not to do any backtrace
> > > > > in production. We rely on other infrastructure to process coredumps.
> > > >
> > > > Other infrastructure will work. For example, If we are using standard coredump
> > > > using linux infra. In Current implementation,
> > > > - EAL handler dump the DPDK OOPS like kernel on stderr
> > > > - Implementation calls SIG_DFL in eal oops handler
> > > > - The above step creates the coredump or re-directs any other
> > > > infrastructure you are using for coredump.
> > > >
> > > > >
> > > > > This should be controlled enabled by a command line argument.
> > > >
> > > > If we allow other infrastructure coredump to work as-is, why
> > > > enable/disable required from eal?
> > >
> > > The addition of DPDK OOPS adds additional steps which make all
> > > faults be identified as the oops code.
> >
> > Since we are using SA_ONSTACK it is not losing the original segfault
> > info.
> >
> > I verified like this, Please find below the steps.
> >
> > 0) Enable coredump infra in Linux using coredumpctl or so
> > 1) Apply this series
> > 2) Apply for the following patch to create a segfault from the library.
> > This will test, segfault caught by eal and forward to default Linux singal
> > handler.
> >
> > [main]dell[dpdk.org] $ git diff
> > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> > index 3438a96b75..b935c32c98 100644
> > --- a/lib/eal/linux/eal.c
> > +++ b/lib/eal/linux/eal.c
> > @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)
> >
> >         eal_mcfg_complete();
> >
> > +       /* Generate a segfault */
> > +       *(volatile int *)0x05 = 0;
> >         return fctret;
> >
> >  }
> > 3)Build
> > meson --buildtype debug build
> > ninja -C build
> >
> > 4) Run
> > $ ./build/app/test/dpdk-test --no-huge  -c 0x2
> >
> > Please find oops dump[1] and gdb core dump backtrace[2].
> > Gdb core dump trace preserves the original segfault cause and trace.
> >
> > Any other concerns?
>
> Your new oops handling duplicates existing code in our application
> (and I know others that do this as well). The problem is that an
> application may do this before calling rte_eal_init and your new
> code will break that.

Not sure what it breaks. Could you elaborate on this? Your app's signal
handler will still be called with the original signal info, provided it
is registered before rte_eal_init().

We can have an additional API to disable the oops prints if you
insist. (Though I don't know of a use case where someone needs this,
other than not wanting to see/log this print.) If that is the rationale,
I can add an API to disable the oops print. I prefer to install it by
default, as it won't break anything and it lets all existing
applications benefit without calling any additional oops API.

>
> Therefore my recommendation is that the new oops handling needs
> to be not a built in feature of EAL.
>
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
                       ` (5 preceding siblings ...)
  2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs jerinj
@ 2021-09-06  4:17     ` jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
                         ` (6 more replies)
  6 siblings, 7 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

v3:

- Updated the release notes
- Introduce the "--no-oops" EAL option to disable the default EAL handler.
  The default EAL oops handler stores the existing handler and invokes it
  after decoding, so there may not be an explicit use case for this option;
  it is added to give the application control. This takes an approach
  similar to telemetry, which is enabled by default to avoid updating all
  the existing applications.
- Change oops_print to fprintf as rte_log is not safe from fault handler.(Stephen)
- Removed "sig" from signal_db as it is duplicate(Stephen)
- Add const to mem32_dump(Stephen)
- Add const to oops_signals[](Stephen)
	
v2:
- Fix powerpc build (David Christensen)

It is handy to get detailed OOPS information, as the Linux kernel provides,
when a DPDK application crashes, without losing any of the features
of the coredump infrastructure provided by the OS.

This patch series introduces the APIs to handle OOPS in DPDK.

The following section details the implementation and the API interface to
the application.

On rte_eal_init() invocation, if --no-oops is not provided in the EAL
command line arguments, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode()
and then calls the signal handler that the application installed
before invoking rte_eal_init(). This scheme also enables the use of
the default coredump handler (for gdb etc.) provided by the OS
if the application does not install any specific signal handler.
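
The store-then-chain scheme described above can be sketched with plain
POSIX calls. This is a minimal illustration under stated assumptions, not
the series' implementation: chaining_handler(), install_chaining_handler()
and app_handler() are hypothetical names, and the "decode" step is reduced
to a counter.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <string.h>

static struct sigaction prev_action;        /* handler found at install time */
static volatile sig_atomic_t decode_calls;  /* stand-in for the decode step */
static volatile sig_atomic_t app_calls;
static volatile sig_atomic_t app_saw_decode;

/* Hypothetical application handler, registered before the library's. */
static void
app_handler(int sig)
{
	(void)sig;
	app_calls++;
	app_saw_decode = decode_calls; /* records that decoding came first */
}

/* Hypothetical library handler: decode first, then chain. */
static void
chaining_handler(int sig, siginfo_t *info, void *uctx)
{
	decode_calls++; /* a real handler would dump the fault details here */

	if (prev_action.sa_flags & SA_SIGINFO) {
		if (prev_action.sa_sigaction != NULL)
			prev_action.sa_sigaction(sig, info, uctx);
	} else if (prev_action.sa_handler == SIG_DFL) {
		/* No earlier handler: restore the default action and
		 * re-raise so the OS default (e.g. coredump) applies. */
		signal(sig, SIG_DFL);
		raise(sig);
	} else if (prev_action.sa_handler != SIG_IGN) {
		prev_action.sa_handler(sig);
	}
}

/* Install the chaining handler, saving whatever was there before.
 * Returns 0 on success, -1 on failure. */
static int
install_chaining_handler(int signum)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = chaining_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, &prev_action);
}
```

With app_handler() registered first via signal(SIGUSR1, app_handler) and
install_chaining_handler(SIGUSR1) called afterwards, raising SIGUSR1 runs
the decode step and then the application's handler.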

In the second case, where the application installs its signal handler after
the rte_eal_init() invocation, rte_oops_decode() provides the means of
decoding the oops message in the application's fault handler.
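
As a rough sketch of this second case, an application-owned fault handler
could call a decode routine before doing its own work. The exact
rte_oops_decode() prototype is not shown in this message, so the example
uses a hypothetical stand-in, decode_fault(), which only records the
signal number and address; app_fault_handler() and app_install_handler()
are likewise hypothetical.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical record of what a decode routine would report. */
struct fault_record {
	volatile sig_atomic_t signo;
	void *volatile addr; /* meaningful only for faults such as SIGSEGV */
};

static struct fault_record last_fault;

/* Hypothetical stand-in for the library decode call: it stores the
 * signal number and address instead of printing a full dump. */
static void
decode_fault(int sig, const siginfo_t *info)
{
	last_fault.signo = sig;
	last_fault.addr = info != NULL ? info->si_addr : NULL;
}

/* Application handler installed after library initialisation. */
static void
app_fault_handler(int sig, siginfo_t *info, void *uctx)
{
	(void)uctx;
	decode_fault(sig, info);
	/* application-specific cleanup or reporting would follow here */
}

/* Returns 0 on success, -1 on failure. */
static int
app_install_handler(int signum)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = app_fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(signum, &sa, NULL);
}
```

In a real application the call to decode_fault() would be replaced by the
library's decode API, using the prototype from rte_oops.h.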


Patch split:

Patch 1/6: defines the API and stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: add an optional libunwind dependency to DPDK for better backtrace in oops.
Patch 4/6: x86 specific archinfo like x86 register dump on oops
Patch 5/6: arm64 specific archinfo like arm64 register dump on oops
Patch 6/6: UT for the new APIs


Example commands for the build and run, and the output logs, on an x86-64 Linux machine.
  

meson --buildtype debug build
ninja -C build

echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge  -c 0x2

Signal info:
------------
PID:           2439496
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000  R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48  R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610  R13: 0x0000000000000000
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x0000000000000001
RCX: 0x00007f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340  RSP: 0x00007fffc6958330
RSI: 0x0000000000000000  RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee  EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff


Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml               |   2 +-
 .travis.yml                               |   2 +-
 app/test/meson.build                      |   2 +
 app/test/test_oops.c                      | 122 +++++++++
 config/meson.build                        |   8 +
 doc/api/doxy-api-index.md                 |   3 +-
 doc/guides/linux_gsg/eal_args.include.rst |   4 +
 doc/guides/rel_notes/release_21_11.rst    |  10 +
 lib/eal/common/eal_common_options.c       |   5 +
 lib/eal/common/eal_internal_cfg.h         |   1 +
 lib/eal/common/eal_options.h              |   2 +
 lib/eal/common/eal_private.h              |   3 +
 lib/eal/freebsd/eal.c                     |   8 +
 lib/eal/include/meson.build               |   1 +
 lib/eal/include/rte_oops.h                | 101 ++++++++
 lib/eal/linux/eal.c                       |   7 +
 lib/eal/unix/eal_oops.c                   | 293 ++++++++++++++++++++++
 lib/eal/unix/meson.build                  |   1 +
 lib/eal/version.map                       |   4 +
 19 files changed, 576 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
@ 2021-09-06  4:17       ` jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation jerinj
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev, Bruce Richardson, Ray Kinsella
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Introduce the oops handling API with the following specification
and enable a stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, if --no-oops is not provided in the EAL
command line arguments, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API provides the list
of signals for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using
rte_oops_decode() and then calls the signal handler that the
application installed before invoking rte_eal_init().
This scheme also enables the use of the default coredump
handler (for gdb etc.) provided by the OS if the application does
not install any specific signal handler.

In the second case, where the application installs its signal
handler after the rte_eal_init() invocation, rte_oops_decode()
provides the means of decoding the oops message in
the application's fault handler.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/api/doxy-api-index.md                 |   3 +-
 doc/guides/linux_gsg/eal_args.include.rst |   4 +
 doc/guides/rel_notes/release_21_11.rst    |  10 +++
 lib/eal/common/eal_common_options.c       |   5 ++
 lib/eal/common/eal_internal_cfg.h         |   1 +
 lib/eal/common/eal_options.h              |   2 +
 lib/eal/common/eal_private.h              |   3 +
 lib/eal/freebsd/eal.c                     |   8 ++
 lib/eal/include/meson.build               |   1 +
 lib/eal/include/rte_oops.h                | 101 ++++++++++++++++++++++
 lib/eal/linux/eal.c                       |   7 ++
 lib/eal/unix/eal_oops.c                   |  36 ++++++++
 lib/eal/unix/meson.build                  |   1 +
 lib/eal/version.map                       |   4 +
 14 files changed, 185 insertions(+), 1 deletion(-)
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..0d0da35205 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -215,7 +215,8 @@ The public API headers are grouped by topics:
   [log]                (@ref rte_log.h),
   [errno]              (@ref rte_errno.h),
   [trace]              (@ref rte_trace.h),
-  [trace_point]        (@ref rte_trace_point.h)
+  [trace_point]        (@ref rte_trace_point.h),
+  [oops]               (@ref rte_oops.h)
 
 - **misc**:
   [EAL config]         (@ref rte_eal.h),
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 96baa4a9b0..8db320bc07 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -226,3 +226,7 @@ Other options
     To disable use of max SIMD bitwidth limit::
 
         --force-max-simd-bitwidth=0
+
+*    ``--no-oops``:
+
+    Disable default EAL oops handler.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b573834..ba31a5dbed 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Added APIs for oops handling support.**
+
+  Added support for decoding the oops fault with ``libunwind`` based backtrace,
+  architecture-specific register dump, instruction memory dump, and
+  stack memory dump. EAL installs the default oops handler if the ``--no-oops``
+  EAL command line argument is not provided. The default EAL oops handler
+  stores the existing handler and invokes it after decoding. It also offers the
+  ``rte_oops_decode`` API to integrate the EAL oops decode function where the
+  application does not use the default EAL handler.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index ff5861b5f3..b359e55485 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -107,6 +107,7 @@ eal_long_options[] = {
 	{OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM        },
 	{OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM     },
 	{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+	{OPT_NO_OOPS,           0, NULL, OPT_NO_OOPS_NUM          },
 
 	/* legacy options that will be removed in future */
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
@@ -1825,6 +1826,9 @@ eal_parse_common_option(int opt, const char *optarg,
 			return -1;
 		}
 		break;
+	case OPT_NO_OOPS_NUM:
+		conf->no_oops = 1;
+		break;
 
 	/* don't know what to do, leave this to caller */
 	default:
@@ -2128,6 +2132,7 @@ eal_common_usage(void)
 	       "  --"OPT_TELEMETRY"   Enable telemetry support (on by default)\n"
 	       "  --"OPT_NO_TELEMETRY"   Disable telemetry support\n"
 	       "  --"OPT_FORCE_MAX_SIMD_BITWIDTH" Force the max SIMD bitwidth\n"
+	       "  --"OPT_NO_OOPS"     Disable default EAL oops handler (enabled by default)\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index d6c0470eb8..687aa062ea 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -94,6 +94,7 @@ struct internal_config {
 	unsigned int no_telemetry; /**< true to disable Telemetry */
 	struct simd_bitwidth max_simd_bitwidth;
 	/**< max simd bitwidth path to use */
+	unsigned int no_oops; /**< true to disable oops */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 7b348e707f..b0256d7529 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -93,6 +93,8 @@ enum {
 	OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
 	OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_NO_OOPS           "no-oops"
+	OPT_NO_OOPS_NUM,
 
 	/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST     "pci-blacklist"
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index 64cf4e81c8..c3a490d803 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -716,6 +716,9 @@ void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
  */
 void __rte_thread_uninit(void);
 
+int eal_oops_init(void);
+void eal_oops_fini(void);
+
 /**
  * asprintf(3) replacement for Windows.
  */
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..6a48a7e95c 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -692,6 +692,7 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+
 	thread_id = pthread_self();
 
 	eal_reset_internal_config(internal_conf);
@@ -719,6 +720,11 @@ rte_eal_init(int argc, char **argv)
 	/* FreeBSD always uses legacy memory model */
 	internal_conf->legacy_mem = true;
 
+	if (internal_conf->no_oops == 0 && eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
@@ -973,6 +979,8 @@ rte_eal_cleanup(void)
 	rte_eal_memory_detach();
 	rte_trace_save();
 	eal_trace_fini();
+	if (internal_conf->no_oops == 0)
+		eal_oops_fini();
 	eal_cleanup_config(internal_conf);
 	return 0;
 }
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 88a9eba12f..6c74bdb7b5 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -30,6 +30,7 @@ headers += files(
         'rte_malloc.h',
         'rte_memory.h',
         'rte_memzone.h',
+        'rte_oops.h',
         'rte_pci_dev_feature_defs.h',
         'rte_pci_dev_features.h',
         'rte_per_lcore.h',
diff --git a/lib/eal/include/rte_oops.h b/lib/eal/include/rte_oops.h
new file mode 100644
index 0000000000..0a76c3d242
--- /dev/null
+++ b/lib/eal/include/rte_oops.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell.
+ */
+
+#ifndef _RTE_OOPS_H_
+#define _RTE_OOPS_H_
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <rte_config.h>
+
+/**
+ * @file
+ *
+ * RTE oops API
+ *
+ * This file provides the oops handling APIs to RTE applications.
+ *
+ * On rte_eal_init() invocation, unless *--no-oops* is provided on the EAL
+ * command line, the EAL library installs the oops handler for
+ * the essential signals. The rte_oops_signals_enabled() API provides the list
+ * of signals for which the EAL installed the handler.
+ *
+ * The default EAL oops handler decodes the oops message using rte_oops_decode()
+ * and then calls the signal handler installed by the application before
+ * invoking the rte_eal_init(). This scheme will also enable the use of
+ * the default coredump handler(for gdb etc.) provided by OS if the application
+ * does not install any specific signal handler.
+ *
+ * In the second case, where the application installs the signal handler after
+ * the rte_eal_init() invocation, rte_oops_decode() provides the means of
+ * decoding the oops message in the application's fault handler.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Maximum number of oops signals enabled in EAL.
+ * @see rte_oops_signals_enabled()
+ */
+#define RTE_OOPS_SIGNALS_MAX 32
+
+/**
+ * Get the list of enabled oops signals installed by EAL.
+ *
+ * @param [out] signals
+ *   A pointer to store the enabled signals.
+ *   Value NULL is allowed. If not NULL, then the size of this array must be
+ *   at least RTE_OOPS_SIGNALS_MAX.
+ *
+ * @return
+ *   Number of enabled oops signals.
+ */
+__rte_experimental
+int rte_oops_signals_enabled(int *signals);
+
+#if defined(RTE_EXEC_ENV_LINUX) || defined(RTE_EXEC_ENV_FREEBSD)
+#include <signal.h>
+#include <ucontext.h>
+
+/**
+ * Decode an oops
+ *
+ * This prototype is the same as sa_sigaction defined in signal.h.
+ * The application must register its signal handler using sigaction() with
+ * SA_SIGINFO set in sa_flags to get this information from a Unix OS.
+ *
+ * @param sig
+ *   Signal number
+ * @param info
+ *   Signal info provided by sa_sigaction. Value NULL is allowed.
+ * @param uc
+ *   ucontext_t provided when signal installed with SA_SIGINFO flag.
+ *   Value NULL is allowed.
+ *
+ */
+__rte_experimental
+void rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc);
+#else
+
+/**
+ * Decode an oops
+ *
+ * @param sig
+ *   Signal number
+ */
+__rte_experimental
+void rte_oops_decode(int sig);
+
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_OOPS_H_ */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3577eaeaa4..0ab43c9e74 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1017,6 +1017,11 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
+	if (internal_conf->no_oops == 0 && eal_oops_init()) {
+		rte_eal_init_alert("oops init failed.");
+		rte_errno = ENOENT;
+	}
+
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
@@ -1370,6 +1375,8 @@ rte_eal_cleanup(void)
 	rte_eal_memory_detach();
 	rte_trace_save();
 	eal_trace_fini();
+	if (internal_conf->no_oops == 0)
+		eal_oops_fini();
 	eal_cleanup_config(internal_conf);
 	return 0;
 }
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
new file mode 100644
index 0000000000..53b580f733
--- /dev/null
+++ b/lib/eal/unix/eal_oops.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell.
+ */
+
+
+#include <rte_oops.h>
+
+#include "eal_private.h"
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	RTE_SET_USED(sig);
+	RTE_SET_USED(info);
+	RTE_SET_USED(uc);
+
+}
+
+int
+rte_oops_signals_enabled(int *signals)
+{
+	RTE_SET_USED(signals);
+
+	return 0;
+}
+
+int
+eal_oops_init(void)
+{
+	return 0;
+}
+
+void
+eal_oops_fini(void)
+{
+}
diff --git a/lib/eal/unix/meson.build b/lib/eal/unix/meson.build
index e3ecd3e956..cdd3320669 100644
--- a/lib/eal/unix/meson.build
+++ b/lib/eal/unix/meson.build
@@ -6,5 +6,6 @@ sources += files(
         'eal_unix_memory.c',
         'eal_unix_timer.c',
         'eal_firmware.c',
+        'eal_oops.c',
         'rte_thread.c',
 )
diff --git a/lib/eal/version.map b/lib/eal/version.map
index beeb986adc..4106beb6ef 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -426,6 +426,10 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
+
+	# added in 21.11
+	rte_oops_signals_enabled; # WINDOWS_NO_EXPORT
+	rte_oops_decode; # WINDOWS_NO_EXPORT
 };
 
 INTERNAL {
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
@ 2021-09-06  4:17       ` jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Implement the base oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 173 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 166 insertions(+), 7 deletions(-)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 53b580f733..a480437f23 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -2,35 +2,194 @@
  * Copyright(C) 2021 Marvell.
  */
 
+#include <inttypes.h>
+#include <signal.h>
+#include <ucontext.h>
+#include <unistd.h>
 
+#include <rte_byteorder.h>
+#include <rte_log.h>
 #include <rte_oops.h>
 
 #include "eal_private.h"
 
-void
-rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+/* It is not safe to call rte_log from signal handler due to the fact the
+ * malloc pool may be corrupted and rte_log uses malloc.
+ */
+#define oops_print(...) fprintf(stderr, __VA_ARGS__)
+
+static const int oops_signals[] = {SIGSEGV, SIGBUS, SIGILL,
+				   SIGABRT, SIGFPE, SIGSYS};
+
+struct oops_signal {
+	bool enabled;
+	struct sigaction sa;
+};
+
+static struct oops_signal signals_db[RTE_DIM(oops_signals)];
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	RTE_SET_USED(context);
+}
+static void
+siginfo_dump(int sig, siginfo_t *info)
+{
+	oops_print("PID:           %" PRIdMAX "\n", (intmax_t)getpid());
+
+	if (info == NULL)
+		return;
+	if (sig != info->si_signo)
+		oops_print("Invalid signal info\n");
+
+	oops_print("Signal number: %d\n", info->si_signo);
+	oops_print("Fault address: %p\n", info->si_addr);
+}
+
+static void
+mem32_dump(const void *ptr)
+{
+	const uint32_t *p = ptr;
+	int i;
+
+	for (i = 0; i < 16; i++)
+		oops_print("%p: 0x%x\n", p + i, rte_be_to_cpu_32(p[i]));
+}
+
+static void
+stack_dump_header(void)
+{
+	oops_print("Stack dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+code_dump_header(void)
+{
+	oops_print("Code dump:\n");
+	oops_print("----------\n");
+}
+
+static void
+stack_code_dump(void *stack, void *code)
+{
+	if (stack == NULL || code == NULL)
+		return;
+
+	oops_print("\n");
+	stack_dump_header();
+	mem32_dump(stack);
+	oops_print("\n");
+
+	code_dump_header();
+	mem32_dump(code);
+	oops_print("\n");
+}
+static void
+archinfo_dump(ucontext_t *uc)
 {
-	RTE_SET_USED(sig);
-	RTE_SET_USED(info);
 	RTE_SET_USED(uc);
 
+	stack_code_dump(NULL, NULL);
+}
+
+static void
+default_signal_handler_invoke(int sig)
+{
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (oops_signals[idx] != sig)
+			continue;
+		/* Skip disabled signals */
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(sig, &signals_db[idx].sa, NULL);
+		kill(getpid(), sig);
+	}
+}
+
+void
+rte_oops_decode(int sig, siginfo_t *info, ucontext_t *uc)
+{
+	oops_print("Signal info:\n");
+	oops_print("------------\n");
+	siginfo_dump(sig, info);
+	oops_print("\n");
+
+	oops_print("Backtrace:\n");
+	oops_print("----------\n");
+	back_trace_dump(uc);
+	oops_print("\n");
+
+	oops_print("Arch info:\n");
+	oops_print("----------\n");
+	if (uc)
+		archinfo_dump(uc);
+}
+
+static void
+eal_oops_handler(int sig, siginfo_t *info, void *ctx)
+{
+	ucontext_t *uc = ctx;
+
+	rte_oops_decode(sig, info, uc);
+	default_signal_handler_invoke(sig);
 }
 
 int
 rte_oops_signals_enabled(int *signals)
 {
-	RTE_SET_USED(signals);
+	int count = 0, sig[RTE_OOPS_SIGNALS_MAX];
+	unsigned int idx = 0;
 
-	return 0;
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (signals_db[idx].enabled)
+			sig[count++] = oops_signals[idx];
+	}
+	if (signals)
+		memcpy(signals, sig, sizeof(*signals) * count);
+
+	return count;
 }
 
 int
 eal_oops_init(void)
 {
-	return 0;
+	unsigned int idx, rc = 0;
+	struct sigaction sa;
+
+	RTE_BUILD_BUG_ON(RTE_DIM(oops_signals) > RTE_OOPS_SIGNALS_MAX);
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &eal_oops_handler;
+	sa.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		/* Get existing sigaction */
+		rc = sigaction(oops_signals[idx], NULL, &signals_db[idx].sa);
+		if (rc)
+			continue;
+		/* Replace with oops handler */
+		rc = sigaction(oops_signals[idx], &sa, NULL);
+		if (rc)
+			continue;
+		signals_db[idx].enabled = true;
+	}
+	return rc;
 }
 
 void
 eal_oops_fini(void)
 {
+	unsigned int idx;
+
+	for (idx = 0; idx < RTE_DIM(oops_signals); idx++) {
+		if (!signals_db[idx].enabled)
+			continue;
+		/* Replace with stored handler */
+		sigaction(oops_signals[idx], &signals_db[idx].sa, NULL);
+	}
 }
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation jerinj
@ 2021-09-06  4:17       ` jerinj
  2022-01-27 20:47         ` Stephen Hemminger
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops jerinj
                         ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev, Aaron Conole, Michael Santana, Bruce Richardson
  Cc: thomas, david.marchand, dmitry.kozliuk, navasile, dmitrym,
	pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc, stephen,
	Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Add an optional libunwind library dependency to DPDK for
enhanced backtrace based on ucontext.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 .github/workflows/build.yml |  2 +-
 .travis.yml                 |  2 +-
 config/meson.build          |  8 +++++++
 lib/eal/unix/eal_oops.c     | 45 +++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 151641e6fa..de985776ed 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -93,7 +93,7 @@ jobs:
       run: sudo apt install -y ccache libnuma-dev python3-setuptools
         python3-wheel python3-pip python3-pyelftools ninja-build libbsd-dev
         libpcap-dev libibverbs-dev libcrypto++-dev libfdt-dev libjansson-dev
-        libarchive-dev
+        libarchive-dev libunwind-dev
     - name: Install libabigail build dependencies if no cache is available
       if: env.ABI_CHECKS == 'true' && steps.libabigail-cache.outputs.cache-hit != 'true'
       run: sudo apt install -y autoconf automake libtool pkg-config libxml2-dev
diff --git a/.travis.yml b/.travis.yml
index 4bb5bf629e..cfb8931d3b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,7 +16,7 @@ addons:
     packages: &required_packages
       - [libnuma-dev, python3-setuptools, python3-wheel, python3-pip, python3-pyelftools, ninja-build]
       - [libbsd-dev, libpcap-dev, libibverbs-dev, libcrypto++-dev, libfdt-dev, libjansson-dev]
-      - [libarchive-dev]
+      - [libarchive-dev, libunwind-dev]
 
 _aarch64_packages: &aarch64_packages
   - *required_packages
diff --git a/config/meson.build b/config/meson.build
index 3b5966ec2f..7f4dd52bc5 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -237,6 +237,14 @@ if cc.get_id() == 'clang' and dpdk_conf.get('RTE_ARCH_64') == false
     dpdk_extra_ldflags += '-latomic'
 endif
 
+# check for libunwind
+unwind_dep = dependency('libunwind', required: false, method: 'pkg-config')
+if unwind_dep.found() and cc.has_header('libunwind.h', dependencies: unwind_dep)
+    dpdk_conf.set('RTE_USE_LIBUNWIND', 1)
+    add_project_link_arguments('-lunwind', language: 'c')
+    dpdk_extra_ldflags += '-lunwind'
+endif
+
 # add -include rte_config to cflags
 add_project_arguments('-include', 'rte_config.h', language: 'c')
 
diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a480437f23..9c2d9d99d9 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -28,11 +28,56 @@ struct oops_signal {
 
 static struct oops_signal signals_db[RTE_DIM(oops_signals)];
 
+#if defined(RTE_USE_LIBUNWIND)
+
+#define BACKTRACE_DEPTH 256
+#define UNW_LOCAL_ONLY
+#include <libunwind.h>
+
+static void
+back_trace_dump(ucontext_t *context)
+{
+	unw_cursor_t cursor;
+	unw_word_t ip, off;
+	int rc, level = 0;
+	char name[256];
+
+	if (context == NULL)
+		return;
+
+	rc = unw_init_local(&cursor, (unw_context_t *)context);
+	if (rc < 0)
+		goto fail;
+
+	for (;;) {
+		rc = unw_get_reg(&cursor, UNW_REG_IP, &ip);
+		if (rc < 0)
+			goto fail;
+		rc = unw_get_proc_name(&cursor, name, sizeof(name), &off);
+		if (rc == 0)
+			oops_print("[%16p]: %s()+0x%" PRIx64 "\n", (void *)ip,
+				   name, (uint64_t)off);
+		else
+			oops_print("[%16p]: <unknown>\n", (void *)ip);
+		rc = unw_step(&cursor);
+		if (rc <= 0 || ++level >= BACKTRACE_DEPTH)
+			break;
+	}
+	return;
+fail:
+	oops_print("libunwind call failed %s\n", unw_strerror(rc));
+}
+
+#else
+
 static void
 back_trace_dump(ucontext_t *context)
 {
 	RTE_SET_USED(context);
 }
+
+#endif
+
 static void
 siginfo_dump(int sig, siginfo_t *info)
 {
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                         ` (2 preceding siblings ...)
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
@ 2021-09-06  4:17       ` jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 5/6] eal/arm64: " jerinj
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the x86 architectural state registers in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index 9c2d9d99d9..a9c22cbe70 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -131,6 +131,38 @@ stack_code_dump(void *stack, void *code)
 	mem32_dump(code);
 	oops_print("\n");
 }
+
+#if defined(RTE_ARCH_X86_64) && defined(RTE_EXEC_ENV_LINUX)
+static void
+archinfo_dump(ucontext_t *uc)
+{
+
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	oops_print("R8 : 0x%.16llx  ", mc->gregs[REG_R8]);
+	oops_print("R9 : 0x%.16llx\n", mc->gregs[REG_R9]);
+	oops_print("R10: 0x%.16llx  ", mc->gregs[REG_R10]);
+	oops_print("R11: 0x%.16llx\n", mc->gregs[REG_R11]);
+	oops_print("R12: 0x%.16llx  ", mc->gregs[REG_R12]);
+	oops_print("R13: 0x%.16llx\n", mc->gregs[REG_R13]);
+	oops_print("R14: 0x%.16llx  ", mc->gregs[REG_R14]);
+	oops_print("R15: 0x%.16llx\n", mc->gregs[REG_R15]);
+	oops_print("RAX: 0x%.16llx  ", mc->gregs[REG_RAX]);
+	oops_print("RBX: 0x%.16llx\n", mc->gregs[REG_RBX]);
+	oops_print("RCX: 0x%.16llx  ", mc->gregs[REG_RCX]);
+	oops_print("RDX: 0x%.16llx\n", mc->gregs[REG_RDX]);
+	oops_print("RBP: 0x%.16llx  ", mc->gregs[REG_RBP]);
+	oops_print("RSP: 0x%.16llx\n", mc->gregs[REG_RSP]);
+	oops_print("RSI: 0x%.16llx  ", mc->gregs[REG_RSI]);
+	oops_print("RDI: 0x%.16llx\n", mc->gregs[REG_RDI]);
+	oops_print("RIP: 0x%.16llx  ", mc->gregs[REG_RIP]);
+	oops_print("EFL: 0x%.16llx\n", mc->gregs[REG_EFL]);
+
+	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
+}
+
+#else
+
 static void
 archinfo_dump(ucontext_t *uc)
 {
@@ -139,6 +171,8 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump(NULL, NULL);
 }
 
+#endif
+
 static void
 default_signal_handler_invoke(int sig)
 {
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 5/6] eal/arm64: support register dump for oops
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                         ` (3 preceding siblings ...)
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops jerinj
@ 2021-09-06  4:17       ` jerinj
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs jerinj
  2021-09-21 17:30       ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Dump the arm64 architectural state registers in the oops
handling routine.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/eal/unix/eal_oops.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/eal/unix/eal_oops.c b/lib/eal/unix/eal_oops.c
index a9c22cbe70..6793497bee 100644
--- a/lib/eal/unix/eal_oops.c
+++ b/lib/eal/unix/eal_oops.c
@@ -161,6 +161,25 @@ archinfo_dump(ucontext_t *uc)
 	stack_code_dump((void *)mc->gregs[REG_RSP], (void *)mc->gregs[REG_RIP]);
 }
 
+#elif defined(RTE_ARCH_ARM64) && defined(RTE_EXEC_ENV_LINUX)
+
+static void
+archinfo_dump(ucontext_t *uc)
+{
+	mcontext_t *mc = &uc->uc_mcontext;
+	int i;
+
+	oops_print("PC : 0x%.16llx ", mc->pc);
+	oops_print("SP : 0x%.16llx\n", mc->sp);
+	for (i = 0; i < 31; i++)
+		oops_print("X%.2d: 0x%.16llx%s", i, mc->regs[i],
+			   i & 0x1 ? "\n" : " ");
+
+	oops_print("PSTATE: 0x%.16llx\n", mc->pstate);
+
+	stack_code_dump((void *)mc->sp, (void *)mc->pc);
+}
+
 #else
 
 static void
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                         ` (4 preceding siblings ...)
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 5/6] eal/arm64: " jerinj
@ 2021-09-06  4:17       ` jerinj
  2021-09-21 17:30       ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
  6 siblings, 0 replies; 45+ messages in thread
From: jerinj @ 2021-09-06  4:17 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, bruce.richardson, dmitry.kozliuk,
	navasile, dmitrym, pallavi.kadam, konstantin.ananyev,
	ruifeng.wang, drc, stephen, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

Added unit test cases for all the oops handling APIs.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 app/test/meson.build |   2 +
 app/test/test_oops.c | 122 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)
 create mode 100644 app/test/test_oops.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686ad..1e471ab351 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
         'test_metrics.c',
         'test_mcslock.c',
         'test_mp_secondary.c',
+        'test_oops.c',
         'test_per_lcore.c',
         'test_pflock.c',
         'test_pmd_perf.c',
@@ -236,6 +237,7 @@ fast_tests = [
         ['memzone_autotest', false],
         ['meter_autotest', true],
         ['multiprocess_autotest', false],
+        ['oops_autotest', true],
         ['per_lcore_autotest', true],
         ['pflock_autotest', true],
         ['prefetch_autotest', true],
diff --git a/app/test/test_oops.c b/app/test/test_oops.c
new file mode 100644
index 0000000000..288761822c
--- /dev/null
+++ b/app/test/test_oops.c
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2021 Marvell
+ */
+
+#include <setjmp.h>
+#include <signal.h>
+
+#include <rte_config.h>
+#include <rte_oops.h>
+
+#include "test.h"
+
+static jmp_buf pc;
+static bool detected_segfault;
+
+static void
+segv_handler(int sig, siginfo_t *info, void *ctx)
+{
+	detected_segfault = true;
+	rte_oops_decode(sig, info, (ucontext_t *)ctx);
+	longjmp(pc, 1);
+}
+
+/* OS-specific way to install the segfault signal handler */
+static int
+segv_handler_install(void)
+{
+	struct sigaction sa;
+
+	sigemptyset(&sa.sa_mask);
+	sa.sa_sigaction = &segv_handler;
+	sa.sa_flags = SA_SIGINFO;
+
+	return sigaction(SIGSEGV, &sa, NULL);
+}
+
+static int
+test_oops_generate(void)
+{
+	int rc;
+
+	rc = segv_handler_install();
+	TEST_ASSERT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	detected_segfault = false;
+	rc = setjmp(pc); /* Save the execution state */
+	if (rc == 0) {
+		/* Generate a segfault */
+		*(volatile int *)0x05 = 0;
+	} else { /* longjmp from segv_handler */
+		if (detected_segfault)
+			return TEST_SUCCESS;
+
+	}
+	return TEST_FAILED;
+}
+
+static int
+test_signal_handler_installed(int count, int *signals)
+{
+	int i, rc, verified = 0;
+	struct sigaction sa;
+
+	for (i = 0; i < count; i++) {
+		rc = sigaction(signals[i], NULL, &sa);
+		if (rc) {
+			printf("Failed to get sigaction for %d\n", signals[i]);
+			continue;
+		}
+		if (sa.sa_handler != SIG_DFL)
+			verified++;
+	}
+	TEST_ASSERT_EQUAL(count, verified, "count=%d verified=%d\n", count,
+			  verified);
+	return TEST_SUCCESS;
+}
+
+static int
+test_oops_signals_enabled(void)
+{
+	int *signals = NULL;
+	int i, rc;
+
+	rc = rte_oops_signals_enabled(signals);
+	if (rc == 0)
+		return TEST_SUCCESS;
+
+	signals = malloc(sizeof(int) * rc);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+	free(signals);
+
+	signals = malloc(sizeof(int) * RTE_OOPS_SIGNALS_MAX);
+	rc = rte_oops_signals_enabled(signals);
+	TEST_ASSERT_NOT_EQUAL(rc, 0, "rc=%d\n", rc);
+
+	for (i = 0; i < rc; i++)
+		TEST_ASSERT_NOT_EQUAL(signals[i], 0, "idx=%d val=%d\n", i,
+				      signals[i]);
+
+	rc = test_signal_handler_installed(rc, signals);
+	free(signals);
+
+	return rc;
+}
+
+static struct unit_test_suite oops_tests = {
+	.suite_name = "oops autotest",
+	.setup = NULL,
+	.teardown = NULL,
+	.unit_test_cases = {
+			    TEST_CASE(test_oops_signals_enabled),
+			    TEST_CASE(test_oops_generate),
+			    TEST_CASES_END()}};
+
+static int
+test_oops(void)
+{
+	return unit_test_suite_runner(&oops_tests);
+}
+
+REGISTER_TEST_COMMAND(oops_autotest, test_oops);
-- 
2.33.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
                         ` (5 preceding siblings ...)
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs jerinj
@ 2021-09-21 17:30       ` Thomas Monjalon
  2021-09-21 17:54         ` Jerin Jacob
  6 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2021-09-21 17:30 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, david.marchand, bruce.richardson, dmitry.kozliuk, navasile,
	dmitrym, pallavi.kadam, konstantin.ananyev, ruifeng.wang, drc,
	stephen, olivier.matz, ferruh.yigit, andrew.rybchenko,
	ajit.khaparde, mb

06/09/2021 06:17, jerinj@marvell.com:
> It is handy to get detailed OOPS information like Linux kernel
> when DPDK application crashes without losing any of the features
> provided by coredump infrastructure by the OS.
> 
> This patch series introduces the APIs to handle OOPS in DPDK.

I don't understand how it is related to DPDK.
It looks something to be handled freely by the application
without DPDK forcing anything.
What is the benefit for other DPDK features?
Which problem is it solving?



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-21 17:30       ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
@ 2021-09-21 17:54         ` Jerin Jacob
  2021-09-22  7:34           ` Thomas Monjalon
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-09-21 17:54 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, Morten Brørup

On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 06/09/2021 06:17, jerinj@marvell.com:
> > It is handy to get detailed OOPS information like Linux kernel
> > when DPDK application crashes without losing any of the features
> > provided by coredump infrastructure by the OS.
> >
> > This patch series introduces the APIs to handle OOPS in DPDK.
>
> I don't understand how it is related to DPDK.

It abstracts the execution environment and architecture details (see
"Arch info" in the log [1]) needed to capture fault information in a
signal handler, so that a DPDK application prints additional debugging
information on a fault, just as the kernel prints its OOPS on fault.

> It looks something to be handled freely by the application
> without DPDK forcing anything.

This does NOT force the application to use the DPDK OOPS handler.
Instead, the default handler is used if it is registered.

Even if the default handler is registered, it invokes the application's
handler if the application has registered a fault handler. So there is
no difference in behavior.
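
The chaining described above can be sketched as follows. This is a minimal
standalone illustration under stated assumptions, not the code from the
series: the names saved_sa, install_chained(), chained_handler() and
app_handler() are hypothetical, raise() is used where the patch uses
kill(getpid(), sig), and the decode step is replaced by a plain write()
to stderr.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical state: the disposition that was active before we installed ours */
static struct sigaction saved_sa;
static volatile sig_atomic_t decoded;
static volatile sig_atomic_t app_handler_ran;

/* Hypothetical application handler, registered before the chained one */
static void
app_handler(int sig)
{
	(void)sig;
	app_handler_ran = 1;
}

/* Stand-in for the default oops handler: report first, then hand over */
static void
chained_handler(int sig, siginfo_t *info, void *ctx)
{
	static const char msg[] = "decode step would run here\n";
	ssize_t rc;

	(void)info;
	(void)ctx;
	rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)rc;
	decoded = 1;

	/* Restore the stored disposition and re-raise the same signal */
	sigaction(sig, &saved_sa, NULL);
	raise(sig);
}

/* Save the current disposition of sig, then replace it with chained_handler */
static int
install_chained(int sig)
{
	struct sigaction sa;

	assert(sig > 0);
	if (sigaction(sig, NULL, &saved_sa) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = chained_handler;
	sa.sa_flags = SA_SIGINFO;
	return sigaction(sig, &sa, NULL);
}
```

Because the signal stays blocked while chained_handler() runs, the re-raised
signal is delivered to the restored handler only once chained_handler()
returns. If no application handler was registered, the restored disposition
is the OS default, which is what keeps the coredump path available.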

> What is the benefit for other DPDK features?

Could you clarify this question a bit more?

> Which problem is it solving?

A better debug trace on fault for DPDK applications, instead of faulting
with no information.


[1]

Backtrace:
----------
[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000000  R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48  R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610  R13: 0x0000000000000000
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x0000000000000001
RCX: 0x00007f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340  RSP: 0x00007fffc6958330
RSI: 0x0000000000000000  RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee  EFL: 0x0000000000010246

Stack dump:
----------
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
----------
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff
>

* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-21 17:54         ` Jerin Jacob
@ 2021-09-22  7:34           ` Thomas Monjalon
  2021-09-22  8:03             ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2021-09-22  7:34 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, Morten Brørup

21/09/2021 19:54, Jerin Jacob:
> On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 06/09/2021 06:17, jerinj@marvell.com:
> > > It is handy to get detailed OOPS information like Linux kernel
> > > when DPDK application crashes without losing any of the features
> > > provided by coredump infrastructure by the OS.
> > >
> > > This patch series introduces the APIs to handle OOPS in DPDK.
> >
> > I don't understand how it is related to DPDK.
> 
> It abstracts the execution environment/architecture(See Arch Info in
> log)[1] details to capture
> details on fault handlers to enable additional details on fault from
> DPDK application for
> additional debugging information. Just like Kernel prints its OOPS on fault.

Not sure it is a good direction to achieve the same features as a kernel.
In recent years, the idea was to make DPDK a focused library.

> > It looks something to be handled freely by the application
> > without DPDK forcing anything.
> 
> This NOT enforcing application to use DPDK OOPS handler, instead, if
> registered then
> it uses the default handler.
> 
> Even if the default handler is registered it invokes the application
> handler if the application registers
> the fault handler. So there is not difference in behavior.

OK

> > What is the benefit for other DPDK features?
> 
> Could you clarify this question a bit more?

I mean is it used by other parts of DPDK, or just a standalone feature?

> > Which problem is it solving?
> 
> Better debug trace on fault for DPDK application. Instead of faulting
> with no information.

It does not look to be in the scope of DPDK, or I am missing something.



* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-22  7:34           ` Thomas Monjalon
@ 2021-09-22  8:03             ` Jerin Jacob
  2021-09-22  8:33               ` Thomas Monjalon
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2021-09-22  8:03 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, Morten Brørup

On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 21/09/2021 19:54, Jerin Jacob:
> > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 06/09/2021 06:17, jerinj@marvell.com:
> > > > It is handy to get detailed OOPS information like Linux kernel
> > > > when DPDK application crashes without losing any of the features
> > > > provided by coredump infrastructure by the OS.
> > > >
> > > > This patch series introduces the APIs to handle OOPS in DPDK.
> > >
> > > I don't understand how it is related to DPDK.
> >
> > It abstracts the execution environment/architecture(See Arch Info in
> > log)[1] details to capture
> > details on fault handlers to enable additional details on fault from
> > DPDK application for
> > additional debugging information. Just like Kernel prints its OOPS on fault.
>
> Not sure it is a good direction to achieve the same features as a kernel.

I just gave an example: the kernel has this feature and DPDK does not.
And it is good for DPDK applications.

Is there any specific point where you think this feature is not good for
DPDK in-tree and out-of-tree applications?

> In recent years, the idea was to make DPDK a focused library.

Not sure how this feature deviates from that. See below, on
libunwind library usage.

>
> > > It looks something to be handled freely by the application
> > > without DPDK forcing anything.
> >
> > This NOT enforcing application to use DPDK OOPS handler, instead, if
> > registered then
> > it uses the default handler.
> >
> > Even if the default handler is registered it invokes the application
> > handler if the application registers
> > the fault handler. So there is not difference in behavior.
>
> OK
>
> > > What is the benefit for other DPDK features?
> >
> > Could you clarify this question a bit more?
>
> I mean is it used by other parts of DPDK, or just a standalone feature?

It is a standalone feature in EAL. It can get a crash dump from any internal
library if it segfaults.
The default handler can be extended if we need more information specific
to DPDK libraries (for example, BPF).

>
> > > Which problem is it solving?
> >
> > Better debug trace on fault for DPDK application. Instead of faulting
> > with no information.
>
> It does not look to be in the scope of DPDK, or I miss something.

I think it is, just as we have APIs for creating control threads in EAL.

Also, this feature depends on libunwind only as an optional dependency.
So we are not duplicating any other library's effort; we are just integrating
everything, including the arch-specific bits, in EAL to provide a feature
that serves DPDK applications better.

>
>

* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-22  8:03             ` Jerin Jacob
@ 2021-09-22  8:33               ` Thomas Monjalon
  2021-09-22  8:49                 ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2021-09-22  8:33 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, David Marchand, Richardson, Bruce, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, Morten Brørup, Jerin Jacob,
	techboard

22/09/2021 10:03, Jerin Jacob:
> On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 21/09/2021 19:54, Jerin Jacob:
> > > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 06/09/2021 06:17, jerinj@marvell.com:
> > > > > It is handy to get detailed OOPS information like Linux kernel
> > > > > when DPDK application crashes without losing any of the features
> > > > > provided by coredump infrastructure by the OS.
> > > > >
> > > > > This patch series introduces the APIs to handle OOPS in DPDK.
> > > >
> > > > I don't understand how it is related to DPDK.
> > >
> > > It abstracts the execution environment/architecture(See Arch Info in
> > > log)[1] details to capture
> > > details on fault handlers to enable additional details on fault from
> > > DPDK application for
> > > additional debugging information. Just like Kernel prints its OOPS on fault.
> >
> > Not sure it is a good direction to achieve the same features as a kernel.
> 
> I just gave an example, that kernel has this feature and DPDK does not have it.
> And it is good for DPDK applications.
> 
> Any specific point where you think this feature is not good for DPDK
> in-tree and out of tree applications?

No specific. Just a fear we make life more complex for some users,
because there are always bugs and unplanned side effects.

> > In recent years, the idea was to make DPDK a focused library.
> 
> Not sure how this feature is not deviating from that. See below, on
> libunwind library usage.
> 
> >
> > > > It looks something to be handled freely by the application
> > > > without DPDK forcing anything.
> > >
> > > This NOT enforcing application to use DPDK OOPS handler, instead, if
> > > registered then
> > > it uses the default handler.
> > >
> > > Even if the default handler is registered it invokes the application
> > > handler if the application registers
> > > the fault handler. So there is not difference in behavior.
> >
> > OK
> >
> > > > What is the benefit for other DPDK features?
> > >
> > > Could you clarify this question a bit more?
> >
> > I mean is it used by other parts of DPDK, or just a standalone feature?
> 
> Standalone feature in EAL. It can get a crash dump from any internal
> library if it segfaults.
> Default handler can be extended if we need more information specific
> to DPDK libraries if need
> (For example BPF etc)
> 
> >
> > > > Which problem is it solving?
> > >
> > > Better debug trace on fault for DPDK application. Instead of faulting
> > > with no information.
> >
> > It does not look to be in the scope of DPDK, or I miss something.
> 
> I think it is, like we have APIs for creating control threads in EAL.
> 
> Also, This feature is dependent on libunwind as an optional dependency.
> So we are not duplicating any other library effort just that integrating
> all together including arch specific bits in EAL to have a feature for
> better DPDK application usage.

That's a difficult decision. We need more opinions.
We may also discuss it in the techboard meeting today.



* Re: [dpdk-dev] [PATCH v3 0/6] support oops handling
  2021-09-22  8:33               ` Thomas Monjalon
@ 2021-09-22  8:49                 ` Jerin Jacob
  0 siblings, 0 replies; 45+ messages in thread
From: Jerin Jacob @ 2021-09-22  8:49 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, David Marchand, Richardson, Bruce,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen, Stephen Hemminger, Olivier Matz, Ferruh Yigit,
	Andrew Rybchenko, Ajit Khaparde, Morten Brørup, techboard

On Wed, Sep 22, 2021 at 2:03 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 22/09/2021 10:03, Jerin Jacob:
> > On Wed, Sep 22, 2021 at 1:04 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 21/09/2021 19:54, Jerin Jacob:
> > > > On Tue, Sep 21, 2021 at 11:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 06/09/2021 06:17, jerinj@marvell.com:
> > > > > > It is handy to get detailed OOPS information like Linux kernel
> > > > > > when DPDK application crashes without losing any of the features
> > > > > > provided by coredump infrastructure by the OS.
> > > > > >
> > > > > > This patch series introduces the APIs to handle OOPS in DPDK.
> > > > >
> > > > > I don't understand how it is related to DPDK.
> > > >
> > > > It abstracts the execution environment/architecture(See Arch Info in
> > > > log)[1] details to capture
> > > > details on fault handlers to enable additional details on fault from
> > > > DPDK application for
> > > > additional debugging information. Just like Kernel prints its OOPS on fault.
> > >
> > > Not sure it is a good direction to achieve the same features as a kernel.
> >
> > I just gave an example, that kernel has this feature and DPDK does not have it.
> > And it is good for DPDK applications.
> >
> > Any specific point where you think this feature is not good for DPDK
> > in-tree and out of tree applications?
>
> No specific. Just a fear we make life more complex for some users,
> because there are always bugs and unplanned side effects.

OK. That's more of a non-technical concern.

I have provided an EAL switch to disable this feature, just as telemetry
has a disable option as an EAL argument. It can be used for this purpose.

>
> > > In recent years, the idea was to make DPDK a focused library.
> >
> > Not sure how this feature is not deviating from that. See below, on
> > libunwind library usage.
> >
> > >
> > > > > It looks something to be handled freely by the application
> > > > > without DPDK forcing anything.
> > > >
> > > > This NOT enforcing application to use DPDK OOPS handler, instead, if
> > > > registered then
> > > > it uses the default handler.
> > > >
> > > > Even if the default handler is registered it invokes the application
> > > > handler if the application registers
> > > > the fault handler. So there is not difference in behavior.
> > >
> > > OK
> > >
> > > > > What is the benefit for other DPDK features?
> > > >
> > > > Could you clarify this question a bit more?
> > >
> > > I mean is it used by other parts of DPDK, or just a standalone feature?
> >
> > Standalone feature in EAL. It can get a crash dump from any internal
> > library if it segfaults.
> > Default handler can be extended if we need more information specific
> > to DPDK libraries if need
> > (For example BPF etc)
> >
> > >
> > > > > Which problem is it solving?
> > > >
> > > > Better debug trace on fault for DPDK application. Instead of faulting
> > > > with no information.
> > >
> > > It does not look to be in the scope of DPDK, or I miss something.
> >
> > I think it is, like we have APIs for creating control threads in EAL.
> >
> > Also, This feature is dependent on libunwind as an optional dependency.
> > So we are not duplicating any other library effort just that integrating
> > all together including arch specific bits in EAL to have a feature for
> > better DPDK application usage.
>
> That's a difficult decision. We need more opinions.

Sure.

> We may also discuss it in the techboard meeting today.

Sure.

>
>

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
@ 2022-01-27 20:47         ` Stephen Hemminger
  2022-01-28  4:33           ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2022-01-27 20:47 UTC (permalink / raw)
  To: jerinj
  Cc: dev, Aaron Conole, Michael Santana, Bruce Richardson, thomas,
	david.marchand, dmitry.kozliuk, navasile, dmitrym, pallavi.kadam,
	konstantin.ananyev, ruifeng.wang, drc

On Mon, 6 Sep 2021 09:47:29 +0530
<jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
> 
> adding optional libunwind library dependency to DPDK for
> enhanced backtrace based on ucontext.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>


I was looking for a better backtrace and noticed that there is libbacktrace
on GitHub (BSD licensed). It provides more information, like file and line number.
Maybe DPDK should integrate it?


PS: the existing rte_dump_stack() is not safe to call from signal handlers.
https://bugs.dpdk.org/show_bug.cgi?id=929
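
As an illustration of the general problem class (an assumption on my
side, not a description of what bug 929 reports): a stack dump that
allocates memory or formats through stdio is risky inside a signal
handler. A minimal sketch, assuming glibc's <execinfo.h>, avoids both by
writing straight to a file descriptor. dump_stack_to_fd(),
dump_stack_to_stderr() and dump_stack_warmup() are hypothetical helpers,
not DPDK APIs. Note that glibc does not formally guarantee
async-signal-safety for backtrace(); its manual page suggests making a
first call early so the unwinder is loaded before any handler runs.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 64
static_assert(MAX_FRAMES > 0, "need room for at least one frame");

/* Hypothetical helper, not a DPDK API: write the current call stack to fd.
 * backtrace_symbols_fd() does not call malloc(), unlike backtrace_symbols().
 * Returns the number of frames captured. */
static int dump_stack_to_fd(int fd)
{
	void *frames[MAX_FRAMES];
	int n = backtrace(frames, MAX_FRAMES);

	if (n > 0)
		backtrace_symbols_fd(frames, n, fd);
	return n;
}

/* Convenience wrapper for the usual case inside a fault handler. */
static int dump_stack_to_stderr(void)
{
	return dump_stack_to_fd(STDERR_FILENO);
}

/* Call once at startup, outside any signal handler, so that the lazy
 * loading done by the first backtrace() call happens early. */
static void dump_stack_warmup(void)
{
	void *frame[1];

	(void)backtrace(frame, 1);
}
```

An application would call dump_stack_warmup() during initialization and
dump_stack_to_stderr() from its handler. The output carries addresses
and, where available, symbol names, but no file or line numbers.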

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-27 20:47         ` Stephen Hemminger
@ 2022-01-28  4:33           ` Jerin Jacob
  2022-01-28  8:41             ` Thomas Monjalon
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2022-01-28  4:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jerin Jacob, dpdk-dev, Aaron Conole, Michael Santana,
	Bruce Richardson, Thomas Monjalon, David Marchand,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen

On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 6 Sep 2021 09:47:29 +0530
> <jerinj@marvell.com> wrote:
>
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > adding optional libunwind library dependency to DPDK for
> > enhanced backtrace based on ucontext.
> >
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
>
>
> Was looking for better backtrace and noticed that there is libbacktrace
> on github (BSD licensed). It provides more information like file and line number.
> Maybe DPDK should integrate it?

The TB already decided NOT to pursue that path.


>
>
> PS: existing rte_dump_stack() is not safe from signal handlers.
> https://bugs.dpdk.org/show_bug.cgi?id=929

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28  4:33           ` Jerin Jacob
@ 2022-01-28  8:41             ` Thomas Monjalon
  2022-01-28 14:27               ` Jerin Jacob
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2022-01-28  8:41 UTC (permalink / raw)
  To: Stephen Hemminger, Jerin Jacob
  Cc: Jerin Jacob, dpdk-dev, Aaron Conole, Michael Santana,
	Bruce Richardson, David Marchand, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen

28/01/2022 05:33, Jerin Jacob:
> On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 6 Sep 2021 09:47:29 +0530
> > <jerinj@marvell.com> wrote:
> >
> > > From: Jerin Jacob <jerinj@marvell.com>
> > >
> > > adding optional libunwind library dependency to DPDK for
> > > enhanced backtrace based on ucontext.
> > >
> > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> >
> >
> > Was looking for better backtrace and noticed that there is libbacktrace
> > on github (BSD licensed). It provides more information like file and line number.
> > Maybe DPDK should integrate it?
> 
> TB already decided to NOT pursue that path.

I don't remember why.
Was it because of adding a dependency in the makefile build system?
Adding optional dependencies is easier now with Meson.



* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28  8:41             ` Thomas Monjalon
@ 2022-01-28 14:27               ` Jerin Jacob
  2022-01-28 17:05                 ` Stephen Hemminger
  0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2022-01-28 14:27 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Stephen Hemminger, Jerin Jacob, dpdk-dev, Aaron Conole,
	Michael Santana, Bruce Richardson, David Marchand,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen

On Fri, Jan 28, 2022 at 2:11 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 28/01/2022 05:33, Jerin Jacob:
> > On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > On Mon, 6 Sep 2021 09:47:29 +0530
> > > <jerinj@marvell.com> wrote:
> > >
> > > > From: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > adding optional libunwind library dependency to DPDK for
> > > > enhanced backtrace based on ucontext.
> > > >
> > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > >
> > >
> > > Was looking for better backtrace and noticed that there is libbacktrace
> > > on github (BSD licensed). It provides more information like file and line number.
> > > Maybe DPDK should integrate it?
> >
> > TB already decided to NOT pursue that path.
>
> I don't remember why.

Feature overlap with systemd features.

> Was it because of adding a dependency in makefile build system?
> Adding optional dependencies is easier now with Meson.



>
>

* Re: [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace
  2022-01-28 14:27               ` Jerin Jacob
@ 2022-01-28 17:05                 ` Stephen Hemminger
  0 siblings, 0 replies; 45+ messages in thread
From: Stephen Hemminger @ 2022-01-28 17:05 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Thomas Monjalon, Jerin Jacob, dpdk-dev, Aaron Conole,
	Michael Santana, Bruce Richardson, David Marchand,
	Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy (MESHCHANINOV),
	Pallavi Kadam, Ananyev, Konstantin,
	Ruifeng Wang (Arm Technology China),
	David Christensen

On Fri, 28 Jan 2022 19:57:40 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Fri, Jan 28, 2022 at 2:11 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 28/01/2022 05:33, Jerin Jacob:  
> > > On Fri, Jan 28, 2022 at 2:18 AM Stephen Hemminger
> > > <stephen@networkplumber.org> wrote:  
> > > >
> > > > On Mon, 6 Sep 2021 09:47:29 +0530
> > > > <jerinj@marvell.com> wrote:
> > > >  
> > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > adding optional libunwind library dependency to DPDK for
> > > > > enhanced backtrace based on ucontext.
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>  
> > > >
> > > >
> > > > Was looking for better backtrace and noticed that there is libbacktrace
> > > > on github (BSD licensed). It provides more information like file and line number.
> > > > Maybe DPDK should integrate it?  
> > >
> > > TB already decided to NOT pursue that path.  
> >
> > I don't remember why.  
> 
> Feature overlap with systemd features.
> 
> > Was it because of adding a dependency in makefile build system?
> > Adding optional dependencies is easier now with Meson.  
> 
> 
> 
> >
> >  

Okay, thanks. I may look at the signal-safety bug in the
current code.

Thread overview: 45+ messages
2021-07-30  8:49 [dpdk-dev] 0/6] support oops handling jerinj
2021-07-30  8:49 ` [dpdk-dev] 1/6] eal: introduce oops handling API jerinj
2021-08-17  3:27   ` [dpdk-dev] [PATCH v2 0/6] support oops handling jerinj
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API jerinj
2021-08-17  3:53       ` Stephen Hemminger
2021-08-17  7:38         ` Jerin Jacob
2021-08-17 15:09           ` Stephen Hemminger
2021-08-17 15:27             ` Jerin Jacob
2021-08-17 15:52               ` Stephen Hemminger
2021-08-18  9:37                 ` Jerin Jacob
2021-08-18 16:46                   ` Stephen Hemminger
2021-08-18 18:04                     ` Jerin Jacob
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 2/6] eal: oops handling API implementation jerinj
2021-08-17  3:52       ` Stephen Hemminger
2021-08-17 10:24         ` Jerin Jacob
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 3/6] eal: support libunwind based backtrace jerinj
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 4/6] eal/x86: support register dump for oops jerinj
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 5/6] eal/arm64: " jerinj
2021-08-17  3:27     ` [dpdk-dev] [PATCH v2 6/6] test/oops: support unit test case for oops handling APIs jerinj
2021-09-06  4:17     ` [dpdk-dev] [PATCH v3 0/6] support oops handling jerinj
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 1/6] eal: introduce oops handling API jerinj
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 2/6] eal: oops handling API implementation jerinj
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 3/6] eal: support libunwind based backtrace jerinj
2022-01-27 20:47         ` Stephen Hemminger
2022-01-28  4:33           ` Jerin Jacob
2022-01-28  8:41             ` Thomas Monjalon
2022-01-28 14:27               ` Jerin Jacob
2022-01-28 17:05                 ` Stephen Hemminger
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 4/6] eal/x86: support register dump for oops jerinj
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 5/6] eal/arm64: " jerinj
2021-09-06  4:17       ` [dpdk-dev] [PATCH v3 6/6] test/oops: support unit test case for oops handling APIs jerinj
2021-09-21 17:30       ` [dpdk-dev] [PATCH v3 0/6] support oops handling Thomas Monjalon
2021-09-21 17:54         ` Jerin Jacob
2021-09-22  7:34           ` Thomas Monjalon
2021-09-22  8:03             ` Jerin Jacob
2021-09-22  8:33               ` Thomas Monjalon
2021-09-22  8:49                 ` Jerin Jacob
2021-07-30  8:49 ` [dpdk-dev] 2/6] eal: oops handling API implementation jerinj
2021-08-02 22:46   ` David Christensen
2021-07-30  8:49 ` [dpdk-dev] 3/6] eal: support libunwind based backtrace jerinj
2021-07-30  8:49 ` [dpdk-dev] 4/6] eal/x86: support register dump for oops jerinj
2021-07-30  8:49 ` [dpdk-dev] 5/6] eal/arm64: " jerinj
2021-08-02 22:49   ` David Christensen
2021-08-16 16:24     ` Jerin Jacob
2021-07-30  8:49 ` [dpdk-dev] 6/6] test/oops: support unit test case for oops handling APIs jerinj