From: Jerin Jacob
Date: Wed, 18 Aug 2021 15:07:25 +0530
To: Stephen Hemminger
Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella, Thomas Monjalon, David Marchand, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, "Dmitry Malloy (MESHCHANINOV)", Pallavi Kadam, "Ananyev, Konstantin", "Ruifeng Wang (Arm Technology China)", Jan Viktorin, David Christensen
In-Reply-To: <20210817085231.16be26c5@hermes.local>
Subject: Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API

On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger wrote:
>
> On Tue, 17 Aug 2021 20:57:50 +0530
> Jerin Jacob wrote:
>
> > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > wrote:
> > >
> > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > Jerin Jacob wrote:
> > >
> > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > wrote:
> > > > >
> > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > wrote:
> > > > >
> > > > > > From: Jerin Jacob
> > > > > >
> > > > > > Introducing oops handling API with following specification
> > > > > > and enable stub implementation for Linux and FreeBSD.
> > > > > >
> > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > oops handler for the essential signals.
> > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > of signals the library installed by the EAL.
> > > > >
> > > > > This is a big change, and many applications already handle these
> > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > and not enabled by default.
> > > >
> > > > In order to avoid every application explicitly register this
> > > > sighandler and to cater to the co-existing application-specific
> > > > signal-handler usage, the following design has been chosen.
> > > > (It is mentioned in the commit log, I will describe here for
> > > > more clarity)
> > > >
> > > > Case 1:
> > > > a) The application installs the signal handler prior to rte_eal_init().
> > > > b) Implementation stores the application-specific signal and replace a
> > > > signal handler as oops eal handler
> > > > c) when application/DPDK get the segfault, the default EAL oops
> > > > handler gets invoked
> > > > d) Then it dumps the EAL specific message, it calls the
> > > > application-specific signal handler installed in step 1 by
> > > > application. This avoids breaking any contract with the application.
> > > > i.e. Behavior is the same current EAL now.
> > > > That is the reason for not using SA_RESETHAND (which call SIG_DFL
> > > > after eal oops handler instead application-specific handler)
> > > >
> > > > Case 2:
> > > > a) The application install the signal handler after rte_eal_init(),
> > > > b) EAL handler get replaced with application handler then the
> > > > application can call rte_oops_decode() to decode.
> > > >
> > > > In order to cater the above use case, rte_oops_signals_enabled() and
> > > > rte_oops_decode() provided.
> > > >
> > > > Here we are not breaking any contract with the application.
> > > > Do you have concerns about this design?
> > >
> > > In our application as a service it is important not to do any backtrace
> > > in production. We rely on other infrastructure to process coredumps.
> >
> > Other infrastructure will work. For example, If we are using standard
> > coredump using linux infra. In Current implementation,
> > - EAL handler dump the DPDK OOPS like kernel on stderr
> > - Implementation calls SIG_DFL in eal oops handler
> > - The above step creates the coredump or re-directs any other
> > infrastructure you are using for coredump.
> >
> > >
> > > This should be controlled enabled by a command line argument.
> >
> > If we allow other infrastructure coredump to work as-is, why
> > enable/disable required from eal?
>
> The addition of DPDK OOPS adds additional steps which make all
> faults be identified as the oops code.

Since we are using SA_ONSTACK, the original segfault info is not lost.
I verified this with the following steps:

0) Enable the coredump infrastructure in Linux, using coredumpctl or similar.
1) Apply this series.
2) Apply the following patch to create a segfault from the library.
This tests that a segfault is caught by the EAL and forwarded to the
default Linux signal handler.

[main]dell[dpdk.org] $ git diff
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3438a96b75..b935c32c98 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)

 	eal_mcfg_complete();

+	/* Generate a segfault */
+	*(volatile int *)0x05 = 0;
 	return fctret;
 }

3) Build

meson --buildtype debug build
ninja -C build

4) Run

$ ./build/app/test/dpdk-test --no-huge -c 0x2

Please find the oops dump [1] and the gdb core dump backtrace [2] below.
The gdb core dump trace preserves the original segfault cause and trace.

Any other concerns?
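
To make the forwarding behaviour discussed above concrete, here is a
minimal standalone sketch in plain POSIX C. It is not the code from this
series: every demo_* name is hypothetical, and it tracks a single signal
for brevity. The handler writes a marker to stderr, then either calls
the handler that was installed before it (Case 1) or restores the
previous disposition and returns. In the second case the faulting
instruction runs again, so the default action (and any core dump) is
taken against the original fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All demo_* names are hypothetical; this is not the code from the series. */
static struct sigaction demo_prev_sa;         /* disposition found at install time */
static volatile sig_atomic_t demo_oops_calls; /* times the oops handler ran */
static volatile sig_atomic_t demo_app_calls;  /* times the application handler ran */

/* Stand-in for a handler the application installed before EAL init. */
static void
demo_app_handler(int signo)
{
	(void)signo;
	demo_app_calls++;
}

static void
demo_oops_handler(int signo, siginfo_t *info, void *ctx)
{
	static const char msg[] = "demo oops: signal caught\n";
	/* write() is async-signal-safe, unlike printf(). */
	ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);

	(void)rc;
	demo_oops_calls++;

	if (demo_prev_sa.sa_handler == SIG_DFL ||
	    demo_prev_sa.sa_handler == SIG_IGN) {
		/*
		 * No application handler: restore the old disposition and
		 * return. For a synchronous fault the instruction re-executes
		 * and the default action then applies to the original fault.
		 */
		sigaction(signo, &demo_prev_sa, NULL);
	} else if (demo_prev_sa.sa_flags & SA_SIGINFO) {
		demo_prev_sa.sa_sigaction(signo, info, ctx);
	} else {
		demo_prev_sa.sa_handler(signo);
	}
}

/* Install the demo handler and remember whatever was there before. */
static void
demo_oops_install(int signo)
{
	struct sigaction sa;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = demo_oops_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(signo, &sa, &demo_prev_sa);
	assert(rc == 0);
	(void)rc;
}

/* Fault in a forked child; return its wait status, or -1 on error. */
static int
demo_fault_in_child(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid == 0) {
		volatile uintptr_t bad = 0x5; /* same address as the test patch */

		demo_oops_install(SIGSEGV);
		*(volatile int *)bad = 0;
		_exit(0); /* not reached */
	}
	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

If an application handler is registered on SIGUSR1 first, calling
demo_oops_install(SIGUSR1) and then raise(SIGUSR1) should run both
handlers in turn. demo_fault_in_child() should return a status for which
WIFSIGNALED() is true and WTERMSIG() equals SIGSEGV, i.e. the child is
still terminated by the default action.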
[1]
[main]dell[dpdk.org] $ ./build/app/test/dpdk-test --no-huge -c 0x2
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: WARNING: Main core has no memory on local socket!
Signal info:
------------
PID: 2666512
Signal number: 11
Fault address: 0x5

Backtrace:
----------
[ 0x5582acd1e08a]: rte_eal_init()+0xe18
[ 0x5582ac086f4e]: main()+0x298
[ 0x7f0facf1fb25]: __libc_start_main()+0xd5
[ 0x5582ac079c9e]: _start()+0x2e

Arch info:
----------
R8 : 0x0000000000000002  R9 : 0x00007ffe9273c590
R10: 0x0000000000000000  R11: 0x0000000000000246
R12: 0x00005582bc3ce7a0  R13: 0x00000000000000ca
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x00005582bc3c75c8
RCX: 0x00007ffe9273c530  RDX: 0x0000000000000000
RBP: 0x00007ffe9273c820  RSP: 0x00007ffe9273c690
RSI: 0x0000000000000008  RDI: 0x00000000000000ca
RIP: 0x00005582acd1e08a  EFL: 0x0000000000010246

[2]
Core was generated by `./build/app/test/dpdk-test --no-huge -c 0x2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
1342            *(volatile int *)0x05 = 0;
[Current thread is 1 (Thread 0x7f0faca83c00 (LWP 2666512))]
(gdb) bt
#0  rte_eal_init (argc=4, argv=0x7ffe9273cec8) at ../lib/eal/linux/eal.c:1342
#1  0x00005582ac086f4e in main (argc=4, argv=0x7ffe9273cec8) at ../app/test/test.c:146
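
Values like the "Signal number" and "Fault address" lines in [1] can be
obtained from the siginfo_t that an SA_SIGINFO handler receives. As a
rough illustration only (hypothetical probe_* names, Linux assumed, and
not the code from this series), the sketch below faults in a forked
child and sends the signal number and si_addr back to the parent over a
pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All probe_* names are hypothetical, for illustration only. */
static int probe_fd = -1; /* write end of the pipe, used by the child */

static void
probe_handler(int signo, siginfo_t *info, void *ctx)
{
	uintptr_t rec[2];
	ssize_t rc;

	(void)ctx;
	rec[0] = (uintptr_t)signo;
	rec[1] = (uintptr_t)info->si_addr; /* faulting address for SIGSEGV */
	rc = write(probe_fd, rec, sizeof(rec));
	(void)rc;
	_exit(1); /* do not return to the faulting instruction */
}

/*
 * Store to bad_addr in a forked child and report what its handler saw.
 * Returns 0 and fills *signo and *fault_addr on success, -1 otherwise.
 */
static int
probe_fault(uintptr_t bad_addr, int *signo, uintptr_t *fault_addr)
{
	uintptr_t rec[2] = { 0, 0 };
	struct sigaction sa;
	int fds[2];
	ssize_t n;
	pid_t pid;

	assert(signo != NULL && fault_addr != NULL);
	if (pipe(fds) != 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		volatile uintptr_t target = bad_addr;

		close(fds[0]);
		probe_fd = fds[1];
		memset(&sa, 0, sizeof(sa));
		sa.sa_sigaction = probe_handler;
		sa.sa_flags = SA_SIGINFO;
		sigemptyset(&sa.sa_mask);
		if (sigaction(SIGSEGV, &sa, NULL) != 0)
			_exit(2);
		*(volatile int *)target = 0; /* expected to fault */
		_exit(0);
	}
	close(fds[1]);
	n = read(fds[0], rec, sizeof(rec));
	close(fds[0]);
	waitpid(pid, NULL, 0);
	if (n != (ssize_t)sizeof(rec))
		return -1;
	*signo = (int)rec[0];
	*fault_addr = rec[1];
	return 0;
}
```

Calling probe_fault(0x5, &signo, &addr) mirrors the test patch above:
the store to 0x5 happens in the child, and the parent only reads back
what the child's handler reported.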