From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B39D1A0C4C; Wed, 18 Aug 2021 20:05:23 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 75C274069C; Wed, 18 Aug 2021 20:05:23 +0200 (CEST) Received: from mail-il1-f179.google.com (mail-il1-f179.google.com [209.85.166.179]) by mails.dpdk.org (Postfix) with ESMTP id AE85440140 for ; Wed, 18 Aug 2021 20:05:21 +0200 (CEST) Received: by mail-il1-f179.google.com with SMTP id h29so3168906ila.2 for ; Wed, 18 Aug 2021 11:05:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=tYjNEfnvt+WKBzvmqmOfcYYV+jLdwtYXI2ciyN97Ol4=; b=EJkIX0qjEcEp5l5Yt1lWybuxOcBQITP+Fu3dujAuxd4snB/vN37ORWYlNOpAc03LSc RlqNNLG89b8I31jgY38UMmBP9F1L5o6BPR96qS//fHw70iFPFg4y501MpPcUWsPDDrBr B/zPBpz/AdbAlg5/Xj/CjQH/hDFVsRc5TiicnzJx3HGy3s+gCK9owozIMC0YY8wZrmVL YGbK0oXqwnkq5YCghQ5G9OVJPca719aiJ7EzxmkoINFNf5BQLvviOAPM16cAtSlOCqG8 FXFIkzQyyc9O3tH/f1fcF450IjWcwu8l2x5RIdeVElfj1BbWuO958DVKYxB4eXoEbEoA AYaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=tYjNEfnvt+WKBzvmqmOfcYYV+jLdwtYXI2ciyN97Ol4=; b=ajv8s7ONjGCxjhNQgbysUnFErA4ESfVmQmGv8ky28uBYzQj4nkyJtJFay+cXvklD9Y AQI+evchZLqrJ4H03b9Qd0yVZhRVUMKhb414/cC4iL+sfxTKaNWDMhxVvOVD5nSc8lJS xC6CmlplntaL3ZeERSlnxR02bfQ41NKDfSwcwBN0jqOXkVVtb0+JroqdLa7dD67C1M0G a9CboveAGdO17ojb+hc7lUsBIqbs3ql8iO0RWKOJcxjLpSDwyxeUUvFewdRXp5Bcqu3l 7y0qeb3ckc3I4y37xNyKF8Lmfvu5qnGN7vGH792w2aecJ9jyJhpmPASWemxe+DwTp3zK 5eLg== X-Gm-Message-State: AOAM532YwZVdCahLIpMPM1LWKZzHefIJ3zUKrrUn9Zy/fKJPOju+J1BO d8fCtircJTegKeOBARo34/WUI0a707zUOLrlrWM= X-Google-Smtp-Source: ABdhPJwHCo6hu12leFQIqaBaHvniHIwchfCaqjk3T2ueItxQE2lZYRNRbQf3UY7dsWqsiUCwCrHD5moCJfq4OMRAAmg= X-Received: by 2002:a92:d70f:: with SMTP id m15mr6828503iln.162.1629309920920; Wed, 18 Aug 2021 11:05:20 -0700 (PDT) MIME-Version: 1.0 References: <20210730084938.2426128-2-jerinj@marvell.com> <20210817032723.3997054-1-jerinj@marvell.com> <20210817032723.3997054-2-jerinj@marvell.com> <20210816205345.6d686c7d@hermes.local> <20210817080924.7049fa2d@hermes.local> <20210817085231.16be26c5@hermes.local> <20210818094641.2fe829ba@hermes.local> In-Reply-To: <20210818094641.2fe829ba@hermes.local> From: Jerin Jacob Date: Wed, 18 Aug 2021 23:34:54 +0530 Message-ID: To: Stephen Hemminger Cc: Jerin Jacob , dpdk-dev , Bruce Richardson , Ray Kinsella , Thomas Monjalon , David Marchand , Dmitry Kozlyuk , Narcisa Ana Maria Vasile , "Dmitry Malloy (MESHCHANINOV)" , Pallavi Kadam , "Ananyev, Konstantin" , "Ruifeng Wang (Arm Technology China)" , Jan Viktorin , David Christensen Content-Type: text/plain; charset="UTF-8" Subject: Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Wed, Aug 18, 2021 at 10:16 PM Stephen Hemminger wrote: > > On Wed, 18 Aug 2021 15:07:25 +0530 > Jerin Jacob wrote: > > > On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger > > wrote: > > > > > > On Tue, 17 Aug 2021 20:57:50 +0530 > > > Jerin Jacob wrote: > > > > > > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger > > > > wrote: > > > > > > > > > > On Tue, 17 Aug 2021 13:08:46 +0530 > > > > > Jerin Jacob wrote: > > > > > > > > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger > > > > > > wrote: > > > > > > > > > > > > > > On Tue, 17 Aug 2021 08:57:18 +0530 > > > > > > > wrote: > > > > > > > > > > > > > > > From: Jerin Jacob > > > > > > > > > > > > > > > > Introducing oops handling API with following specification > > > > > > > > and enable stub implementation for Linux and FreeBSD. > > > > > > > > > > > > > > > > On rte_eal_init() invocation, the EAL library installs the > > > > > > > > oops handler for the essential signals. > > > > > > > > The rte_oops_signals_enabled() API provides the list > > > > > > > > of signals the library installed by the EAL. > > > > > > > > > > > > > > This is a big change, and many applications already handle these > > > > > > > signals themselves. Therefore adding this needs to be opt-in > > > > > > > and not enabled by default. > > > > > > > > > > > > In order to avoid every application explicitly register this > > > > > > sighandler and to cater to the > > > > > > co-existing application-specific signal-hander usage. > > > > > > The following design has been chosen. (It is mentioned in the commit log, > > > > > > I will describe here for more clarity) > > > > > > > > > > > > Case 1: > > > > > > a) The application installs the signal handler prior to rte_eal_init(). > > > > > > b) Implementation stores the application-specific signal and replace a > > > > > > signal handler as oops eal handler > > > > > > c) when application/DPDK get the segfault, the default EAL oops > > > > > > handler gets invoked > > > > > > d) Then it dumps the EAL specific message, it calls the > > > > > > application-specific signal handler > > > > > > installed in step 1 by application. This avoids breaking any contract > > > > > > with the application. > > > > > > i.e Behavior is the same current EAL now. > > > > > > That is the reason for not using SA_RESETHAND(which call SIG_DFL after > > > > > > eal oops handler instead > > > > > > application-specific handler) > > > > > > > > > > > > Case 2: > > > > > > a) The application install the signal handler after rte_eal_init(), > > > > > > b) EAL hander get replaced with application handle then the application can call > > > > > > rte_oops_decode() to decode. > > > > > > > > > > > > In order to cater the above use case, rte_oops_signals_enabled() and > > > > > > rte_oops_decode() > > > > > > provided. > > > > > > > > > > > > Here we are not breaking any contract with the application. > > > > > > Do you have concerns about this design? > > > > > > > > > > In our application as a service it is important not to do any backtrace > > > > > in production. We rely on other infrastructure to process coredumps. > > > > > > > > Other infrastructure will work. For example, If we are using standard coredump > > > > using linux infra. In Current implementation, > > > > - EAL handler dump the DPDK OOPS like kernel on stderr > > > > - Implementation calls SIG_DFL in eal oops handler > > > > - The above step creates the coredump or re-directs any other > > > > infrastructure you are using for coredump. > > > > > > > > > > > > > > This should be controlled enabled by a command line argument. > > > > > > > > If we allow other infrastructure coredump to work as-is, why > > > > enable/disable required from eal? > > > > > > The addition of DPDK OOPS adds additional steps which make all > > > faults be identified as the oops code. > > > > Since we are using SA_ONSTACK it is not losing the original segfault > > info. > > > > I verified like this, Please find below the steps. > > > > 0) Enable coredump infra in Linux using coredumpctl or so > > 1) Apply this series > > 2) Apply for the following patch to create a segfault from the library. > > This will test, segfault caught by eal and forward to default Linux singal > > handler. > > > > [main]dell[dpdk.org] $ git diff > > diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c > > index 3438a96b75..b935c32c98 100644 > > --- a/lib/eal/linux/eal.c > > +++ b/lib/eal/linux/eal.c > > @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv) > > > > eal_mcfg_complete(); > > > > + /* Generate a segfault */ > > + *(volatile int *)0x05 = 0; > > return fctret; > > > > } > > 3)Build > > meson --buildtype debug build > > ninja -C build > > > > 4) Run > > $ ./build/app/test/dpdk-test --no-huge -c 0x2 > > > > Please find oops dump[1] and gdb core dump backtrace[2]. > > Gdb core dump trace preserves the original segfault cause and trace. > > > > Any other concerns? > > Your new oops handling duplicates existing code in our application > (and I know others that do this as well). The problem is that an > application may do this before calling rte_eal_init and your new > code will break that. Not sure what it breaks, Could you elaborate on this? Your app signal handler will be called with the original signal the info it is registered before rte_eal_init(). We can have an additional API to disable the oops prints if you insist. (Though I don't the know use case where someone needs this other than someone don't want to see/log this print). If that is rational, I can add API to disable oops print it. I prefer to install it by default as it won't break anything and it helps to not add oops API in existing apps i.e without calling any additional features in all existing applications. > > Therefore my recommendation is that the new oops handling needs > to be not a built in feature of EAL. > > >