Date: Wed, 18 Aug 2021 09:46:41 -0700
From: Stephen Hemminger
To: Jerin Jacob
Cc: Jerin Jacob, dpdk-dev, Bruce Richardson, Ray Kinsella,
 Thomas Monjalon, David Marchand, Dmitry Kozlyuk,
 Narcisa Ana Maria Vasile, "Dmitry Malloy (MESHCHANINOV)",
 Pallavi Kadam, "Ananyev, Konstantin",
 "Ruifeng Wang (Arm Technology China)", Jan Viktorin,
 David Christensen
Message-ID: <20210818094641.2fe829ba@hermes.local>
References: <20210730084938.2426128-2-jerinj@marvell.com>
 <20210817032723.3997054-1-jerinj@marvell.com>
 <20210817032723.3997054-2-jerinj@marvell.com>
 <20210816205345.6d686c7d@hermes.local>
 <20210817080924.7049fa2d@hermes.local>
 <20210817085231.16be26c5@hermes.local>
Subject: Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API
List-Id: DPDK patches and discussions

On Wed, 18 Aug 2021 15:07:25 +0530
Jerin Jacob wrote:

> On Tue, Aug 17, 2021 at 9:22 PM Stephen Hemminger
> wrote:
> >
> > On Tue, 17 Aug 2021 20:57:50 +0530
> > Jerin Jacob wrote:
> >
> > > On Tue, Aug 17, 2021 at 8:39 PM Stephen Hemminger
> > > wrote:
> > > >
> > > > On Tue, 17 Aug 2021 13:08:46 +0530
> > > > Jerin Jacob wrote:
> > > >
> > > > > On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
> > > > > wrote:
> > > > > >
> > > > > > On Tue, 17 Aug 2021 08:57:18 +0530
> > > > > > wrote:
> > > > > >
> > > > > > > From: Jerin Jacob
> > > > > > >
> > > > > > > Introduce the oops handling API with the following specification
> > > > > > > and enable a stub implementation for Linux and FreeBSD.
> > > > > > >
> > > > > > > On rte_eal_init() invocation, the EAL library installs the
> > > > > > > oops handler for the essential signals.
> > > > > > > The rte_oops_signals_enabled() API provides the list
> > > > > > > of signals installed by the EAL.
> > > > > >
> > > > > > This is a big change, and many applications already handle these
> > > > > > signals themselves. Therefore adding this needs to be opt-in
> > > > > > and not enabled by default.
> > > > >
> > > > > To avoid every application having to register this sighandler
> > > > > explicitly, and to cater to co-existing application-specific
> > > > > signal-handler usage, the following design has been chosen.
> > > > > (It is mentioned in the commit log; I will describe it here
> > > > > for more clarity.)
> > > > >
> > > > > Case 1:
> > > > > a) The application installs the signal handler prior to rte_eal_init().
> > > > > b) The implementation stores the application-specific signal handler
> > > > > and replaces it with the EAL oops handler.
> > > > > c) When the application/DPDK gets a segfault, the EAL oops
> > > > > handler gets invoked.
> > > > > d) It dumps the EAL-specific message, then calls the
> > > > > application-specific signal handler installed in step a) by the
> > > > > application. This avoids breaking any contract with the application,
> > > > > i.e. the behavior is the same as in current EAL.
> > > > > That is the reason for not using SA_RESETHAND (which would call
> > > > > SIG_DFL after the EAL oops handler instead of the
> > > > > application-specific handler).
> > > > >
> > > > > Case 2:
> > > > > a) The application installs the signal handler after rte_eal_init().
> > > > > b) The EAL handler gets replaced with the application handler; the
> > > > > application can then call rte_oops_decode() to decode.
> > > > >
> > > > > To cater to the above use cases, rte_oops_signals_enabled() and
> > > > > rte_oops_decode() are provided.
> > > > >
> > > > > Here we are not breaking any contract with the application.
> > > > > Do you have concerns about this design?
> > > >
> > > > In our application as a service it is important not to do any backtrace
> > > > in production. We rely on other infrastructure to process coredumps.
> > >
> > > Other infrastructure will work. For example, if we are using the
> > > standard coredump via Linux infra, in the current implementation:
> > > - The EAL handler dumps the DPDK OOPS, like the kernel, on stderr
> > > - The implementation calls SIG_DFL in the EAL oops handler
> > > - The above step creates the coredump or redirects to any other
> > > infrastructure you are using for coredumps.
> > >
> > > >
> > > > This should be enabled by a command line argument.
> > >
> > > If we allow other coredump infrastructure to work as-is, why is
> > > enable/disable required from EAL?
> >
> > The addition of DPDK OOPS adds additional steps which make all
> > faults be identified as the oops code.
>
> Since we are using SA_ONSTACK it is not losing the original segfault
> info.
>
> I verified it like this; please find the steps below.
>
> 0) Enable coredump infra in Linux using coredumpctl or so
> 1) Apply this series
> 2) Apply the following patch to create a segfault from the library.
> This tests that a segfault is caught by EAL and forwarded to the
> default Linux signal handler.
>
> [main]dell[dpdk.org] $ git diff
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 3438a96b75..b935c32c98 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1338,6 +1338,8 @@ rte_eal_init(int argc, char **argv)
>
>         eal_mcfg_complete();
>
> +       /* Generate a segfault */
> +       *(volatile int *)0x05 = 0;
>         return fctret;
>
>  }
> 3) Build
> meson --buildtype debug build
> ninja -C build
>
> 4) Run
> $ ./build/app/test/dpdk-test --no-huge -c 0x2
>
> Please find oops dump[1] and gdb core dump backtrace[2].
> Gdb core dump trace preserves the original segfault cause and trace.
>
> Any other concerns?

Your new oops handling duplicates existing code in our application
(and I know of others that do this as well). The problem is that an
application may install its own handling before calling rte_eal_init(),
and your new code will break that. Therefore my recommendation is that
the new oops handling should not be a built-in feature of EAL.