Date: Thu, 30 Nov 2023 13:19:46 -0800
From: Stephen Hemminger
To: Dmitry Kozlyuk
Cc: Fuji Nafiul, users@dpdk.org
Subject: Re: how to make dpdk processes tolerable to segmantation fault?
Message-ID: <20231130131946.428a070b@hermes.local>
In-Reply-To: <20231130192401.2e3f3c4c@sovereign>
References: <20231130192401.2e3f3c4c@sovereign>

On Thu, 30 Nov 2023 19:24:01 +0300
Dmitry Kozlyuk wrote:

> 2023-11-30 13:45 (UTC+0600), Fuji Nafiul:
> > In a normal C program, I saw that a segmentation fault in one loosely
> > coupled thread doesn't necessarily affect other threads or the main
> > program. There, I can check all the threads by process ID at regular
> > intervals, and if a thread hits an unexpected segmentation fault or
> > gets killed, I can rerun the thread and it works fine. I can later
> > monitor the logs and inspect the situation.
> >
> > But I saw that a segmentation fault or other unexpected error in
> > remotely launched (using DPDK) functions on a different core affects
> > the whole DPDK process, and the whole DPDK program crashes. Why is that?
> >
> > Is there any alternative way to handle this scenario? How can I take
> > measures against unexpected future errors, so that DPDK remote
> > processes are automatically rerun in a live system?
>
> Please consider running the buggy code that causes SIGSEGV
> in a separate process rather than a thread.
> If it must use DPDK, can it be made an independent app?
>
> DPDK is unlikely to ever support the described scenario.
> Continuing to run the process after SIGSEGV is inherently unsafe.
> Specifically, DPDK communicates with its lcore threads
> using pipes allocated at startup.
> If such a thread crashed and a SIGSEGV handler that does not kill
> the app was installed, the communication would hang.
> Generally, DPDK employs user-space synchronization primitives,
> which cannot recover if one of the threads using them crashes.

A couple of things you can do:
 - Run your DPDK application as a systemd service, which will restart
   it when it crashes.
 - Catch SIGSEGV in the application and print a backtrace, then abort.