Date: Tue, 17 Oct 2023 09:43:45 -0700
From: Stephen Hemminger
To: Vipul Ashri
Cc: dev@dpdk.org, stable@dpdk.org, Gaëtan Rivet
Subject: Re: [PATCH v2] net/failsafe: link_update request crashing at boot
Message-ID: <20231017094345.3f50eda5@hermes.local>
In-Reply-To: <952c9880-9ad9-bfa6-e39a-271a06226640@oracle.com>
References: <20211021115139.2634-1-vipul.ashri@oracle.com> <20211021214215.1633-1-vipul.ashri@oracle.com> <87c84612-4116-4fe7-a711-f5f364513c3d@www.fastmail.com> <952c9880-9ad9-bfa6-e39a-271a06226640@oracle.com>

On Tue, 15 Feb 2022 22:16:28 +0530
Vipul Ashri wrote:

> On 2/14/2022 10:24 PM, Stephen Hemminger wrote:
> > On Mon, 14 Feb 2022 13:09:19 +0000
> > Vipul Ashri wrote:
> >
> >> PORT 0 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe, driver_type = 16)
> >>
> >> PORT 0 is polling for link-change, interrupts disabled
> >>
> >> [DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File exists
> > Looks like secondary process support doesn't work with the flow rules logic.
> > Maybe after that you are into error paths that may not recover correctly??
> Thanks! Stephen for looking at my analysis,
>
> yes some hotplug synchronization issue between eal_intr_thread and primary
> thread, but we are able to recover with this patch.
>
> Reason is this fail-safe flow is inside our custom added boot-time polling to
> update DPDK stats and calling ifindex ioctl to get interface data. Ideally we
> should not start polling so early. but moreover calling ifindex ioctl is generic
> functionality and should not break failsafe. We added this patch and gracefully
> prevented the so many multiple crashes.
>
> Setup details :
> Azure testbed with Accelerated Networking(SRIOV) enabled, failsafe using tap +
> mellanox driver.

I don't work for Azure anymore, so I can't really test this.

A short explanation of why this patch is stalled:

It seems like this patch is trying to avoid a crash after an earlier problem
has already occurred. It is OK to do that, but the original problem is still
there, and testing the fix is impossible without a modified application.

For the normal user, this just adds more always-true checks in the
configuration path. That is OK, but it does add clutter.

Since failsafe should be deprecated, fixing this seems less relevant as well.