From: Thomas Monjalon
To: fengchengwen
Cc: Ferruh Yigit, "dev@dpdk.org"
Date: Thu, 22 Jul 2021 17:46:12 +0200
Message-ID: <4435152.k7BQ785f6v@thomas>
In-Reply-To: <0bc940bb-65e6-1acb-d026-7a2a08a0ad8b@huawei.com>
Subject: Re: [dpdk-dev] Question about hardware error handling policy

22/07/2021 15:50, fengchengwen:
> Hi all,
>
> I notice that ethdev supports the dev_reset op, which can be used to recover
> from errors, but only 13+ drivers implement it.
> There is also an event for reset, RTE_ETH_EVENT_INTR_RESET, which only 6
> drivers support (most of them VF drivers).
>
> This gives users two ways to handle hardware errors:
> a. the driver reports RTE_ETH_EVENT_INTR_RESET, and the application performs
>    the reset;
> b. the application detects the error (the detection method is unclear) and
>    calls the reset op to recover.
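One common way to structure option (a) is to keep the event callback minimal. A callback registered with rte_eth_dev_callback_register() is typically invoked from the EAL interrupt thread, while the rte_eth_dev_reset() documentation asks for all control calls to be made from the same thread, so the callback should only record the request and leave the reset to the application's control thread. The sketch below models that pattern without linking DPDK, under these assumptions: on_intr_reset() mirrors the shape of rte_eth_dev_cb_fn, and MAX_PORTS, the sketch_event_type enum, reset_pending, request_reset() and take_reset_request() are hypothetical names. Option (b) sets the same flag through request_reset(), so both paths can share one recovery routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for DPDK definitions, so the sketch builds alone. */
#define MAX_PORTS 32
enum sketch_event_type { SKETCH_EVENT_INTR_LSC, SKETCH_EVENT_INTR_RESET };

/* One "reset requested" flag per port, shared between threads. */
static atomic_bool reset_pending[MAX_PORTS];

/* Path (b): the application detected an error itself and asks for a reset. */
static void request_reset(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    atomic_store(&reset_pending[port_id], true);
}

/*
 * Path (a): same shape as a DPDK event callback (rte_eth_dev_cb_fn).
 * It runs in the interrupt thread, so it only records the request and
 * never performs the reset itself.
 */
static int on_intr_reset(uint16_t port_id, enum sketch_event_type event,
                         void *cb_arg, void *ret_param)
{
    (void)cb_arg;
    (void)ret_param;
    if (event == SKETCH_EVENT_INTR_RESET)
        request_reset(port_id);
    return 0;
}

/*
 * Polled from the control thread: returns true once per pending request
 * and clears the flag in the same atomic operation.
 */
static bool take_reset_request(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    return atomic_exchange(&reset_pending[port_id], false);
}
```

In a real application, on_intr_reset() would take enum rte_eth_event_type and be registered with rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_RESET, on_intr_reset, NULL); the control loop would then call take_reset_request() for each port before running its recovery sequence.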
>
> According to the design of this API, error handling is assigned to the
> application, and the driver is only responsible for reporting events. This
> simplifies the driver design (for example, the driver does not need to
> maintain mutex locks).
>
> As we know, many modern NICs embed firmware, have PCIe interfaces and support
> SR-IOV, so hardware errors can include firmware reboot, PF reset, VF reset and
> FLR (Function Level Reset). However, these errors (particularly firmware and
> PF ones) are not handled by most drivers.
>
> Question 1: what do we think of these errors (particularly firmware/PF)? Do
> we consider them so unlikely that there is no need to deal with them?

Even rare errors must be managed.

> Question 2: I prefer to put error handling in the application layer, because
> doing it in the driver can make the driver complex, but no application
> registers a handler for the INTR_RESET event. I think we can build a standard
> handler in testpmd. What do you think?

Absolutely.
Like any ethdev API, it must be tested with testpmd.
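A standard handler, in testpmd or elsewhere, would run the recovery sequence described in the rte_eth_dev_reset() documentation: stop calling the Rx/Tx burst functions on the port, call rte_eth_dev_reset() (which stops the port and returns it to its initial, unconfigured state), then reconfigure it with rte_eth_dev_configure(), rte_eth_rx_queue_setup()/rte_eth_tx_queue_setup() and rte_eth_dev_start(). The sketch below encodes only that ordering as a small state machine. The recovery_step enum and next_step() are hypothetical, and the real calls take configuration structures, descriptor counts and mempools that are omitted here.

```c
#include <assert.h>

/* Steps of the per-port recovery sequence (hypothetical names). */
enum recovery_step {
    STEP_STOP_FORWARDING, /* stop calling Rx/Tx burst functions on the port */
    STEP_RESET,           /* rte_eth_dev_reset() */
    STEP_CONFIGURE,       /* rte_eth_dev_configure() */
    STEP_QUEUE_SETUP,     /* rte_eth_rx_queue_setup() / rte_eth_tx_queue_setup() */
    STEP_START,           /* rte_eth_dev_start() */
    STEP_RESUME,          /* resume forwarding on the port */
    STEP_DONE,
    STEP_FAILED
};

/*
 * Given the step just executed and its return value (negative on failure,
 * as with the ethdev calls), return the step to run next. Any failure ends
 * the sequence, leaving the port stopped rather than half-configured.
 */
static enum recovery_step next_step(enum recovery_step done, int ret)
{
    assert(done <= STEP_RESUME); /* DONE and FAILED are terminal */
    if (ret < 0)
        return STEP_FAILED;
    return (enum recovery_step)(done + 1);
}
```

A control thread would dispatch each step to the corresponding ethdev call and feed its return value back into next_step(). Stopping at the first error keeps a partially recovered port out of the forwarding path, and a testpmd handler built this way would also exercise each step of the reset API.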