From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
  by inbox.dpdk.org (Postfix) with ESMTP id B830FA034D;
  Fri, 28 Jan 2022 13:21:11 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
  by mails.dpdk.org (Postfix) with ESMTP id 2743541199;
  Fri, 28 Jan 2022 13:21:11 +0100 (CET)
Received: from relay.smtp-ext.broadcom.com (lpdvsmtp10.broadcom.com [192.19.11.229])
  by mails.dpdk.org (Postfix) with ESMTP id 6036240141
  for ; Fri, 28 Jan 2022 13:21:09 +0100 (CET)
Received: from dhcp-10-123-153-22.dhcp.broadcom.net (bgccx-dev-host-lnx2.bec.broadcom.net [10.123.153.22])
  by relay.smtp-ext.broadcom.com (Postfix) with ESMTP id 8033FC0000E9;
  Fri, 28 Jan 2022 04:21:07 -0800 (PST)
DKIM-Filter: OpenDKIM Filter v2.11.0 relay.smtp-ext.broadcom.com 8033FC0000E9
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=broadcom.com;
  s=dkimrelay; t=1643372468;
  bh=p2TDQi6MS9WWjh4bLoPZvyTntTkKA8yFfDVbLxex6mc=;
  h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
  b=eWBcZ4pEH7dB1YP5wfTVA8ColtlJu0/OvpxUE9+O2xGLdgh4Wduo6e3JPuA61F508
  0HMeoWd0Gnl4C43nxDOi6le0sXapMCwoxjG6sJ0mqllKfOQWaN3GnqfVnOto4hlL9/
  sumvIDZwoepORKNXd4eQKtL5xKllSmf5VtkCXFZk=
From: Kalesh A P
To: dev@dpdk.org
Cc: ferruh.yigit@intel.com, ajit.khaparde@broadcom.com, asafp@nvidia.com
Subject: [dpdk-dev] [PATCH v7 0/4] ethdev: error recovery support
Date: Fri, 28 Jan 2022 18:18:26 +0530
Message-Id: <20220128124830.427-1-kalesh-anakkur.purayil@broadcom.com>
X-Mailer: git-send-email 2.10.1
In-Reply-To: <20201009034832.10302-1-kalesh-anakkur.purayil@broadcom.com>
References: <20201009034832.10302-1-kalesh-anakkur.purayil@broadcom.com>
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org

From: Kalesh AP

The error recovery solution is a protocol implemented between the
firmware and the bnxt PMD to recover from fatal errors without a system
reboot. An alarm thread constantly monitors the health of the firmware
and initiates a recovery when needed.

There are two scenarios:

1. The hardware or firmware encountered an error which the firmware
   detected. The firmware is still operational. In this case, the
   firmware can reset the chip and notify the driver about the reset.

2. The hardware or firmware encountered an error but the firmware is
   dead/hung and not operational. In this case, the only way to
   recover the adapter is through the host driver (bnxt PMD).

In both cases, the bnxt PMD reinitializes with the FW after the reset.
During the recovery process, the data path is halted and any control
path operation would fail. The PMD therefore has to notify the
application about this reset/error event to prevent any activity from
the application while the PMD is recovering from the error.

Most of the recovery process is transparent to the application, since
the driver handles recovery from FW reset or FW error conditions.
However, the application will have to reprogram any flows which were
offloaded to the underlying hardware.

This patch set adds support for the reset and recovery events in the
rte_eth_event framework. FW error and FW reset conditions are managed
by the PMD. The driver uses the RTE_ETH_EVENT_ERR_RECOVERING event to
notify applications about the FW reset or error. The PMD uses the
RTE_ETH_EVENT_RECOVERED event to notify the application that the PMD
has recovered from the FW reset or FW error. The application should
stop polling until it receives the RTE_ETH_EVENT_RECOVERED event from
the PMD.

v7: Addressed comments from Thomas and Andrew.

v6: Addressed comments from Asaf Penso.
    1. Updated 20.11 release notes with the new events added.
    2. Updated testpmd parse_event_printing_config function.

v5: Addressed comments from Ophir Munk.
    1. Renamed the new event name to RTE_ETH_EVENT_ERR_RECOVERING.
    2. Fixed testpmd logs.
    3. Documented the new recovery events.

v4: Addressed comments from Thomas Monjalon.
    1. Added doxygen comments about new events.

v3: Fixed a typo in commit log.

v2: Added a new event RTE_ETH_EVENT_RESET instead of using the
    RTE_ETH_EVENT_INTR_RESET to notify applications about device reset.

Kalesh AP (4):
  ethdev: support device reset and recovery events
  app/testpmd: handle device recovery event
  net/bnxt: notify applications about device reset/recovery
  doc: update release notes

 app/test-pmd/parameters.c               |  8 ++++++--
 app/test-pmd/testpmd.c                  | 10 +++++++++-
 doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
 doc/guides/rel_notes/release_22_03.rst  | 15 +++++++++++++++
 drivers/net/bnxt/bnxt_cpr.c             |  4 ++++
 drivers/net/bnxt/bnxt_ethdev.c          |  8 +++++++-
 lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
 7 files changed, 83 insertions(+), 4 deletions(-)

-- 
2.10.1
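
For illustration only, the application-side behaviour described above
(stop polling on the recovering event, reprogram offloaded flows and
resume on the recovered event) could be sketched as below. This is a
standalone model, not code from the patch set: the enum, the struct and
all app_* functions are hypothetical stand-ins, and it does not call the
ethdev API. A real application would presumably register a callback for
the two events with rte_eth_dev_callback_register() and would need to
synchronize the flag between the callback and the polling lcores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTE_ETH_EVENT_ERR_RECOVERING and
 * RTE_ETH_EVENT_RECOVERED; not the real ethdev enum. */
enum app_event {
	APP_EVENT_ERR_RECOVERING,
	APP_EVENT_RECOVERED,
};

/* Hypothetical per-port application state. */
struct app_port {
	bool polling_enabled;            /* may the data path poll this port? */
	unsigned int flows_reprogrammed; /* number of flow reprogramming passes */
};

void app_port_init(struct app_port *p)
{
	assert(p != NULL);
	p->polling_enabled = true;
	p->flows_reprogrammed = 0;
}

/* Placeholder for re-installing flows that were offloaded to hardware. */
void app_reprogram_flows(struct app_port *p)
{
	assert(p != NULL);
	p->flows_reprogrammed++;
}

/* Event handler: halt polling while the PMD recovers, then reprogram
 * the offloaded flows and resume once recovery is reported. */
void app_on_event(struct app_port *p, enum app_event ev)
{
	assert(p != NULL);
	switch (ev) {
	case APP_EVENT_ERR_RECOVERING:
		p->polling_enabled = false;
		break;
	case APP_EVENT_RECOVERED:
		app_reprogram_flows(p);
		p->polling_enabled = true;
		break;
	}
}

/* The data-path loop would check this before each Rx/Tx burst. */
bool app_may_poll(const struct app_port *p)
{
	assert(p != NULL);
	return p->polling_enabled;
}
```

In this model the flows are reprogrammed before polling is re-enabled,
so traffic does not resume against a device whose offloaded flows are
missing. That ordering is a design choice of the sketch, not something
the patch set mandates.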