From mboxrd@z Thu Jan 1 00:00:00 1970
To: Andrew Rybchenko, Thomas Monjalon, Kalesh A P
Cc: dev@dpdk.org, Ajit Khaparde, Ori Kam, Asaf Penso
References: <20200122101654.20824-1-kalesh-anakkur.purayil@broadcom.com>
 <20201009034832.10302-1-kalesh-anakkur.purayil@broadcom.com>
 <20201009034832.10302-2-kalesh-anakkur.purayil@broadcom.com>
 <2251820.FElH11MiGr@thomas>
From: Ferruh Yigit
Message-ID: <089bbd4f-b1a6-2631-f601-982e266708c3@intel.com>
Date: Thu, 18 Feb 2021 15:32:45 +0000
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [PATCH v6 1/3] ethdev: support device reset and
 recovery events
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 10/12/2020 9:09 AM, Andrew Rybchenko wrote:
> On 10/12/20 12:29 AM, Thomas Monjalon wrote:
>> 09/10/2020 05:48, Kalesh A P:
>>> From: Kalesh AP
>>>
>>> Adding support for device reset and recovery events in the
>>> rte_eth_event framework. FW error and FW reset conditions would be
>>> managed internally by PMD without needing application intervention.
>>> In such cases, PMD would need reset/recovery events to notify application
>>> that PMD is undergoing a reset.
>>>
>>> Signed-off-by: Somnath Kotur
>>> Signed-off-by: Kalesh AP
>>> Reviewed-by: Ajit Khaparde
>>> Reviewed-by: Asaf Penso
>>
>> The ethdev maintainers are not Cc'ed.
>> Please use the option --cc-cmd devtools/get-maintainer.sh
>>
>>
>>> +Error recovery support
>>> +~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +When the PMD detects a FW reset or error condition, it will try to recover
>>> +from the error without needing the application intervention. In such cases,
>>> +PMD would need events to notify the application that it is undergoing
>>> +an error recovery.
>>> +
>>> +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>>> +application that PMD detected a FW reset or FW error condition. PMD will
>>> +try to recover from the error by itself. Data path will be halted and
>>> +control path operations would fail during the recovery period.
>>> +
>>> +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>>> +that the it has recovered from the error condition. Control path and data path
>>> +are up now. Since the device undergone a reset, flow rules offloaded prior to
>>> +the reset will be lost and the application has to recreate the rules again.
>
> What should be done if the state is not recoverable?
>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index 9759f13..9b4b015 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -3207,6 +3207,23 @@ enum rte_eth_event_type {
>>>      RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>      RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>      RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>> +    RTE_ETH_EVENT_ERR_RECOVERING,
>>> +    /**< port recovering from an error
>>> +     *
>>> +     * PMD detected a FW reset or error condition.
>>> +     * PMD will try to recover from the error.
>>> +     * Data path will be halted and Control path operations
>>> +     * would fail at this time.
>>> +     */
>>
>> Does it mean the application has nothing to do when receiving this event?
>> I think the app should stop polling at least.
>>
>>> +    RTE_ETH_EVENT_RECOVERED,
>>> +    /**< port recovered from an error
>>> +     *
>>> +     * PMD has recovered from the error condition.
>>> +     * Control path and Data path are up now.
>>> +     * Since the device undergone a reset, flow rules
>>> +     * offloaded prior to the reset will be lost and
>>> +     * the application has to recreate the rules again.
>>> +     */
>>
>> Please be more precise.
>> Should the app re-configure the port, setup the queues, start the port?
>>
>>
>

Hi Kalesh Anakkur,

The mechanics of notifying the application look good, but the concerns
seem to be more about what the application should do with this
information.

The PMD notifies the application of the FW/HW reset and pushes some
tasks/responsibilities to the application, but for this to be useful,
these tasks should be clear to the application.

Put yourself in the position of developing an application that receives
these events from a device whose internals you don't know: what would
you do?

Both Thomas and Andrew raised cases that need more clarification for the
application. Can you please send a new version with those clarifications?

Thanks,
ferruh
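
For illustration only, the application-side handling discussed in this
thread could be sketched as below. This is a minimal sketch, not the
behaviour defined by the patch: it assumes only what the quoted text
states (the data path halts during recovery, and offloaded flow rules are
lost after it) plus Thomas's suggestion that the app stop polling. The
enum, struct and function names are hypothetical stand-ins so that the
example compiles without DPDK. Whether the port must also be reconfigured
or restarted, and what to do when recovery fails, are the open questions
raised above and are deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two events proposed in the patch.
 * A real application would use the enum rte_eth_event_type values. */
enum app_event {
    APP_EVENT_ERR_RECOVERING,   /* mirrors RTE_ETH_EVENT_ERR_RECOVERING */
    APP_EVENT_RECOVERED,        /* mirrors RTE_ETH_EVENT_RECOVERED */
};

/* Hypothetical per-port application state. */
struct app_port {
    uint16_t port_id;
    bool polling;               /* data-path loop checks this before rx/tx */
    bool flows_installed;       /* are the offloaded flow rules in place? */
    unsigned int reinstalls;    /* how many times rules were re-created */
};

/* Hypothetical helper: a real application would replay its stored flow
 * rules here; the sketch only records that it happened. */
static void app_reinstall_flows(struct app_port *p)
{
    p->flows_installed = true;
    p->reinstalls++;
}

/* Assumed handling: stop polling while the PMD recovers, then re-create
 * the flow rules and resume polling once recovery is reported.
 * Returns 0 for a handled event, -1 for an unknown one. */
static int app_handle_event(struct app_port *p, enum app_event ev)
{
    switch (ev) {
    case APP_EVENT_ERR_RECOVERING:
        p->polling = false;
        p->flows_installed = false; /* rules are lost by the reset */
        return 0;
    case APP_EVENT_RECOVERED:
        app_reinstall_flows(p);
        p->polling = true;
        return 0;
    }
    return -1;
}
```

In a real application a handler like this would be registered per event
type with rte_eth_dev_callback_register(), and the polling flag would be
read by the data-path loop before each burst.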