From: Ferruh Yigit
To: Thomas Monjalon, fengchengwen
CC: dev@dpdk.org, Andrew Rybchenko
Date: Fri, 23 Jul 2021 13:33:44 +0100
Message-ID: <6e220d0b-5683-ee12-bdab-1ef78d19ebdc@intel.com>
In-Reply-To: <4435152.k7BQ785f6v@thomas>
Subject: Re: [dpdk-dev] Question about hardware error handling policy

On 7/22/2021 4:46 PM, Thomas Monjalon wrote:
> 22/07/2021 15:50, fengchengwen:
>> Hi, all
>>
>> I notice ethdev supports the dev_reset op, which could be used to recover from
>> errors, but only 13+ drivers support this function.

'rte_eth_dev_reset()' can be used to reset the device configuration to its
defaults; it does not have to be used only for error recovery.

>> There is also an event for reset, RTE_ETH_EVENT_INTR_RESET, and only 6
>> drivers support it (most of them VF drivers).
>>
>> This provides users with two ways to handle hardware errors:
>> a. the driver reports RTE_ETH_EVENT_INTR_RESET, and the application calls the reset op.
>> b. the application detects errors (the detection method is unclear) and calls
>> the reset op to recover.
>>
>> According to the design of this API, error handling is assigned to the
>> application, and the driver is only responsible for reporting events. This
>> simplifies the driver design (for example, the driver does not need to maintain
>> mutex locks).
>>
>> As we know, many modern NICs come with firmware, have PCIe interfaces and
>> support SR-IOV, so hardware errors can include firmware reboot, PF reset,
>> VF reset and FLR, but these errors (particularly firmware/PF) are not handled in
>> most drivers.
>>
>> Question 1: what do we think of these errors (particularly firmware/PF)? Do
>> we think that the probability is very low and that there is no need to deal with
>> them?
>
> Even rare errors must be managed.
>

+1

>> Question 2: I prefer to put error handling in the application layer, because
>> doing it in the driver can make the driver complex, but no application
>> registers an INTR_RESET event handler. I think we can build a standard handler
>> in testpmd. What do you think?
>
> Absolutely. As with any ethdev API, it must be tested with testpmd.
>

Testpmd registers for the RESET event, but when the event is received all it does
is print a log, so there is no recovery logic behind it.

If the intention is to add error handling logic to testpmd, my concern is that it
would be too complex or too device specific.

If there is something to clean up or recover at the application level, it makes
sense for the application to receive the event and act on it. But if the
reset/recovery can be handled in the PMD, transparently if possible, I think that
is the better choice.

Another thing: I am not sure it is clear what applications should do on the reset
event, or that it is the same for all PMDs, which is not good.
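For illustration only, option (a) above could look roughly like the following on the application side. This is a minimal, self-contained sketch and not testpmd code: every sim_* name is a hypothetical stand-in so it compiles without DPDK, and the flag-based deferral is an assumed design. In a real application the callback would be registered with rte_eth_dev_callback_register() for RTE_ETH_EVENT_INTR_RESET, and the stand-ins would map to rte_eth_dev_reset() followed by rte_eth_dev_configure(), queue setup and rte_eth_dev_start(). The last step needs the application's own configuration, which is part of why a generic handler is hard to write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SIM_MAX_PORTS 4

/* Hypothetical stand-in for enum rte_eth_event_type; only the reset event is modelled. */
enum sim_event_type { SIM_EVENT_INTR_RESET };

/* Set by the event callback, consumed by the thread that owns the ports. */
static atomic_bool reset_pending[SIM_MAX_PORTS];
/* Bookkeeping so the recovery sequence can be observed. */
static int resets_done[SIM_MAX_PORTS];
static bool port_started[SIM_MAX_PORTS];

/* Same parameter shape as rte_eth_dev_cb_fn. It only records the event;
 * the actual reset is deferred to service_pending_resets(). */
static int on_reset_event(uint16_t port_id, enum sim_event_type event,
                          void *cb_arg, void *ret_param)
{
    (void)cb_arg;
    (void)ret_param;
    if (event == SIM_EVENT_INTR_RESET && port_id < SIM_MAX_PORTS)
        atomic_store(&reset_pending[port_id], true);
    return 0;
}

/* Hypothetical stand-in for rte_eth_dev_reset(): the port comes back stopped
 * and without the application's configuration. */
static int sim_dev_reset(uint16_t port_id)
{
    assert(port_id < SIM_MAX_PORTS);
    port_started[port_id] = false;
    return 0;
}

/* Hypothetical stand-in for the application re-applying its own settings
 * (rte_eth_dev_configure(), Rx/Tx queue setup, rte_eth_dev_start()). */
static int sim_reconfigure_and_start(uint16_t port_id)
{
    assert(port_id < SIM_MAX_PORTS);
    port_started[port_id] = true;
    return 0;
}

/* Called periodically from the thread that also polls Rx/Tx on the ports,
 * so no burst calls run on a port while it is being reset here. */
static void service_pending_resets(void)
{
    for (uint16_t p = 0; p < SIM_MAX_PORTS; p++) {
        /* Atomically consume the pending flag. */
        if (!atomic_exchange(&reset_pending[p], false))
            continue;
        if (sim_dev_reset(p) != 0 || sim_reconfigure_and_start(p) != 0) {
            fprintf(stderr, "port %u: recovery failed\n", (unsigned)p);
            continue;
        }
        resets_done[p]++;
    }
}
```

Keeping the callback down to setting a flag is one possible choice: event callbacks are typically invoked from the EAL interrupt thread, while the rte_eth_dev_reset() documentation asks that the controlling functions be called from the same thread and that Rx/Tx calls stop before the reset. A PMD-side transparent recovery, as suggested above, would hide all of this from the application.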