Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
From: fengchengwen
To: Honnappa Nagarahalli, Konstantin Ananyev, "dev@dpdk.org", "thomas@monjalon.net", Ferruh Yigit, Andrew Rybchenko, Kalesh AP, "Ajit Khaparde (ajit.khaparde@broadcom.com)"
CC: nd
Date: Thu, 9 Mar 2023 19:30:55 +0800
References: <20230301030610.49468-1-fengchengwen@huawei.com> <20230301030610.49468-2-fengchengwen@huawei.com> <95edd6ca-fe1f-fd7c-719f-0a9e6d7c45b5@huawei.com> <90919d02-08ec-dcd1-db56-7104e7aeb299@huawei.com>
List-Id: DPDK patches and discussions

On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
>
>
>> -----Original Message-----
>> From: fengchengwen
>> Sent: Wednesday, March 8, 2023 7:00 PM
>> To: Honnappa Nagarahalli; Konstantin Ananyev; dev@dpdk.org;
>> thomas@monjalon.net; Ferruh Yigit; Andrew Rybchenko; Kalesh AP;
>> Ajit Khaparde (ajit.khaparde@broadcom.com)
>> Cc: nd
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error
>> handling mode
>>
>>
>>
>> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
>>>
>>>
>>>>>>>>>
>>>>>>>
>>>>>>> Is there any reason not to design this in the same way as
>>>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>>>>>
>>>>>> I suppose it is a question for the authors of original patch...
>>>>> Appreciate if the authors could comment on this.
>>>>
>>>> The main cause is a hardware implementation limit. I will try
>>>> to explain from the hns3 PMD's point of view.
>>>> For a global reset, all the functions need to respond within a certain
>>>> period of time; otherwise, the reset will fail. The reset also
>>>> requires a few steps (all of which may take a long time).
>>>>
>>>> When there are multiple functions in one DPDK process and a global
>>>> reset is triggered, rte_eth_dev_reset will not cover this scenario:
>>>> 1. Each port will report RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
>>>> 2. The application callback is then invoked, but because it runs in the
>>>>    same thread and each port's recovery takes a long time, the later
>>>>    ports will fail to reset.
> I am reading this again. What you are saying is, a single thread running
> the recovery process in sequence for multiple ports will not meet the
> required time limits. Hence, the recovery process needs to run in multiple
> threads simultaneously. This way each thread could run the recovery for a
> different port. Do I understand this correctly?

No. It is not realistic to have a thread for every port.

>
> (Assuming my understanding is correct) The current implementation is running
> the recovery process in the context of data plane threads and not in the
> interrupt thread. Is this correct?

No, the recovery process is running in the interrupt thread.

>
>>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
>>> rte_eth_dev_recover, what problems do you see?
>>
>> I see no difference between 'RTE_ETH_EVENT_INTR_RECOVER and
>> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
>> Could you give more detail?
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
>>>>>>> functionality.
>>>>>>
>>>>>> I suppose such an approach is also possible.
>>>>>> Personally I am fine with both ways: either the existing one or what
>>>>>> you propose, as long as we fix the existing race condition.
>>>>>> What is good about what you suggest is that we probably don't
>>>>>> need to worry about how to allow the user to enable/disable
>>>>>> auto-recovery inside the PMD.
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>
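
To illustrate the serialization problem described in the quoted text (steps 1 and 2: every port reports its reset event in the same interrupt thread, and the callbacks then run one after another), here is a minimal sketch in C. It is a toy model only, not DPDK or hns3 code: the struct, both function names, and the millisecond values are hypothetical and exist just to show why later ports can miss a fixed hardware deadline when recovery is serialized.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not a DPDK API): one thread recovers the ports
 * back to back, and the hardware expects every function to respond before
 * a fixed deadline.
 */
struct reset_model {
	unsigned int deadline_ms; /* assumed limit for all functions to respond */
	unsigned int recover_ms;  /* assumed duration of one port's recovery */
};

/*
 * Time at which port `idx` (0-based) finishes when the recoveries run
 * sequentially on a single thread.
 */
unsigned int
serial_finish_ms(const struct reset_model *m, unsigned int idx)
{
	assert(m != NULL);
	return (idx + 1) * m->recover_ms;
}

/* Number of ports, out of n_ports, that finish after the deadline. */
unsigned int
ports_missing_deadline(const struct reset_model *m, unsigned int n_ports)
{
	unsigned int missed = 0;

	assert(m != NULL);
	for (unsigned int i = 0; i < n_ports; i++) {
		if (serial_finish_ms(m, i) > m->deadline_ms)
			missed++;
	}
	return missed;
}
```

With made-up values of a 100 ms deadline and 40 ms per recovery, four ports finish at 40, 80, 120 and 160 ms, so ports_missing_deadline() reports 2 ports as late. The real limits and durations depend on the hardware and are not given in this thread.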