From: fengchengwen <fengchengwen@huawei.com>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Ferruh Yigit <ferruh.yigit@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>, <beilei.xing@intel.com>,
Matan Azrad <matan@nvidia.com>
Subject: Re: [dpdk-dev] Question about hardware error handling policy
Date: Fri, 23 Jul 2021 10:18:08 +0800 [thread overview]
Message-ID: <a88dd83e-e3ae-86c6-6bfd-3ee04e3d9438@huawei.com> (raw)
In-Reply-To: <4435152.k7BQ785f6v@thomas>
On 2021/7/22 23:46, Thomas Monjalon wrote:
> 22/07/2021 15:50, fengchengwen:
>> Hi, all
>>
>> I notice ethdev support dev_reset ops, which could be used to recover from
>> errors, and only 13+ drivers support this function.
>> And also there is event for reset: RTE_ETH_EVENT_INTR_RESET, and only 6
>> drivers support it (most of them are VF).
>>
>> This provides users with two ways to handle hardware errors:
>> a. driver report RTE_ETH_EVENT_INTR_RESET, and application do reset ops.
>> b. application detect errors (the detection method is unclear), and call
>> reset ops to recover.
>>
>> According to the design of this API, error handling is assigned to the
>> application, and the driver is only responsible for reporting events. This
>> simplifies the driver design (for example, the driver does not need to maintain
>> mutex locks).
>>
>> As we know, many modern NICs come with firmware, have PCIE interfaces,
>> support SR-IOV, the hardware errors can have: firmware reboot/PF reset/
>> VF reset/FLR, but these errors(particularly firmware/PF) are not addressed in
>> most drivers.
>>
>> Question 1: what do we think of these errors(particularly firmware/PF)? Do
>> we think that the probability is very low and that there is no need to deal with
>> them?
>
> Even rare errors must be managed.
Because intel and mlx NIC are widely used, I look at i40e/mlx5 driver code, and found:
1) i40e PF driver, it only show logs when detect global reset and other error:
if (icr0 & I40E_PFINT_ICR0_GRST_MASK)
PMD_DRV_LOG(INFO, "ICR0: global reset requested");
if (icr0 & I40E_PFINT_ICR0_PCI_EXCEPTION_MASK)
PMD_DRV_LOG(INFO, "ICR0: PCI exception activated");
if (icr0 & I40E_PFINT_ICR0_STORM_DETECT_MASK)
PMD_DRV_LOG(INFO, "ICR0: a change in the storm control state");
@Beilei Why not report RESET_EVENT in these cases ? or these error are very rarely
so just report it. And also, the application may still send or recv packet, These
resets, if not handled correctly, do not cause the hardware/driver to hang ?
2) mlx5 PF driver, I notice there is a mlx5_dev_interrupt_device_fatal which will
remove the device.
@Matan Why not report RESET_EVENT in these cases ? so the application can be
recovered quickly.
>
>> Question 2: I prefer to put error handling in the application layer, because
>> doing it in the driver can make the driver complex, but there is no app to
>> register the INTR_RESET event handler. I think we can build a standard handler
>> in testpmd, What do you think?
>
> Absolutely. As any ethdev API, it must be tested with testpmd.
>
>
> .
>
next prev parent reply other threads:[~2021-07-23 2:18 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-22 13:50 fengchengwen
2021-07-22 15:46 ` Thomas Monjalon
2021-07-23 2:18 ` fengchengwen [this message]
2021-07-25 15:12 ` Matan Azrad
2021-07-26 6:21 ` fengchengwen
2021-07-23 12:33 ` Ferruh Yigit
2021-07-23 12:51 ` Thomas Monjalon
2021-07-23 13:04 ` Andrew Rybchenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a88dd83e-e3ae-86c6-6bfd-3ee04e3d9438@huawei.com \
--to=fengchengwen@huawei.com \
--cc=beilei.xing@intel.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@intel.com \
--cc=matan@nvidia.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).