DPDK patches and discussions
* [dpdk-dev] Question about hardware error handling policy
@ 2021-07-22 13:50 fengchengwen
  2021-07-22 15:46 ` Thomas Monjalon
From: fengchengwen @ 2021-07-22 13:50 UTC
  To: Thomas Monjalon, Ferruh Yigit, dev

Hi, all

    I noticed that ethdev supports the dev_reset op, which can be used to recover
from errors, but only 13+ drivers implement it.
    There is also a reset event, RTE_ETH_EVENT_INTR_RESET, but only 6 drivers
support it (most of them VF drivers).

    This gives users two ways to handle hardware errors:
    a. the driver reports RTE_ETH_EVENT_INTR_RESET, and the application performs
    the reset.
    b. the application detects the error itself (how it would detect it is
    unclear) and calls the reset op to recover.
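
    To make way (a) concrete, here is a minimal sketch of the pattern I have in
mind. It is not real DPDK code: every mock_* name, MAX_PORTS and the port table
are hypothetical stand-ins so that the example compiles on its own. In a real
application the callback would be registered with
rte_eth_dev_callback_register() for RTE_ETH_EVENT_INTR_RESET, and the recovery
step would call rte_eth_dev_reset(). The sketch assumes the callback should only
record the request and leave the actual reset to the application's main loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical limit, for this sketch only */

/* Stand-in for a port's hardware state; NOT a DPDK structure. */
struct mock_port {
    bool faulty;     /* hardware is currently in an error state */
    int reset_count; /* number of resets performed by the application */
};

static struct mock_port ports[MAX_PORTS];

/* One flag per port: set by the event callback, consumed by the main loop. */
static atomic_bool reset_pending[MAX_PORTS];

/* Stand-in for rte_eth_dev_reset(): clears the fault, returns 0 on success. */
static int mock_dev_reset(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    ports[port_id].faulty = false;
    ports[port_id].reset_count++;
    return 0;
}

/* Application callback for the reset event: only records the request. */
static int on_intr_reset(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    atomic_store(&reset_pending[port_id], true);
    return 0;
}

/* Stand-in for a driver that detects an error and reports the event. */
static void mock_driver_report_error(uint16_t port_id)
{
    assert(port_id < MAX_PORTS);
    ports[port_id].faulty = true;
    on_intr_reset(port_id);
}

/* Called from the application's main loop; returns how many ports were reset. */
static int handle_pending_resets(void)
{
    int done = 0;

    for (uint16_t p = 0; p < MAX_PORTS; p++) {
        /* atomic_exchange clears the flag and reports whether it was set. */
        if (atomic_exchange(&reset_pending[p], false)) {
            if (mock_dev_reset(p) == 0)
                done++;
        }
    }
    return done;
}
```

    The main loop would call handle_pending_resets() between Rx/Tx bursts. With
this split the driver only reports the event, and the application decides when
it is safe to reset the port.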

    According to the design of this API, error handling is assigned to the
application, and the driver is only responsible for reporting events. This
simplifies the driver design (for example, the driver does not need to maintain
mutex locks).

    As we know, many modern NICs run firmware, sit behind a PCIe interface, and
support SR-IOV. The hardware errors they can hit include firmware reboot, PF
reset, VF reset, and FLR, but these errors (particularly firmware and PF ones)
are not handled in most drivers.

    Question 1: how should we treat these errors (particularly firmware/PF)? Do
we consider them so unlikely that there is no need to handle them?
    Question 2: I prefer to put error handling in the application layer, because
doing it in the driver makes the driver complex, but no application currently
registers an INTR_RESET event handler. I think we could build a standard handler
in testpmd. What do you think?

Thanks


Thread overview: 8+ messages
2021-07-22 13:50 [dpdk-dev] Question about hardware error handling policy fengchengwen
2021-07-22 15:46 ` Thomas Monjalon
2021-07-23  2:18   ` fengchengwen
2021-07-25 15:12     ` Matan Azrad
2021-07-26  6:21       ` fengchengwen
2021-07-23 12:33   ` Ferruh Yigit
2021-07-23 12:51     ` Thomas Monjalon
2021-07-23 13:04     ` Andrew Rybchenko
