From: Bruce Richardson <bruce.richardson@intel.com>
To: Doraemon <nobitanobi@qq.com>
Cc: dev <dev@dpdk.org>
Subject: Re: [Help needed] net_ice: MDD event (Malicious Driver Detection) on TX queue when using rte_eth_tx_prepare / rte_eth_tx_burst
Date: Thu, 28 Aug 2025 16:57:05 +0100
Message-ID: <aLB8UeIjprkqLrBm@bricha3-mobl1.ger.corp.intel.com>
In-Reply-To: <tencent_F6CFD7AE41985DA5FB835CDE833F44C6AF06@qq.com>
On Wed, Aug 27, 2025 at 08:52:26AM +0800, Doraemon wrote:
> Hello DPDK / net_ice maintainers,
>
> We are seeing a reproducible and concerning issue when using the
> net_ice PMD with DPDK 22.11.2, and we would appreciate your help
> diagnosing it.
> Summary
> - Environment:
> - DPDK: 22.11.2
> - net_ice PCI device: 8086:159b
> - ice kernel driver: 1.12.7
> - NIC firmware: FW 7.3.6111681 (NVM 4.30)
> - IOVA mode: PA, VFIO enabled
> - Multi-process socket: /var/run/dpdk/PGW/mp_socket
> - NUMA: 2, detected lcores: 112
> - Bonding: pmd_bond with bonded devices created (net_bonding0 on port
> 4, net_bonding1 on port 5)
> - Driver enabled AVX2 OFFLOAD Vector Tx (log shows
> "ice_set_tx_function(): Using AVX2 OFFLOAD Vector Tx")
> - Problem statement:
> - Our application calls rte_eth_tx_prepare before calling
> rte_eth_tx_burst as part of the normal transmission path.
> - After the application has been running for some time (not immediate),
> the kernel/driver emits the following messages repeatedly:
> - ice_interrupt_handler(): OICR: MDD event
> - ice_interrupt_handler(): Malicious Driver Detection event 3 by TCLAN
> on TX queue 1025 PF# 1
> - We are using a single TX queue (application-level single queue) and
> are sending only one packet per burst (burst size = 1).
> - The sequence is: rte_eth_tx_prepare (returns) -> rte_eth_tx_burst ->
> MDD events occur later; a short code sketch of this path is given at
> the end of this list.
> - The events affect stability and repeat over time.
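> - In code form, the per-packet path is essentially the following
> (simplified sketch; port_id, pkt and the nb_* names are illustrative):
>
>       /* single TX queue (queue 0), one packet per burst */
>       uint16_t nb_prep = rte_eth_tx_prepare(port_id, 0, &pkt, 1);
>       /* tx_prepare returns, then the same packet is passed on */
>       uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, &pkt, 1);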
> Relevant startup logs (excerpt)
> EAL: Detected CPU lcores: 112
> EAL: Detected NUMA nodes: 2
> EAL: Selected IOVA mode 'PA'
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_ice (8086:159b) device: 0000:3b:00.1 (socket
> 0)
> ice_load_pkg_type(): Active package is: 1.3.45.0, ICE COMMS Package
> (double VLAN mode)
> ice_dev_init(): FW 7.3.6111681 API 1.7
> ...
> bond_probe(3506) - Initializing pmd_bond for net_bonding0
> bond_probe(3592) - Create bonded device net_bonding0 on port 4 in mode
> 1 on socket 0.
> ...
> ice_set_tx_function(): Using AVX2 OFFLOAD Vector Tx (port 0).
> TELEMETRY: No legacy callbacks, legacy socket not created
> What we have tried / preliminary observations
> - Confirmed application calls rte_eth_tx_prepare prior to
> rte_eth_tx_burst.
> - Confirmed single TX queue configuration and small bursts (size = 1),
> i.e. not high-rate and not a typical high-burst/malicious pattern.
> - The MDD log identifies "TX queue 1025"; unclear how that maps to our
> DPDK queue numbering (we use queue 0 in the app).
> - No obvious other DPDK errors at startup; interface initializes
> normally and vector TX is enabled.
> - We suspect the driver's Malicious Driver Detection (MDD) is
> triggering due to some descriptor/doorbell ordering or offload
> interaction, possibly related to AVX2 Vector Tx offload.
> Questions / requests to the maintainers
> 1. What specifically triggers "MDD event 3 by TCLAN" in net_ice?
> Which driver check/threshold corresponds to event type 3?
> 2. How is the "TX queue 1025" value computed/mapped in the log? (Is
> it queue id + offset, VF mapping, or an internal vector id?) We need
> to map that log value to our DPDK queue index.
> 3. Can the rte_eth_tx_prepare + rte_eth_tx_burst call pattern cause
> MDD detections under any circumstances? If so, are there recommended
> usage patterns or ordering constraints to avoid false positives?
> 4. Are there known firmware/driver/DPDK version combinations with
> similar MDD behavior? Do you recommend specific NIC firmware, kernel
> driver, or DPDK versions as a workaround/fix?
> 5. Any suggested workarounds we can test quickly (e.g., disable vector
> TX offload, disable specific HW offloads, change interrupt/queue
> bindings, or adjust doorbell behavior)?
While I've not come across this particular issue before, one immediate
suggestion would be to try the latest point release of 22.11, i.e. to
update from 22.11.2 to 22.11.9. Checking the diffs, I see that some
changes were made to the ice_prep_pkts() function between those two
releases; perhaps they will help here.
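As a separate sanity check (not a diagnosis), it is also worth making
sure the return value of rte_eth_tx_prepare() is acted on: if a packet
is rejected there but still handed to rte_eth_tx_burst(), the hardware
can treat the resulting descriptors as malformed. A rough sketch of
that check (illustrative fragment only; port_id, queue_id and pkt are
placeholders):

    /* needs stdio.h, rte_ethdev.h, rte_errno.h, rte_mbuf.h */
    uint16_t nb_prep = rte_eth_tx_prepare(port_id, queue_id, &pkt, 1);
    if (nb_prep < 1) {
        /* packet rejected by tx_prepare; rte_errno says why */
        printf("tx_prepare: %s\n", rte_strerror(rte_errno));
        rte_pktmbuf_free(pkt);
        return;
    }
    if (rte_eth_tx_burst(port_id, queue_id, &pkt, 1) < 1)
        rte_pktmbuf_free(pkt);

To rule the AVX2 vector path in or out, the EAL option
--force-max-simd-bitwidth=64 can be used to force the scalar TX path at
runtime, which should at least help narrow down where the problem lies.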
Regards,
/Bruce
Thread overview: 3+ messages
2025-08-27  0:52 Doraemon
2025-08-28 14:45 ` Stephen Hemminger
2025-08-28 15:57 ` Bruce Richardson [this message]