* [URGENT] Critical Issue: Chelsio NIC (cxgbe) fails to send Zero-Copy packets and provides incomplete HW offload in DPDK
From: bloodyevil @ 2025-10-05 9:09 UTC
To: dev
Dear DPDK Development Team and Chelsio Support Team,
We are writing to report two severe, fundamental issues we've encountered while using Chelsio T5/T6 series NICs (with the cxgbe PMD) in our high-performance real-time audio streaming application. These problems prevent us from leveraging core DPDK features and require your urgent attention.
Issue #1: Complete Failure to Transmit Zero-Copy (Multi-Segment) Packets
To achieve the lowest latency, we are using the standard DPDK zero-copy mechanism: attaching an external shared memory buffer (from rte_memzone) containing our audio payload to an mbuf header using rte_pktmbuf_attach_extbuf. This correctly creates a multi-segment mbuf (nb_segs > 1).
However, we have found that the cxgbe driver is completely unable to transmit these multi-segment mbufs. Any attempt to send such a packet via rte_eth_tx_burst fails (returns 0 or results in silent packet drops), regardless of whether hardware offloads are enabled or disabled. The cxgbe PMD's capability report (tx_offload_capa) correctly does not include the RTE_ETH_TX_OFFLOAD_MULTI_SEGS flag.
This means that for the cxgbe driver, the standard zero-copy path in DPDK is entirely non-functional.
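When a port does not advertise RTE_ETH_TX_OFFLOAD_MULTI_SEGS, one common fallback is to flatten the segment chain into one contiguous buffer before transmit. The following is a minimal sketch of that gather-copy in plain C. It is only an illustration: `struct seg` and `seg_chain_flatten` are hypothetical stand-ins and not DPDK types or APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a chained packet buffer.
 * This is NOT struct rte_mbuf; it only models data, length, and next. */
struct seg {
    const uint8_t *data;
    size_t len;
    const struct seg *next;
};

/* Copy every segment of the chain into dst, in order.
 * Returns the total number of bytes written, or 0 if dst is too small
 * (in which case dst may have been partially written). */
size_t seg_chain_flatten(const struct seg *head, uint8_t *dst, size_t cap)
{
    size_t off = 0;

    assert(dst != NULL);
    for (const struct seg *s = head; s != NULL; s = s->next) {
        if (s->len > cap - off)   /* off <= cap always holds here */
            return 0;
        memcpy(dst + off, s->data, s->len);
        off += s->len;
    }
    return off;
}
```

For real mbuf chains, DPDK's own rte_pktmbuf_linearize() plays a similar role, subject to enough tailroom being available in the first segment.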
Issue #2: Incomplete Hardware Checksum Offload
Forced to abandon zero-copy, we implemented a "single-copy" workaround by using rte_memcpy to create a contiguous, single-segment mbuf. While this allows packets to be transmitted, we discovered a second critical issue: the hardware checksum offload functionality is incomplete.
Specifically:
1. We set the full offload flags on the mbuf: m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM;
2. We zero out both the IP and UDP checksum fields in their respective headers before transmission.
3. Packet captures reveal that only the UDP checksum is correctly calculated and filled in by the hardware. The IP checksum field remains zeroed, causing the packet to be treated as invalid and dropped by the network.
4. For the packet to be transmitted successfully, we are forced to manually calculate the IP checksum in software (ip_h->hdr_checksum = rte_ipv4_cksum(ip_h);) while keeping the UDP checksum field zeroed for hardware offload.
This proves that although the cxgbe PMD reports support for RTE_ETH_TX_OFFLOAD_IPV4_CKSUM, it does not actually perform IP checksum offloading in practice.
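The software fallback described above computes the standard Internet checksum (RFC 1071) over the IPv4 header. For reference, here is a minimal standalone sketch of that calculation in plain C, independent of DPDK. The function name `ipv4_hdr_cksum` is an illustrative assumption and not the implementation of rte_ipv4_cksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an IPv4 header given as raw bytes.
 * The header checksum field (bytes 10-11) must be zero on input.
 * Returns the checksum as a 16-bit value; write it into the header
 * in network (big-endian) byte order. */
uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    /* An IPv4 header is a multiple of 4 bytes, so len is always even. */
    assert(hdr != NULL && len % 2 == 0);

    /* Sum the header as a sequence of big-endian 16-bit words. */
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

For the 20-byte sample header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 (checksum field zeroed), this yields 0xb861.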
For contrast, both of the core functionalities described above, zero-copy (multi-segment mbuf) transmission and full (IP+UDP) hardware checksum offloading, work perfectly on the same testbed with Intel NICs (igc/i40e) and onboard Realtek (RTL8125) controllers. This strongly suggests that the issues are specific to the Chelsio cxgbe PMD.
Our Dilemma
These two issues leave us in an untenable position:
- The ideal zero-copy path is completely broken, preventing us from realizing a primary performance benefit of DPDK.
- The fallback single-copy path is highly inefficient: it incurs the CPU cost of a memcpy and also requires the additional CPU overhead of software IP checksum calculation, largely defeating the purpose of hardware offloads.
Our Questions
We urgently need your help to clarify the following:
- Is this behavior from the cxgbe PMD (offloading only UDP checksum) consistent with the design expectations for a PMD in DPDK?
- Does the DPDK framework provide any debugging mechanisms to trace why an explicitly set offload flag (RTE_MBUF_F_TX_IP_CKSUM) would be ignored by a PMD without reporting an error?
Resolving these issues is critical to the success of our project. Any information or guidance you can provide would be greatly appreciated.
Our Environment
- DPDK Version: 25.07
- NIC Model: Chelsio T520-CR
- OS & Kernel: Tinycore64 16.0 kernel 6.6.63
Thank you for your time and attention to this urgent matter. We look forward to your response.
Best regards,
* Re: [URGENT] Critical Issue: Chelsio NIC (cxgbe) fails to send Zero-Copy packets and provides incomplete HW offload in DPDK
From: Stephen Hemminger @ 2025-10-06 12:47 UTC
To: bloodyevil; +Cc: dev
Please put this in a bugzilla report.