* QEDE bug report
@ 2025-04-22 15:39 Edwin Brossette
2025-04-22 15:51 ` [PATCH 1/5] qede: fix tunnel checksums offload flags edwin.brossette
From: Edwin Brossette @ 2025-04-22 15:39 UTC (permalink / raw)
To: dev; +Cc: dsinghrawat, palok, Olivier Matz, Laurent Hardy, Didier Pallard
Hello,
I have several issues to report concerning the qede pmd as well as
potential solutions for them. Most of them have to do with configuring the
MTU.
========== Abort on mtu change ===========
First, qede_assign_rxtx_handlers() has been misbehaving since an API
change in the rte_eth library. The commit linked below changed the way
packet receive and transmit functions are handled:
https://git.dpdk.org/dpdk/commit/?id=c87d435a4d79739c0cec2ed280b94b41cb908af7
Originally, the Rx/Tx handlers were located in the rte_eth_dev struct.
This is no longer the case: incoming packets are now polled through the
functions registered in rte_eth_fp_ops instead. The rte_eth change is
supposed to be transparent to the individual pmds, but the polling
functions in rte_eth_dev are only synchronized with the ones in
rte_eth_fp_ops at device start. This leads to an issue when trying to
configure the MTU while there is ongoing traffic:
-> Trying to change the MTU triggers a port restart.
-> qede_assign_rxtx_handlers() assigns dummy polling functions to
dev->rx_pkt_burst and dev->tx_pkt_burst while the port is down.
-> However, rte_eth_rx_burst() polls through
&rte_eth_fp_ops[port_id].rx_pkt_burst, which still points to
qede_recv_pkts_regular().
-> The application keeps polling packets in the receive function and
triggers an assert(rx_mb != NULL), which causes an abort.
Since rte_eth_fp_ops is reset in rte_eth_dev_stop(), it may be better
to call this function instead of qede_dev_stop(). However, the dummy
functions defined in lib/ethdev/ethdev_private.c log an error and dump
the stack when called, so they might not be intended to be used this
way.
The way I fixed this issue in our applications is by forcing a complete
stop of the port before configuring the MTU; I have no DPDK patch to
suggest for this.
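For reference, a minimal sketch of that application-side workaround,
using only the public ethdev API (function name and error handling are
illustrative):

    #include <rte_ethdev.h>

    /* Fully stop the port before changing the MTU, so that
     * rte_eth_fp_ops is reset and the stale Rx/Tx handlers can no
     * longer be polled.
     */
    static int
    set_mtu_with_full_stop(uint16_t port_id, uint16_t mtu)
    {
        int ret;

        ret = rte_eth_dev_stop(port_id); /* resets rte_eth_fp_ops */
        if (ret != 0)
            return ret;

        ret = rte_eth_dev_set_mtu(port_id, mtu);
        if (ret != 0)
            return ret;

        /* re-synchronizes rte_eth_fp_ops with the pmd handlers */
        return rte_eth_dev_start(port_id);
    }

The application must also guarantee that no lcore is still calling
rte_eth_rx_burst()/rte_eth_tx_burst() on this port while it is stopped,
otherwise the dummy handlers will log errors and dump the stack.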
-------------- Reproduction ------------------
1) Start testpmd:
dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a 0000:17:00.0 -a
0000:17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=250000
2) Start packet forwarding:
start
io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 2 streams:
RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=2 - nb forwarding ports=2
port 0: RX queue number: 2 Tx queue number: 2
Rx offloads=0x80000 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x80000
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x8000 - TX RS bit threshold=0
port 1: RX queue number: 2 Tx queue number: 2
Rx offloads=0x80000 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x80000
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x8000 - TX RS bit threshold=0
3) Send a continuous stream of packets and change the MTU while they
are being forwarded:
p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=11111,
dport=22222)/Raw(load='A'*100)
sendp(p, iface='ntfp1', count=5000000, inter=0.001)
port config mtu 0 1500
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via
attention
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 1
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 1 stopped
dpdk-testpmd: ../../../source/dpdk-24.11/drivers/net/qede/qede_rxtx.c:1600:
qede_recv_pkts_regular: Assertion `rx_mb != NULL' failed.
Aborted (core dumped) <=======
As you can see, the application is aborted.
========= Bad Rx buffer size for mbufs ===========
Another issue I hit when trying to set the MTU: sending large packets
triggered failed sanity checks on mbufs, causing aborts. These checks
are only done when compiling in debug mode with RTE_LIBRTE_MBUF_DEBUG
set, so they are easy to miss. But even without this debugging config,
transmitting those packets is broken. The problem lies in the
calculation of the Rx buffer size, which was reworked in this patch:
https://git.dpdk.org/dpdk/commit/drivers/net/qede?id=318d7da3122bac04772418c5eda9f50fcd175d18
-> First, the max Rx buffer size is configured in two different places
in the code, and the calculation is not consistent between them. In
qede_rx_queue_setup(), RTE_ETHER_CRC_LEN is added to max_rx_pktlen, but
not in qede_set_mtu(). The commit above mentions that the HW does not
include the CRC in received frames passed to the host, meaning the CRC
should not be added. Also, QEDE_ETH_OVERHEAD is added twice: once when
frame_size is initialized with QEDE_MAX_ETHER_HDR_LEN, and a second
time in qede_calc_rx_buf_size().
-> Furthermore, the commit mentions applying a flooring to rx_buf_size
to align its value. This causes an issue when receiving large packets
nearing the MTU size limit, as the Rx buffer will not be large enough
for some MTU values. What I observed when debugging the pmd is that the
NIC does Rx scatter to compensate for the insufficient buffer size,
although the pmd was not configured to handle this. I saw mbufs with
m->nb_segs = 2 and m->next = NULL being received, which was
unsurprising given that the wrong receive function was used while Rx
scatter was disabled: qede_recv_pkts_regular() instead of
qede_recv_pkts(). I would suggest restoring the use of
QEDE_CEIL_TO_CACHE_LINE_SIZE and force-enabling Rx scatter in case this
value would exceed the mbuf size. This ensures the buffer is large
enough to receive MTU-sized packets without fragmentation; see the
worked example below. I will submit patches to improve the calculation
of the Rx buffer size.
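To illustrate the flooring problem with assumed numbers (64-byte
cache-line alignment, a required buffer of 2190 bytes for an MTU-sized
frame plus overhead):

    FLOOR: 2190 & ~63        = 2176 -> 14 bytes short, the NIC
                                       scatters the frame
    CEIL:  (2190 + 63) & ~63 = 2240 -> the whole frame fits in
                                       one buffer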
-------------- Reproduction ------------------
Sadly, I did not manage to reproduce this issue with testpmd, because
Rx scatter gets forcefully enabled by qede when the port starts. This
happens because the MTU is set to the maximum possible value by
default. With a sufficient log level, this is visible at startup:
Configuring Port 0 (socket 0)
[qede_check_fdir_support:149(17:00.0:dpdk-port-0)]flowdir is disabled
[qed_sb_init:500(17:00.0:dpdk-port-0)]hwfn [0] <--[init]-- SB 0000 [0x0000
upper]
[qede_alloc_fp_resc:656(17:00.0:dpdk-port-0)]sb_info idx 0x1 initialized
[qed_sb_init:500(17:00.0:dpdk-port-0)]hwfn [0] <--[init]-- SB 0001 [0x0001
upper]
[qede_alloc_fp_resc:656(17:00.0:dpdk-port-0)]sb_info idx 0x2 initialized
[qede_start_vport:498(17:00.0:dpdk-port-0)]VPORT started with MTU = 2154
[qede_vlan_stripping:906(17:00.0:dpdk-port-0)]VLAN stripping disabled
[qede_vlan_filter_set:970(17:00.0:dpdk-port-0)]No VLAN filters configured
yet
[qede_vlan_offload_set:1035(17:00.0:dpdk-port-0)]VLAN offload mask 3
[qede_dev_configure:1329(17:00.0:dpdk-port-0)]Device configured with RSS=2
TSS=2
[qede_alloc_tx_queue_mem:446(17:00.0:dpdk-port-0)]txq 0 num_desc 512
tx_free_thresh 480 socket 0
[qede_alloc_tx_queue_mem:446(17:00.0:dpdk-port-0)]txq 1 num_desc 512
tx_free_thresh 480 socket 0
[qede_rx_queue_setup:247(17:00.0:dpdk-port-0)]Forcing scatter-gather mode
<=========================
[qede_alloc_rx_queue_mem:155(17:00.0:dpdk-port-0)]mtu 2154 mbufsz 2048
bd_max_bytes 2048 scatter_mode 1
[qede_rx_queue_setup:283(17:00.0:dpdk-port-0)]rxq 0 num_desc 512
rx_buf_size=2048 socket 0
[qede_alloc_rx_queue_mem:155(17:00.0:dpdk-port-0)]mtu 2154 mbufsz 2048
bd_max_bytes 2048 scatter_mode 1
[qede_rx_queue_setup:283(17:00.0:dpdk-port-0)]rxq 1 num_desc 512
rx_buf_size=2048 socket 0
[qede_rx_queue_start:778(17:00.0:dpdk-port-0)]rxq 0 igu_sb_id 0x1
[qede_rx_queue_start:805(17:00.0:dpdk-port-0)]RX queue 0 started
[qede_rx_queue_start:778(17:00.0:dpdk-port-0)]rxq 1 igu_sb_id 0x2
[qede_rx_queue_start:805(17:00.0:dpdk-port-0)]RX queue 1 started
[qede_tx_queue_start:837(17:00.0:dpdk-port-0)]txq 0 igu_sb_id 0x1
[qede_tx_queue_start:870(17:00.0:dpdk-port-0)]TX queue 0 started
[qede_tx_queue_start:837(17:00.0:dpdk-port-0)]txq 1 igu_sb_id 0x2
[qede_tx_queue_start:870(17:00.0:dpdk-port-0)]TX queue 1 started
[qede_config_rss:1059(17:00.0:dpdk-port-0)]Applying driver default key
[qede_rss_hash_update:2121(17:00.0:dpdk-port-0)]RSS hf = 0x104 len = 40 key
= 0x7ffc5980db90
[qede_rss_hash_update:2126(17:00.0:dpdk-port-0)]Enabling rss
[qede_rss_hash_update:2140(17:00.0:dpdk-port-0)]Applying user supplied hash
key
[qede_rss_hash_update:2187(17:00.0:dpdk-port-0)]Storing RSS key
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is activated
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 1
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[qede_assign_rxtx_handlers:338(17:00.0:dpdk-port-0)]Assigning qede_recv_pkts
[qede_assign_rxtx_handlers:354(17:00.0:dpdk-port-0)]Assigning
qede_xmit_pkts_regular
[qede_dev_start:1155(17:00.0:dpdk-port-0)]Device started
[qede_ucast_filter:682(17:00.0:dpdk-port-0)]Unicast MAC is not found
Port 0: F4:E9:D4:7A:A1:E6
========= Bad sanity check on port stop ===========
There is another sanity check which breaks the program on port stop
when releasing mbufs: after heavy traffic, it is possible to see
packets with m->pkt_len = 0 but a non-zero m->data_len. This is
probably because mbufs are not reset when allocated in
qede_alloc_rx_buffer(), and their fields are only set at the end of
qede_recv_pkts(). Calling rte_pktmbuf_reset() before freeing the mbufs
is enough to avoid this issue. I will submit a patch doing this.
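As a sketch, the release loop would become something like the
following, mirroring the loop in qede_rx_queue_release_mbufs()
(simplified):

    /* Reset the mbuf before freeing it, so the pkt_len/data_len
     * consistency check in the mbuf library cannot trip on stale
     * values left over from a previous use.
     */
    for (i = 0; i < rxq->nb_rx_desc; i++) {
        if (rxq->sw_rx_ring[i] != NULL) {
            rte_pktmbuf_reset(rxq->sw_rx_ring[i]);
            rte_pktmbuf_free(rxq->sw_rx_ring[i]);
            rxq->sw_rx_ring[i] = NULL;
        }
    }

Note that the patch actually submitted below takes a simpler route and
calls rte_mbuf_raw_free() instead, which skips the strict check
altogether.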
-------------- Reproduction ------------------
1) Start testpmd:
dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a 0000:17:00.0 -a
0000:17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=250000
--max-pkt-len 2172
show port info 0
********************* Infos for port 0 *********************
MAC address: F4:E9:D4:7A:A1:E6
Device name: 0000:17:00.0
Driver name: net_qede
Firmware-version: 8.40.33.0 MFW: 8.24.21.0
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10 Gbps
Link duplex: full-duplex
Autoneg status: On
MTU: 2154
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 256
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 128
Supported RSS offload flow types:
ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp vxlan
geneve
Minimum size of RX buffer: 1024
Maximum configurable length of RX packet: 9672
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 2
Max possible RX queues: 32
Max possible number of RXDs per queue: 32768
Min possible number of RXDs per queue: 128
RXDs number alignment: 128
Current number of TX queues: 2
Max possible TX queues: 32
Max possible number of TXDs per queue: 32768
Min possible number of TXDs per queue: 256
TXDs number alignment: 256
Max segment number per packet: 255
Max segment number per MTU/TSO: 18
Device capabilities: 0x0( )
Device error handling mode: none
Device private info:
none
2) Start packet forwarding:
start
io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support
enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 2 streams:
RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=2 - nb forwarding ports=2
port 0: RX queue number: 2 Tx queue number: 2
Rx offloads=0x80000 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x80000
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x8000 - TX RS bit threshold=0
port 1: RX queue number: 2 Tx queue number: 2
Rx offloads=0x80000 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x80000
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x8000 - TX RS bit threshold=0
3) Send a continuous stream of large packets, nearing the MTU limit
(payload size 2020):
p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=11111,
dport=22222)/Raw(load='A'*2020)
sendp(p, iface='ntfp1', count=5000, inter=0.001)
*Stop sending packets*
4) Stop packet forwarding:
stop
Telling cores to stop...
Waiting for lcores to finish...
------- Forward Stats for RX Port= 0/Queue= 1 -> TX Port= 1/Queue= 1
-------
RX-packets: 1251 TX-packets: 1251 TX-dropped: 0
---------------------- Forward statistics for port 0
----------------------
RX-packets: 1251 RX-dropped: 0 RX-total: 1251
TX-packets: 0 TX-dropped: 0 TX-total: 0
----------------------------------------------------------------------------
---------------------- Forward statistics for port 1
----------------------
RX-packets: 0 RX-dropped: 0 RX-total: 0
TX-packets: 1251 TX-dropped: 0 TX-total: 1251
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all
ports+++++++++++++++
RX-packets: 1251 RX-dropped: 0 RX-total: 1251
TX-packets: 1251 TX-dropped: 0 TX-total: 1251
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
5) Stop port:
port stop 0
Stopping ports...
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via
attention
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1
Status 0
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped
EAL: PANIC in rte_mbuf_sanity_check():
bad pkt_len
0: dpdk-testpmd (rte_dump_stack+0x42) [643a372c5ac2]
1: dpdk-testpmd (__rte_panic+0xcc) [643a36c81963]
2: dpdk-testpmd (643a36ab0000+0x1d0b99) [643a36c80b99]
3: dpdk-testpmd (643a36ab0000+0x11b66be) [643a37c666be]
4: dpdk-testpmd (qede_stop_queues+0x162) [643a37c67ea2]
5: dpdk-testpmd (643a36ab0000+0x3c33fa) [643a36e733fa]
6: dpdk-testpmd (rte_eth_dev_stop+0x86) [643a3725a4c6]
7: dpdk-testpmd (643a36ab0000+0x4a2b28) [643a36f52b28]
8: dpdk-testpmd (643a36ab0000+0x799696) [643a37249696]
9: dpdk-testpmd (643a36ab0000+0x798604) [643a37248604]
10: dpdk-testpmd (rdline_char_in+0x38b) [643a3724bcdb]
11: dpdk-testpmd (cmdline_in+0x71) [643a372486d1]
12: dpdk-testpmd (cmdline_interact+0x40) [643a37248800]
13: dpdk-testpmd (prompt+0x2f) [643a36f01cef]
14: dpdk-testpmd (main+0x7e8) [643a36eddeb8]
15: /lib/x86_64-linux-gnu/libc.so.6 (718852800000+0x29d90) [718852829d90]
16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x80) [718852829e40]
17: dpdk-testpmd (_start+0x25) [643a36ef7145]
Aborted (core dumped) <=======
========= Wrong offload flags set for tunnel packets ===========
Finally, there is still another problem with the Rx offload flags for
the outer/inner IP and L4 checksums.
The flags raised outside the tunnel part of qede_recv_pkts() and
qede_recv_pkts_regular() should be OUTER flags. Instead, they are the
same as in the inner part.
This can lead to both the GOOD and BAD L4 flags being set in the inner
offloads. I will submit a patch for this.
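For context, the mbuf API encodes the Rx L4 checksum status in a 2-bit
field, so raising both bits silently turns the status into NONE (values
as defined in rte_mbuf_core.h):

    RTE_MBUF_F_RX_L4_CKSUM_BAD  == (1ULL << 3)
    RTE_MBUF_F_RX_L4_CKSUM_GOOD == (1ULL << 8)
    RTE_MBUF_F_RX_L4_CKSUM_NONE == RTE_MBUF_F_RX_L4_CKSUM_BAD |
                                   RTE_MBUF_F_RX_L4_CKSUM_GOOD

An application reading the masked value therefore sees CKSUM_NONE and
cannot tell that the inner checksum was actually verified.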
* [PATCH 1/5] qede: fix tunnel checksums offload flags
2025-04-22 15:39 QEDE bug report Edwin Brossette
@ 2025-04-22 15:51 ` edwin.brossette
2025-04-22 15:51 ` [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release edwin.brossette
` (3 more replies)
From: edwin.brossette @ 2025-04-22 15:51 UTC (permalink / raw)
To: dev
Cc: olivier.matz, didier.pallard, lauren.hardy, dsinghrawat, palok,
stable, Edwin Brossette
From: Didier Pallard <didier.pallard@6wind.com>
In the tunnel case, a bad L3 checksum properly sets
RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD, while all other flags are set in the
inner part of the offload flags. This can cause both the BAD and GOOD
L4 flags to be set in the inner offloads when a tunnel packet is
processed, turning these flags into RTE_MBUF_F_RX_L4_CKSUM_NONE instead
of the GOOD/BAD values. This in turn can cause upper layers to take an
incorrect decision on what to do with the packet.
Remove the IP_CKSUM_GOOD flag on the outer IP layer, since there is
currently no way to indicate that this checksum is good using DPDK
offload flags.
Fixes: 81f8804992c9 ("net/qede: enhance Rx CPU utilization")
Fixes: 3d4bb4411683 ("net/qede: add fastpath support for VXLAN tunneling")
CC: stable@dpdk.org
Signed-off-by: Edwin Brossette <edwin.brossette@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
drivers/net/qede/qede_rxtx.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index 25e28fd9f61b..c764e3d83763 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -1617,9 +1617,9 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
"L4 csum failed, flags = 0x%x",
parse_flag);
rxq->rx_hw_errors++;
- ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD;
+ ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD;
} else {
- ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
+ ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD;
}
if (unlikely(qede_check_tunn_csum_l3(parse_flag))) {
@@ -1628,8 +1628,6 @@ qede_recv_pkts_regular(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
parse_flag);
rxq->rx_hw_errors++;
ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD;
- } else {
- ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD;
}
flags = fp_cqe->tunnel_pars_flags.flags;
@@ -1887,9 +1885,9 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
"L4 csum failed, flags = 0x%x",
parse_flag);
rxq->rx_hw_errors++;
- ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_BAD;
+ ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD;
} else {
- ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
+ ol_flags |= RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD;
}
if (unlikely(qede_check_tunn_csum_l3(parse_flag))) {
@@ -1898,8 +1896,6 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
parse_flag);
rxq->rx_hw_errors++;
ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD;
- } else {
- ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD;
}
if (tpa_start_flg)
--
2.35.0.4.g44a5d4affccf
* [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release
2025-04-22 15:51 ` [PATCH 1/5] qede: fix tunnel checksums offload flags edwin.brossette
@ 2025-04-22 15:51 ` edwin.brossette
2025-04-23 15:30 ` Stephen Hemminger
2025-04-22 15:51 ` [PATCH 3/5] Revert "net/qede: fix maximum Rx packet length" edwin.brossette
` (2 subsequent siblings)
From: edwin.brossette @ 2025-04-22 15:51 UTC (permalink / raw)
To: dev
Cc: olivier.matz, didier.pallard, lauren.hardy, dsinghrawat, palok,
Edwin Brossette, stable
From: Edwin Brossette <edwin.brossette@6wind.com>
As per the rte_mbuf API, the driver is responsible for initializing
all the required fields. This is not done at qede alloc time, meaning
there can be garbage data in the mbufs' memory, although this garbage
data should be overwritten when the mbufs are used. Since a sanity
check is done when freeing the queues, it is possible that some
remaining garbage data causes a panic when trying to release the
queues while some mbufs are being processed.
Use rte_mbuf_raw_free() instead of rte_pktmbuf_free(), as its sanity
check is more relaxed.
Fixes: 2ea6f76aff40 ("qede: add core driver")
CC: stable@dpdk.org
Signed-off-by: Edwin Brossette <edwin.brossette@6wind.com>
Acked-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/qede/qede_rxtx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index c764e3d83763..601fcb30b357 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -305,7 +305,7 @@ static void qede_rx_queue_release_mbufs(struct qede_rx_queue *rxq)
if (rxq->sw_rx_ring) {
for (i = 0; i < rxq->nb_rx_desc; i++) {
if (rxq->sw_rx_ring[i]) {
- rte_pktmbuf_free(rxq->sw_rx_ring[i]);
+ rte_mbuf_raw_free(rxq->sw_rx_ring[i]);
rxq->sw_rx_ring[i] = NULL;
}
}
--
2.35.0.4.g44a5d4affccf
* [PATCH 3/5] Revert "net/qede: fix maximum Rx packet length"
2025-04-22 15:51 ` [PATCH 1/5] qede: fix tunnel checksums offload flags edwin.brossette
2025-04-22 15:51 ` [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release edwin.brossette
@ 2025-04-22 15:51 ` edwin.brossette
2025-04-22 15:51 ` [PATCH 4/5] net/qede: fix QEDE_ETH_OVERHEAD being counted twice in rx_buf_size edwin.brossette
2025-04-22 15:51 ` [PATCH 5/5] net/qede: fix rx_buf_size calculation edwin.brossette
From: edwin.brossette @ 2025-04-22 15:51 UTC (permalink / raw)
To: dev
Cc: olivier.matz, didier.pallard, lauren.hardy, dsinghrawat, palok,
Edwin Brossette, stable
From: Edwin Brossette <edwin.brossette@6wind.com>
This reverts commit d8ded501e05ce879f27f0ed1df7721a88b737e25.
The maximum length for Rx packets computed in qede_rx_queue_setup()
takes the Ethernet CRC into account. This is not consistent with the
value computed in qede_set_mtu(). RTE_ETHER_CRC_LEN should not be added
to max_rx_pktlen, as the HW does not include the CRC in received frames
passed to the host.
The original commit tried to fix another bug with this inappropriate
patch: packets with a size nearing the MTU limit were being dropped.
This was not because the CRC length was unaccounted for in the Rx
buffer size, but because of the flooring applied to it: the computed
rx_buf_size is lower than expected because we try to align it.
This issue is fixed in a later patch of this series.
CC: stable@dpdk.org
Signed-off-by: Edwin Brossette <edwin.brossette@6wind.com>
Acked-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/qede/qede_rxtx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index 601fcb30b357..fe839a6ba844 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -235,7 +235,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
dev->data->rx_queues[qid] = NULL;
}
- max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN;
+ max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN;
/* Fix up RX buffer size */
bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
--
2.35.0.4.g44a5d4affccf
* [PATCH 4/5] net/qede: fix QEDE_ETH_OVERHEAD being counted twice in rx_buf_size
2025-04-22 15:51 ` [PATCH 1/5] qede: fix tunnel checksums offload flags edwin.brossette
2025-04-22 15:51 ` [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release edwin.brossette
2025-04-22 15:51 ` [PATCH 3/5] Revert "net/qede: fix maximum Rx packet length" edwin.brossette
@ 2025-04-22 15:51 ` edwin.brossette
2025-04-22 15:51 ` [PATCH 5/5] net/qede: fix rx_buf_size calculation edwin.brossette
From: edwin.brossette @ 2025-04-22 15:51 UTC (permalink / raw)
To: dev
Cc: olivier.matz, didier.pallard, lauren.hardy, dsinghrawat, palok,
Edwin Brossette, stable
From: Edwin Brossette <edwin.brossette@6wind.com>
rx_buf_size is computed in two different places: in
qede_rx_queue_setup() and in qede_set_mtu().
In qede_rx_queue_setup(), it is initialized with
mtu + RTE_ETHER_HDR_LEN, and QEDE_ETH_OVERHEAD is added to it in
qede_calc_rx_buf_size().
In qede_set_mtu(), it is initialized with
mtu + RTE_ETHER_HDR_LEN + QEDE_ETH_OVERHEAD, and QEDE_ETH_OVERHEAD is
added to it a second time in qede_calc_rx_buf_size().
This is inconsistent and wrong in the case of qede_set_mtu(). Initialize
this variable with mtu + QEDE_MAX_ETHER_HDR_LEN instead, and stop adding
QEDE_ETH_OVERHEAD over and over again. This also factorizes the code.
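In other words, assuming QEDE_MAX_ETHER_HDR_LEN is defined as
RTE_ETHER_HDR_LEN + QEDE_ETH_OVERHEAD:

    before, qede_rx_queue_setup(): mtu + RTE_ETHER_HDR_LEN
                                       + QEDE_ETH_OVERHEAD     (once, correct)
    before, qede_set_mtu():        mtu + RTE_ETHER_HDR_LEN
                                       + 2 * QEDE_ETH_OVERHEAD (twice, wrong)
    after, both paths:             mtu + QEDE_MAX_ETHER_HDR_LEN
                                   (overhead counted exactly once)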
Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation")
CC: stable@dpdk.org
Signed-off-by: Edwin Brossette <edwin.brossette@6wind.com>
Acked-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/qede/qede_rxtx.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index fe839a6ba844..ea77f09e18be 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -101,19 +101,14 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t mbufsz,
* buffers can be used for single packet. So need to make sure
* mbuf size is sufficient enough for this.
*/
- if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) <
- (max_frame_size + QEDE_ETH_OVERHEAD)) {
+ if ((mbufsz * ETH_RX_MAX_BUFF_PER_PKT) < max_frame_size) {
DP_ERR(edev, "mbuf %d size is not enough to hold max fragments (%d) for max rx packet length (%d)\n",
mbufsz, ETH_RX_MAX_BUFF_PER_PKT, max_frame_size);
return -EINVAL;
}
-
- rx_buf_size = RTE_MAX(mbufsz,
- (max_frame_size + QEDE_ETH_OVERHEAD) /
- ETH_RX_MAX_BUFF_PER_PKT);
- } else {
- rx_buf_size = max_frame_size + QEDE_ETH_OVERHEAD;
- }
+ rx_buf_size = mbufsz;
+ } else
+ rx_buf_size = max_frame_size;
/* Align to cache-line size if needed */
return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size);
@@ -235,14 +230,14 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
dev->data->rx_queues[qid] = NULL;
}
- max_rx_pktlen = dev->data->mtu + RTE_ETHER_HDR_LEN;
+ max_rx_pktlen = dev->data->mtu + QEDE_MAX_ETHER_HDR_LEN;
/* Fix up RX buffer size */
bufsz = (uint16_t)rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM;
/* cache align the mbuf size to simplify rx_buf_size calculation */
bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) ||
- (max_rx_pktlen + QEDE_ETH_OVERHEAD) > bufsz) {
+ max_rx_pktlen > bufsz) {
if (!dev->data->scattered_rx) {
DP_INFO(edev, "Forcing scatter-gather mode\n");
dev->data->scattered_rx = 1;
--
2.35.0.4.g44a5d4affccf
* [PATCH 5/5] net/qede: fix rx_buf_size calculation
2025-04-22 15:51 ` [PATCH 1/5] qede: fix tunnel checksums offload flags edwin.brossette
` (2 preceding siblings ...)
2025-04-22 15:51 ` [PATCH 4/5] net/qede: fix QEDE_ETH_OVERHEAD being counted twice in rx_buf_size edwin.brossette
@ 2025-04-22 15:51 ` edwin.brossette
From: edwin.brossette @ 2025-04-22 15:51 UTC (permalink / raw)
To: dev
Cc: olivier.matz, didier.pallard, lauren.hardy, dsinghrawat, palok,
Edwin Brossette, stable
From: Edwin Brossette <edwin.brossette@6wind.com>
When the configured MTU is lower than the maximum mbuf size (all packet
data can be stored in a single mbuf), the Rx buffer size is configured
to MTU + some overhead. A flooring is then applied to this value to
align it, meaning its actual value ends up lower than expected.
This is a considerable design flaw, because a data packet fitting
exactly the MTU might not fit in the Rx buffer. What is observed in
this case is that the NIC splits the datagram in two segments,
essentially doing Rx scatter. However, the qede pmd does not expect
scatter-gather in this case and does not use the required function: it
uses qede_recv_pkts_regular(), which is not capable of handling this
case. Thus, we end up with malformed packets with m->nb_segs = 2 and
m->next = NULL, meaning the last part of the data is missing.
Ceil rx_buf_size to the cache-line size instead of flooring it. Also
change the check on max_rx_pktlen > bufsz in qede_rx_queue_setup(): if
this ceiling would make max_rx_pktlen exceed the mbuf size, force
enable scatter-gather mode.
Fixes: 318d7da3122b ("net/qede: fix Rx buffer size calculation")
CC: stable@dpdk.org
Signed-off-by: Edwin Brossette <edwin.brossette@6wind.com>
Acked-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/qede/qede_rxtx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
index ea77f09e18be..2a6f1290ad4a 100644
--- a/drivers/net/qede/qede_rxtx.c
+++ b/drivers/net/qede/qede_rxtx.c
@@ -111,7 +111,7 @@ qede_calc_rx_buf_size(struct rte_eth_dev *dev, uint16_t mbufsz,
rx_buf_size = max_frame_size;
/* Align to cache-line size if needed */
- return QEDE_FLOOR_TO_CACHE_LINE_SIZE(rx_buf_size);
+ return QEDE_CEIL_TO_CACHE_LINE_SIZE(rx_buf_size);
}
static struct qede_rx_queue *
@@ -237,7 +237,7 @@ qede_rx_queue_setup(struct rte_eth_dev *dev, uint16_t qid,
/* cache align the mbuf size to simplify rx_buf_size calculation */
bufsz = QEDE_FLOOR_TO_CACHE_LINE_SIZE(bufsz);
if ((rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER) ||
- max_rx_pktlen > bufsz) {
+ QEDE_CEIL_TO_CACHE_LINE_SIZE(max_rx_pktlen) > bufsz) {
if (!dev->data->scattered_rx) {
DP_INFO(edev, "Forcing scatter-gather mode\n");
dev->data->scattered_rx = 1;
--
2.35.0.4.g44a5d4affccf
* Re: [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release
2025-04-22 15:51 ` [PATCH 2/5] net/qede: fix bad sanity check on Rx queue release edwin.brossette
@ 2025-04-23 15:30 ` Stephen Hemminger
From: Stephen Hemminger @ 2025-04-23 15:30 UTC (permalink / raw)
To: edwin.brossette
Cc: dev, olivier.matz, didier.pallard, lauren.hardy, dsinghrawat,
palok, stable
On Tue, 22 Apr 2025 17:51:40 +0200
edwin.brossette@6wind.com wrote:
> From: Edwin Brossette <edwin.brossette@6wind.com>
>
> As per the rte_mbuf API, the driver is responsible for initializing
> all the required fields. This is not done at qede alloc time, meaning
> there can be garbage data in the mbufs' memory, although this garbage
> data should be overwritten when the mbufs are used. Since a sanity
> check is done when freeing the queues, it is possible that some
> remaining garbage data causes a panic when trying to release the
> queues while some mbufs are being processed.
>
> Use rte_mbuf_raw_free() instead of rte_pktmbuf_free(), as its sanity
> check is more relaxed.
>
> Fixes: 2ea6f76aff40 ("qede: add core driver")
> CC: stable@dpdk.org
Patch looks fine, but DPDK is trying to follow the inclusive naming
guidelines. The term "sanity check" is on the not allowed list.
I will reword the commit message.
https://inclusivenaming.org/word-lists/tier-2/sanity-check/