Hello,

I have several issues to report concerning the qede pmd, as well as potential solutions for them. Most of them have to do with configuring the MTU.

========== Abort on MTU change ===========

First, qede_assign_rxtx_handlers() has not worked correctly since an API change in the rte_eth library. The commit linked below changes the way the packet receive and transmit functions are handled:
https://git.dpdk.org/dpdk/commit/?id=c87d435a4d79739c0cec2ed280b94b41cb908af7

Originally, the Rx/Tx handlers were located in the rte_eth_dev struct. This is no longer the case: incoming packets are now polled through the functions registered in rte_eth_fp_ops instead. The rte lib change is supposed to be transparent for the individual pmds, but the polling functions in rte_eth_dev are only synchronized with the ones in rte_eth_fp_ops at device start. This leads to an issue when trying to configure the MTU while there is ongoing traffic:
-> Trying to change the MTU triggers a port restart.
-> qede_assign_rxtx_handlers() assigns dummy polling functions to dev->rx_pkt_burst and dev->tx_pkt_burst while the port is down.
-> However, rte_eth_rx_burst() polls through &rte_eth_fp_ops[port_id].rx_pkt_burst, which still points to qede_recv_pkts_regular() (see the simplified dispatch sketch below).
-> The application keeps polling packets through the stale receive function and trips assert(rx_mb != NULL), which aborts the process.
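
For reference, here is a simplified sketch of the fast-path dispatch in lib/ethdev (error checks and Rx callback handling omitted, written from memory, so double-check against rte_ethdev.h). It shows why updating dev->rx_pkt_burst alone has no effect on a polling application:

    static inline uint16_t
    rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
                     struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
    {
        struct rte_eth_fp_ops *p = &rte_eth_fp_ops[port_id];
        void *qd = p->rxq.data[queue_id];

        /* Still qede_recv_pkts_regular() after qede's internal stop,
         * because only dev->rx_pkt_burst was updated. */
        return p->rx_pkt_burst(qd, rx_pkts, nb_pkts);
    }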

Since rte_eth_fp_ops is reset in rte_eth_dev_stop(), it may be better to call this function instead of qede_dev_stop(). However, the dummy functions defined in lib/ethdev/ethdev_private.c log an error and dump the stack when called, so they might not be intended to be used this way.
The way I fixed this issue in our applications is by forcing a complete stop of the port before configuring the MTU, as shown below. I have no DPDK patch to suggest for this.
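
The workaround looks roughly like this (standard ethdev API calls only; the helper name is mine, and callers must make sure no datapath thread polls the port during the transition, since the dummy handlers installed by rte_eth_dev_stop() log an error when called):

    #include <rte_ethdev.h>

    /* Fully stop the port before changing the MTU, then restart it.
     * rte_eth_dev_stop() also resets the handlers in rte_eth_fp_ops. */
    static int
    set_mtu_stopped(uint16_t port_id, uint16_t mtu)
    {
        int ret;

        ret = rte_eth_dev_stop(port_id);
        if (ret != 0)
            return ret;

        ret = rte_eth_dev_set_mtu(port_id, mtu);
        if (ret != 0)
            return ret;

        return rte_eth_dev_start(port_id);
    }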

-------------- Reproduction ------------------

    1) Start testpmd:
dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a 0000:17:00.0 -a 0000:17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=250000

    2) Start packet forwarding:
start
io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 2 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=2
  port 0: RX queue number: 2 Tx queue number: 2
    Rx offloads=0x80000 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x80000
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x8000 - TX RS bit threshold=0
  port 1: RX queue number: 2 Tx queue number: 2
    Rx offloads=0x80000 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x80000
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x8000 - TX RS bit threshold=0

    3) Send a continuous stream of packets and change the MTU while they are being forwarded:

p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=11111, dport=22222)/Raw(load='A'*100)
sendp(p, iface='ntfp1', count=5000000, inter=0.001)

port config mtu 0 1500
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via attention
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 1
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 1 stopped
dpdk-testpmd: ../../../source/dpdk-24.11/drivers/net/qede/qede_rxtx.c:1600: qede_recv_pkts_regular: Assertion `rx_mb != NULL' failed.
Aborted (core dumped)  <=======

As you can see, the application is aborted.


========= Bad Rx buffer size for mbufs ===========

Another issue I had when trying to set the MTU was that mbuf sanity checks failed when sending large packets, causing aborts. These checks are only performed when compiling in debug mode with RTE_LIBRTE_MBUF_DEBUG set, so they are easy to miss. Even without this debug config, transmitting such packets is broken. The problem lies in the calculation of the Rx buffer size, which was reworked in this patch:
https://git.dpdk.org/dpdk/commit/drivers/net/qede?id=318d7da3122bac04772418c5eda9f50fcd175d18

-> First, the max Rx buffer size is configured at two different places in the code, and the calculation is not consistent between them. In qede_rx_queue_setup(), RTE_ETHER_CRC_LEN is added to max_rx_pktlen, but not in qede_set_mtu(). The commit above mentions that the HW does not include the CRC in frames passed to the host, meaning the CRC should not be added. Also, QEDE_ETH_OVERHEAD is added twice: once when frame_size is initialized with QEDE_MAX_ETHER_HDR_LEN, and again in qede_calc_rx_buf_size().

-> Furthermore, the commit applies a flooring on rx_buf_size to align its value. This causes an issue when receiving large packets nearing the MTU limit, as the Rx buffer ends up too small for some MTU values. What I observed when debugging the pmd is that the NIC does Rx scatter to compensate for the insufficient buffer size, although the pmd was not configured to handle it: I saw mbufs arrive with m->nb_segs = 2 but m->next = NULL, which is unsurprising given that the non-scatter receive function, qede_recv_pkts_regular(), was used instead of qede_recv_pkts(). I would suggest restoring the use of QEDE_CEIL_TO_CACHE_LINE_SIZE and force-enabling Rx scatter in case the resulting value exceeds the mbuf size, as sketched below. This would ensure the buffer is large enough to receive MTU-sized packets without fragmentation.
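
A minimal sketch of the suggested sizing logic (QEDE_ETH_OVERHEAD comes from the driver headers, QEDE_CEIL_TO_CACHE_LINE_SIZE existed before the patch above; the exact overhead accounting and surrounding variables mtu/mbufsz/dev are assumed):

    /* Round *up* to a cache line so an MTU-sized frame always fits in
     * one buffer; fall back to scatter mode only when the rounded size
     * no longer fits in the mbuf data room. */
    uint16_t rx_buf_size =
        QEDE_CEIL_TO_CACHE_LINE_SIZE(mtu + QEDE_ETH_OVERHEAD);

    if (rx_buf_size > mbufsz) {
        dev->data->scattered_rx = 1; /* force-enable Rx scatter */
        rx_buf_size = mbufsz;
    }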

I will submit patches to improve the calculation of the Rx buffer size.

-------------- Reproduction ------------------

Sadly, I did not manage to reproduce this issue with testpmd, because Rx scatter gets forcefully enabled by qede at startup: the MTU is set to the maximum possible value by default.
With a sufficient log level, this is visible at start:

Configuring Port 0 (socket 0)
[qede_check_fdir_support:149(17:00.0:dpdk-port-0)]flowdir is disabled
[qed_sb_init:500(17:00.0:dpdk-port-0)]hwfn [0] <--[init]-- SB 0000 [0x0000 upper]
[qede_alloc_fp_resc:656(17:00.0:dpdk-port-0)]sb_info idx 0x1 initialized
[qed_sb_init:500(17:00.0:dpdk-port-0)]hwfn [0] <--[init]-- SB 0001 [0x0001 upper]
[qede_alloc_fp_resc:656(17:00.0:dpdk-port-0)]sb_info idx 0x2 initialized
[qede_start_vport:498(17:00.0:dpdk-port-0)]VPORT started with MTU = 2154
[qede_vlan_stripping:906(17:00.0:dpdk-port-0)]VLAN stripping disabled
[qede_vlan_filter_set:970(17:00.0:dpdk-port-0)]No VLAN filters configured yet
[qede_vlan_offload_set:1035(17:00.0:dpdk-port-0)]VLAN offload mask 3
[qede_dev_configure:1329(17:00.0:dpdk-port-0)]Device configured with RSS=2 TSS=2
[qede_alloc_tx_queue_mem:446(17:00.0:dpdk-port-0)]txq 0 num_desc 512 tx_free_thresh 480 socket 0
[qede_alloc_tx_queue_mem:446(17:00.0:dpdk-port-0)]txq 1 num_desc 512 tx_free_thresh 480 socket 0
[qede_rx_queue_setup:247(17:00.0:dpdk-port-0)]Forcing scatter-gather mode <=========================
[qede_alloc_rx_queue_mem:155(17:00.0:dpdk-port-0)]mtu 2154 mbufsz 2048 bd_max_bytes 2048 scatter_mode 1
[qede_rx_queue_setup:283(17:00.0:dpdk-port-0)]rxq 0 num_desc 512 rx_buf_size=2048 socket 0
[qede_alloc_rx_queue_mem:155(17:00.0:dpdk-port-0)]mtu 2154 mbufsz 2048 bd_max_bytes 2048 scatter_mode 1
[qede_rx_queue_setup:283(17:00.0:dpdk-port-0)]rxq 1 num_desc 512 rx_buf_size=2048 socket 0
[qede_rx_queue_start:778(17:00.0:dpdk-port-0)]rxq 0 igu_sb_id 0x1
[qede_rx_queue_start:805(17:00.0:dpdk-port-0)]RX queue 0 started
[qede_rx_queue_start:778(17:00.0:dpdk-port-0)]rxq 1 igu_sb_id 0x2
[qede_rx_queue_start:805(17:00.0:dpdk-port-0)]RX queue 1 started
[qede_tx_queue_start:837(17:00.0:dpdk-port-0)]txq 0 igu_sb_id 0x1
[qede_tx_queue_start:870(17:00.0:dpdk-port-0)]TX queue 0 started
[qede_tx_queue_start:837(17:00.0:dpdk-port-0)]txq 1 igu_sb_id 0x2
[qede_tx_queue_start:870(17:00.0:dpdk-port-0)]TX queue 1 started
[qede_config_rss:1059(17:00.0:dpdk-port-0)]Applying driver default key
[qede_rss_hash_update:2121(17:00.0:dpdk-port-0)]RSS hf = 0x104 len = 40 key = 0x7ffc5980db90
[qede_rss_hash_update:2126(17:00.0:dpdk-port-0)]Enabling rss
[qede_rss_hash_update:2140(17:00.0:dpdk-port-0)]Applying user supplied hash key
[qede_rss_hash_update:2187(17:00.0:dpdk-port-0)]Storing RSS key
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is activated
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 1
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[qede_assign_rxtx_handlers:338(17:00.0:dpdk-port-0)]Assigning qede_recv_pkts
[qede_assign_rxtx_handlers:354(17:00.0:dpdk-port-0)]Assigning qede_xmit_pkts_regular
[qede_dev_start:1155(17:00.0:dpdk-port-0)]Device started
[qede_ucast_filter:682(17:00.0:dpdk-port-0)]Unicast MAC is not found
Port 0: F4:E9:D4:7A:A1:E6


========= Mbuf sanity check failure on port stop ===========

There is another sanity check which breaks the program on port stop, when mbufs are released: if a lot of traffic has been sent, it is possible to see mbufs with m->pkt_len = 0 but a non-zero m->data_len. This is probably because mbufs are not reset when allocated in qede_alloc_rx_buffer(), and their fields are only filled in at the end of qede_recv_pkts(). Calling rte_pktmbuf_reset() before freeing the mbufs, as sketched below, is enough to avoid this issue. I will submit a patch doing this.
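
A minimal sketch of the fix in the Rx queue release path (the ring layout and loop variables are assumed from qede_rxtx.c; only the reset call is new):

    /* Mbufs still held by HW were never finalized by qede_recv_pkts(),
     * so reset them to a consistent pkt_len/data_len pair before
     * freeing, otherwise rte_mbuf_sanity_check() panics. */
    for (i = 0; i < rxq->nb_rx_desc; i++) {
        if (rxq->sw_rx_ring[i] != NULL) {
            rte_pktmbuf_reset(rxq->sw_rx_ring[i]);
            rte_pktmbuf_free(rxq->sw_rx_ring[i]);
            rxq->sw_rx_ring[i] = NULL;
        }
    }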

-------------- Reproduction ------------------

1) Start testpmd:
dpdk-testpmd --log-level=pmd.net.qede.driver:7 -a 0000:17:00.0 -a 0000:17:00.1 -- -i --rxq=2 --txq=2 --coremask=0x0c --total-num-mbufs=250000 --max-pkt-len 2172

show port info 0

********************* Infos for port 0  *********************
MAC address: F4:E9:D4:7A:A1:E6
Device name: 0000:17:00.0
Driver name: net_qede
Firmware-version: 8.40.33.0 MFW: 8.24.21.0
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10 Gbps
Link duplex: full-duplex
Autoneg status: On
MTU: 2154
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 256
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 128
Supported RSS offload flow types:
  ipv4  ipv4-tcp  ipv4-udp  ipv6  ipv6-tcp  ipv6-udp  vxlan
  geneve
Minimum size of RX buffer: 1024
Maximum configurable length of RX packet: 9672
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 2
Max possible RX queues: 32
Max possible number of RXDs per queue: 32768
Min possible number of RXDs per queue: 128
RXDs number alignment: 128
Current number of TX queues: 2
Max possible TX queues: 32
Max possible number of TXDs per queue: 32768
Min possible number of TXDs per queue: 256
TXDs number alignment: 256
Max segment number per packet: 255
Max segment number per MTU/TSO: 18
Device capabilities: 0x0( )
Device error handling mode: none
Device private info:
  none

2) Start packet forwarding:
start
io packet forwarding - ports=2 - cores=2 - streams=4 - NUMA support enabled, MP allocation mode: native
Logical Core 2 (socket 0) forwards packets on 2 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 1) forwards packets on 2 streams:
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=2 - nb forwarding ports=2
  port 0: RX queue number: 2 Tx queue number: 2
    Rx offloads=0x80000 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x80000
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x8000 - TX RS bit threshold=0
  port 1: RX queue number: 2 Tx queue number: 2
    Rx offloads=0x80000 Tx offloads=0x0
    RX queue: 0
      RX desc=0 - RX free threshold=0
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x80000
    TX queue: 0
      TX desc=0 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x8000 - TX RS bit threshold=0

3) Send a continuous stream of large packets, nearing the MTU limit (payload size 2020):
p = Ether()/IP(src='10.100.0.1', dst='10.200.0.1')/UDP(sport=11111, dport=22222)/Raw(load='A'*2020)
sendp(p, iface='ntfp1', count=5000, inter=0.001)

*Stop sending packets*

4) Stop packet forwarding:
stop
Telling cores to stop...
Waiting for lcores to finish...

  ------- Forward Stats for RX Port= 0/Queue= 1 -> TX Port= 1/Queue= 1 -------
  RX-packets: 1251           TX-packets: 1251           TX-dropped: 0            

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1251           RX-dropped: 0             RX-total: 1251
  TX-packets: 0              TX-dropped: 0             TX-total: 0
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 1251           TX-dropped: 0             TX-total: 1251
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 1251           RX-dropped: 0             RX-total: 1251
  TX-packets: 1251           TX-dropped: 0             TX-total: 1251
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

5) Stop port:
port stop 0
Stopping ports...
[qede_dev_set_link_state:1842(17:00.0:dpdk-port-0)]setting link state 0
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[ecore_int_attentions:1239(17:00.0:dpdk-port-0-0)]MFW indication via attention
[qede_link_update:1483(17:00.0:dpdk-port-0)]Link - Speed 0 Mode 1 AN 1 Status 0
[qede_activate_vport:536(17:00.0:dpdk-port-0)]vport is deactivated
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 0
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 0 stopped
[qede_tx_queue_reset:509(17:00.0:dpdk-port-0)]Reset TX queue 1
[qede_tx_queue_stop:991(17:00.0:dpdk-port-0)]TX queue 1 stopped
[qede_rx_queue_reset:293(17:00.0:dpdk-port-0)]Reset RX queue 0
[qede_rx_queue_stop:371(17:00.0:dpdk-port-0)]RX queue 0 stopped
EAL: PANIC in rte_mbuf_sanity_check():
bad pkt_len
0: dpdk-testpmd (rte_dump_stack+0x42) [643a372c5ac2]
1: dpdk-testpmd (__rte_panic+0xcc) [643a36c81963]
2: dpdk-testpmd (643a36ab0000+0x1d0b99) [643a36c80b99]
3: dpdk-testpmd (643a36ab0000+0x11b66be) [643a37c666be]
4: dpdk-testpmd (qede_stop_queues+0x162) [643a37c67ea2]
5: dpdk-testpmd (643a36ab0000+0x3c33fa) [643a36e733fa]
6: dpdk-testpmd (rte_eth_dev_stop+0x86) [643a3725a4c6]
7: dpdk-testpmd (643a36ab0000+0x4a2b28) [643a36f52b28]
8: dpdk-testpmd (643a36ab0000+0x799696) [643a37249696]
9: dpdk-testpmd (643a36ab0000+0x798604) [643a37248604]
10: dpdk-testpmd (rdline_char_in+0x38b) [643a3724bcdb]
11: dpdk-testpmd (cmdline_in+0x71) [643a372486d1]
12: dpdk-testpmd (cmdline_interact+0x40) [643a37248800]
13: dpdk-testpmd (prompt+0x2f) [643a36f01cef]
14: dpdk-testpmd (main+0x7e8) [643a36eddeb8]
15: /lib/x86_64-linux-gnu/libc.so.6 (718852800000+0x29d90) [718852829d90]
16: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x80) [718852829e40]
17: dpdk-testpmd (_start+0x25) [643a36ef7145]
Aborted (core dumped)    <=======

========= Wrong offload flags set for tunnel packets ===========

Finally, there is another problem with the Rx offload flags for outer/inner IP and L4 checksums.
The flags raised outside the tunnel-specific part of qede_recv_pkts() and qede_recv_pkts_regular() should be the OUTER variants; instead, the same flags as in the inner part are used.
This can lead to both the GOOD and BAD L4 flags being set in the inner offloads of the same mbuf. I will submit a patch for this; the intended mapping is sketched below.
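
A sketch of the intended flag mapping (the helper and the predicates is_tunnel/ip_csum_err/l4_csum_err are hypothetical stand-ins for the CQE bits the driver actually parses):

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_mbuf.h>

    static inline uint64_t
    qede_rx_csum_flags(bool is_tunnel, bool ip_csum_err, bool l4_csum_err)
    {
        uint64_t ol_flags = 0;

        if (is_tunnel) {
            /* Outer-header results must use the OUTER variants; note
             * that there is no OUTER_IP_CKSUM_GOOD flag in rte_mbuf. */
            if (ip_csum_err)
                ol_flags |= RTE_MBUF_F_RX_OUTER_IP_CKSUM_BAD;
            ol_flags |= l4_csum_err ? RTE_MBUF_F_RX_OUTER_L4_CKSUM_BAD :
                                      RTE_MBUF_F_RX_OUTER_L4_CKSUM_GOOD;
        } else {
            ol_flags |= ip_csum_err ? RTE_MBUF_F_RX_IP_CKSUM_BAD :
                                      RTE_MBUF_F_RX_IP_CKSUM_GOOD;
            ol_flags |= l4_csum_err ? RTE_MBUF_F_RX_L4_CKSUM_BAD :
                                      RTE_MBUF_F_RX_L4_CKSUM_GOOD;
        }
        return ol_flags;
    }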