Hi all,

I am using Mellanox ConnectX-5 and ConnectX-4 Lx NICs with DPDK v21.11, but
the NIC intermittently fails to send packets.

 

One condition that triggers this is poor physical contiguity of the
hugepages allocated on the host. For example, if the environment is
configured with 10 GB of hugepages but the hugepages are not physically
contiguous with one another, the problem can be reproduced.
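
To make "poor physical contiguity" concrete, here is a minimal Python sketch
(not part of DPDK) that groups hugepage physical addresses into contiguous
runs. The helper names are hypothetical. It assumes the physical addresses
were obtained separately, for example from /proc/self/pagemap, where bits
0-54 of each 64-bit entry hold the page frame number and bit 63 is the
"present" flag (reading real PFNs requires root). The 4 KiB base page size
is also an assumption (x86_64).

```python
PFN_MASK = (1 << 55) - 1   # pagemap bits 0-54: page frame number
PRESENT = 1 << 63          # pagemap bit 63: page is present in RAM


def pagemap_entry_to_phys(entry, base_page_size=4096):
    """Return the physical address encoded in one pagemap entry, or None."""
    if not entry & PRESENT:
        return None
    return (entry & PFN_MASK) * base_page_size


def contiguous_runs(phys_addrs, hugepage_size):
    """Group hugepage physical addresses into physically contiguous runs.

    Returns a list of (start_address, number_of_hugepages) tuples,
    sorted by start address.
    """
    runs = []
    for addr in sorted(phys_addrs):
        # Extend the current run only if this page starts where it ends.
        if runs and runs[-1][0] + runs[-1][1] * hugepage_size == addr:
            runs[-1][1] += 1
        else:
            runs.append([addr, 1])
    return [tuple(run) for run in runs]
```

A host in the state described above would show many short runs (in the
worst case one run per hugepage) instead of a few long ones.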

 

This problem was introduced by the following patch:

https://git.dpdk.org/dpdk/commit/?id=fec28ca0e3a93143829f3b41a28a8da933f28499

 

 

LOG:

dpdk # ./x86_64-native-linuxapp-gcc/app/dpdk-testpmd -c 0xFC0 --iova-mode pa \
    --legacy-mem -a 03:00.0 -a 03:00.1 -m 8192,0 -- -a -i --forward-mode=fwd \
    --rxq=4 --txq=4 --total-num-mbufs=1000000

 

EAL: Detected CPU lcores: 72
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:03:00.0 (socket 0)
mlx5_net: Default miss action is not supported.
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:03:00.1 (socket 0)
mlx5_net: Default miss action is not supported.
TELEMETRY: No legacy callbacks, legacy socket not created
Auto-start selected
Interactive-mode selected
Invalid fwd packet forwarding mode
testpmd: create a new mbuf pool <mb_pool_0>: n=1000000, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: 28:DE:E5:AB:9D:CA
Configuring Port 1 (socket 0)
Port 1: 28:DE:E5:AB:9D:CB
Checking link statuses...
Done
Start automatic packet forwarding
io packet forwarding - ports=2 - cores=1 - streams=8 - NUMA support enabled, MP allocation mode: native
Logical Core 7 (socket 0) forwards packets on 8 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  RX P=0/Q=1 (socket 0) -> TX P=1/Q=1 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
  RX P=0/Q=2 (socket 0) -> TX P=1/Q=2 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
  RX P=0/Q=3 (socket 0) -> TX P=1/Q=3 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  port 0: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=4096 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=4096 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
  port 1: RX queue number: 4 Tx queue number: 4
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=4096 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=4096 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0

testpmd> mlx5_net: Cannot change Tx QP state to INIT Invalid argument

mlx5_net: Cannot change Tx QP state to INIT Invalid argument

mlx5_net: Cannot change Tx QP state to INIT Invalid argument

mlx5_net: Cannot change Tx QP state to INIT Invalid argument

 

testpmd> mlx5_net: Cannot change Tx QP state to INIT Invalid argument

mlx5_net: Cannot change Tx QP state to INIT Invalid argument

quimlx5_net: Cannot change Tx QP state to INIT Invalid argument

mlx5_net: Cannot change Tx QP state to INIT Invalid argument

 

Some files are also created:

/var/log/dpdk_mlx5_port_0_txq_0_index_0_1883249505

/var/log/dpdk_mlx5_port_0_txq_0_index_0_2291454530

/var/log/dpdk_mlx5_port_0_txq_0_index_0_2880295119

/var/log/dpdk_mlx5_port_1_txq_0_index_0_2198716197

/var/log/dpdk_mlx5_port_1_txq_0_index_0_2498129310

/var/log/dpdk_mlx5_port_1_txq_0_index_0_3046021743

 

Unexpected CQE error syndrome 0x04 CQN = 256 SQN = 6612 wqe_counter = 0 wq_ci = 1 cq_ci = 0
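
For readers who want to cross-check the summary line above against the raw
dump that follows, here is a minimal Python sketch that decodes the first
64-byte CQE of that dump. The field offsets (vendor syndrome at byte 54,
syndrome at byte 55, opcode/QPN at bytes 56-59, WQE counter at bytes 60-61,
op_own at byte 63) and the syndrome names are assumptions based on the mlx5
error CQE layout, so treat the output as a hint rather than an authoritative
decode. The function name is hypothetical.

```python
import struct

# First 64 bytes (offsets 0x00-0x3F) of the "MLX5 Error CQ" dump.
CQE_HEX = (
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 01 73 65 65 6E 00 00 00 00 00 00 00 00 "
    "00 00 00 00 9D 00 53 04 29 00 19 D4 00 00 02 D2"
)

# Assumed names for a few syndrome values (not an exhaustive list).
SYNDROMES = {
    0x01: "local length error",
    0x02: "local QP operation error",
    0x04: "local protection error",
    0x05: "work request flushed error",
}


def decode_error_cqe(raw):
    """Decode selected fields of one 64-byte mlx5 error CQE (assumed layout)."""
    if len(raw) != 64:
        raise ValueError("expected exactly 64 bytes")
    # Big-endian 32-bit opcode/QPN word followed by a 16-bit WQE counter.
    opcode_qpn, wqe_counter = struct.unpack_from(">IH", raw, 56)
    syndrome = raw[55]
    return {
        "vendor_syndrome": raw[54],
        "syndrome": syndrome,
        "syndrome_name": SYNDROMES.get(syndrome, "unknown"),
        "qpn": opcode_qpn & 0xFFFFFF,   # low 24 bits: queue number
        "wqe_counter": wqe_counter,
        "opcode": raw[63] >> 4,         # high nibble of op_own
    }


if __name__ == "__main__":
    for key, value in decode_error_cqe(bytes.fromhex(CQE_HEX)).items():
        print(key, value)
```

Under these assumptions the decoded syndrome (0x04), queue number (6612) and
WQE counter (0) agree with the summary line, and syndrome 0x04 would
correspond to a local protection error, which usually points at the memory
key or address used by the failing work request.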

 

MLX5 Error CQ: at [0x7f6edca57000], len=16384

00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

00000020: 00 00 00 01 73 65 65 6E 00 00 00 00 00 00 00 00 | ....seen........

00000030: 00 00 00 00 9D 00 53 04 29 00 19 D4 00 00 02 D2 | ......S.).......

00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................

00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 F0 | ................

00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | ................