DPDK usage discussions
* [mlx5] CX6 NIC bug, the process exits abnormally.
From: jiangheng (G) @ 2023-09-14 12:08 UTC (permalink / raw)
  To: users, matan, Slava Ovsiienko, orika


During a stress test on a ConnectX-6 (CX6) NIC using DPDK, the process exits abnormally. We traced the problem to a bug in the DPDK mlx5 driver. Please check whether the latest firmware and driver fix this core dump.

By default, DPDK enables the vectorized Rx/Tx path (rxtx_vect) and CQE compression, and the receive ring buffer size is 1024. Under load, the service process receives SIGSEGV (signal 11) and exits.
Call stack information:
    #2  0x0000000000e72437 in signal_captured_function (signo=11, si=0x7f6310f46eb0, ucontext=0x7f6310f46d80) at ../v1/handle_signal.c:499
    #3  <signal handler called>
    #4  _mm_storeu_si128 (__B=..., __P=<optimized out>) at /usr/lib/gcc/x86_64-linux-gnu/7.3.0/include/emmintrin.h:720
    #5  rxq_cq_decompress_v (elts=0x20217ff394e8, cq=0x20217f8538c0, rxq=0x20217ff36e00) at ../drivers/net/mlx5/mlx5_rxtx_vec_sse.h:159
    #6  rxq_burst_v (no_cq=<synthetic pointer>, err=<synthetic pointer>, pkts_n=9, pkts=0x2004e278c9d8, rxq=0x20217ff36e00) at ../drivers/net/mlx5/mlx5_rxtx_vec.c:349
    #7  mlx5_rx_burst_vec (dpdk_rxq=0x20217ff36e00, pkts=0x2004e278c9d8, pkts_n=128) at ../drivers/net/mlx5/mlx5_rxtx_vec.c:393
    #8  0x0000000001086448 in rte_eth_rx_burst (nb_pkts=128, rx_pkts=0x2004e278c9d8, queue_id=7, port_id=<optimized out>) at ../include/dpdk/rte_ethdev.h:5339

Environment:
    Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
    [root@localhost ~]# ofed_info -s

    [root@localhost ~]# ethtool -i eth6|grep fir
    firmware-version: 22.37.1014 (MT_0000000359)
    DPDK version: 21.11

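Code at the faulting location (drivers/net/mlx5/mlx5_rxtx_vec_sse.h, frame #5 crashes at line 159):
    157:          /* B.1 store rearm data to mbuf. */
    158:          _mm_storeu_si128((__m128i *)&elts[pos + 2]->rearm_data, rearm);
    159:          _mm_storeu_si128((__m128i *)&elts[pos + 3]->rearm_data, rearm);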

Root cause: when processing compressed CQEs, 9 mini CQEs have to be processed, and (*rxq->elts)[1021] through (*rxq->elts)[1028] are accessed. Only indices [0, 1027] are reserved when the receive queue is initialized, so index 1028 is out of bounds. The pointer read from that slot is NULL, and storing the rearm data through it causes the segmentation fault and the core dump. In the gdb output below, elts points at (*rxq->elts)[1021], so elts[7] is (*rxq->elts)[1028].
(gdb) p elts[0]
$149 = (struct rte_mbuf *) 0x2006945a8000  //first round
(gdb) p elts[1]
$150 = (struct rte_mbuf *) 0x2006945aa1c0
(gdb) p elts[2]
$151 = (struct rte_mbuf *) 0x2006945ac380
(gdb) p elts[3]
$152 = (struct rte_mbuf *) 0x20217ff36f80
(gdb) p elts[4]
$153 = (struct rte_mbuf *) 0x20217ff36f80  //Second round
(gdb) p elts[5]
$154 = (struct rte_mbuf *) 0x20217ff36f80
(gdb) p elts[6]
$155 = (struct rte_mbuf *) 0x20217ff36f80
(gdb) p elts[7]
$156 = (struct rte_mbuf *) 0x0     //coredump
(gdb) p elts - (*rxq->elts)
$157 = 1021
