* [dpdk-dev] [Bug 334] ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
From: bugzilla @ 2019-07-22 1:56 UTC
To: dev
https://bugs.dpdk.org/show_bug.cgi?id=334
Bug ID: 334
Summary: ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
Product: DPDK
Version: 18.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: yasu@nttv6.jp
Target Milestone: ---
I'm writing my own DPDK application, and it crashes inside an mlx5
driver function.
It doesn't crash under a 10Gbps load but does under a 50Gbps load
(or higher; 90Gbps was also tested and resulted in a similar crash).
(Both loads were applied to a 100GbE port.)
4 cores (4 rxqs, mapped 1-to-1) were assigned to the port.
48 txqs were assigned to the port.
The port's device is:
Mellanox Technologies MT27700 Family [ConnectX-4]
with MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64
on Ubuntu 18.04.2 LTS (kernel 4.15.0-50-generic)
$ sudo mstflint -d 86:00.0 q
Image type: FS3
FW Version: 12.17.2020
FW Release Date: 22.11.2016
Description: UID GuidsNumber
Base GUID: N/A 4
Base MAC: 00900b65b390 4
Orig Base MAC: N/A 4
Image VSD: N/A
Device VSD: N/A
PSID: LNR3270110033
Security Attributes: N/A
(I couldn't update the firmware because of the PSID.)
The backtrace of the crash:
Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721 *__P = __B;
(gdb) bt
#0 0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1 rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2 0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
pkts_n=32, err=0x7ffff1505978)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3 0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
pkts=0x7ffff1514a40, pkts_n=32)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4 0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
rx_pkts=0x7ffff1514a40, nb_pkts=32)
at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(Frames from our DPDK application follow.)
The crash always reproduces. The same crash occurs with DPDK 19.05.0.
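One possible reading of the faulting store in frame 0 (__P=0x10) is a NULL struct pointer plus a 0x10 member offset. The arithmetic can be sketched as below. Here struct mbuf_like and store_target() are hypothetical stand-ins, and the layout (two 8-byte members ahead of the stored field) is an assumption for illustration, not taken from the real struct rte_mbuf:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint8_t, uint64_t, uintptr_t */

/* Hypothetical stand-in, NOT the real struct rte_mbuf. */
struct mbuf_like {
    uint64_t first;       /* assumed 8-byte member at offset 0 */
    uint64_t second;      /* assumed 8-byte member at offset 8 */
    uint8_t  stored[16];  /* assumed target of a 16-byte vector store */
};

/* The whole sketch rests on this layout assumption. */
static_assert(offsetof(struct mbuf_like, stored) == 0x10,
              "stored field is assumed to sit at offset 0x10");

/*
 * Address that a store to `stored` would use for a given struct address.
 * Plain integer arithmetic is used so that no NULL pointer is dereferenced.
 */
static uintptr_t store_target(uintptr_t mbuf_addr)
{
    return mbuf_addr + offsetof(struct mbuf_like, stored);
}
```

Under that assumption, store_target(0) evaluates to 0x10, the same value as __P in frame 0.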
At the time of the crash, frame 1 (rxq_cq_decompress_v()) shows:
(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0
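It seems that the initialization of struct rte_mbuf *elts[] sometimes
goes wrong: elts[pos + 3], i.e. elts[119], is NULL. Dumping 124 entries
starting at elts[0] shows the same pointer (0x1c0da2380) repeated in
entries 111-118, followed by NULL in entries 119-122: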
(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600,
0x1dfcc04c0, 0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0,
0x1df7876c0, 0x1df0adfc0, 0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480,
0x1ded87800, 0x1e07a64c0, 0x1dec066c0, 0x1dc59d9c0, 0x1de3ae540,
0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0, 0x1dffb3ec0,
0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80,
0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540,
0x1de10aa00, 0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00,
0x1dff20300, 0x1e05a4dc0, 0x1e0182800, 0x1e0257a00, 0x1e0323100,
0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0, 0x1de2c3680, 0x1dfd54080,
0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900, 0x1df35b600,
0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840,
0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0,
0x1dae22840, 0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480,
0x1dfe9fd40, 0x1db58d440, 0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300,
0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0, 0x1df724200, 0x1df9cf180,
0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80, 0x1dd81c600,
0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40,
0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600,
0x1e0502980, 0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740,
0x1dd749ac0, 0x1e0d1e240, 0x1df86b140, 0x1df9013c0, 0x1dfc31680,
0x1dfa15540, 0x1e03694c0, 0x1e06dfb40, 0x1dfdf8b80, 0x1ddd8cf40,
0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380,
0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0, 0x0, 0x0, 0x0,
0x7ffff7ff487c}
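To locate such holes without counting entries by eye, a small helper can scan a pointer array for its first NULL entry. This is a minimal sketch in plain C with no DPDK dependency; first_null_entry() is a hypothetical name, not a DPDK function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first NULL entry in elts[0..n),
 * or n if every entry is non-NULL.
 */
static size_t first_null_entry(void *const *elts, size_t n)
{
    assert(elts != NULL);
    for (size_t i = 0; i < n; i++) {
        if (elts[i] == NULL)
            return i;
    }
    return n;
}
```

Applied to the 124 entries dumped above, it would return 119, which equals pos + 3 (116 + 3).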
FYI:
The same DPDK application works fine on DPDK 18.11.2
with a Mellanox Technologies MT27800 Family [ConnectX-5]
and the firmware below, even under high load.
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz
on Ubuntu 18.04.2 LTS (kernel 4.15.0-54-generic).
# mstflint -d 3b:00.0 q
Image type: FS4
FW Version: 16.24.1000
FW Release Date: 26.11.2018
Product Version: 16.24.1000
Rom Info: type=UEFI version=14.17.11 cpu=AMD64
type=PXE version=3.5.603 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 506b4b0300086c56 8
Base MAC: 506b4b086c56 8
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000008
Security Attributes: N/A