DPDK patches and discussions
* [dpdk-dev] [Bug 334] ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
@ 2019-07-22  1:56 bugzilla
From: bugzilla @ 2019-07-22  1:56 UTC (permalink / raw)
  To: dev

https://bugs.dpdk.org/show_bug.cgi?id=334

            Bug ID: 334
           Summary: ConnectX-4/mlx5 crashes under high load in
                    rxq_cq_decompress_v()
           Product: DPDK
           Version: 18.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: yasu@nttv6.jp
  Target Milestone: ---

I'm writing my own DPDK application, and it crashes inside an
mlx5 driver function.

It does not crash under a 10 Gbps load but does under a 50 Gbps load
(or higher; 90 Gbps was also tested and resulted in a similar crash).
(Both loads were applied to a 100GbE port.)
4 cores (4 rxqs, one per core) were assigned to the port.
48 txqs were assigned to the port.

The port's device is:
Mellanox Technologies MT27700 Family [ConnectX-4]
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64
in Ubuntu 18.04.2 LTS 4.15.0-50-generic

$ sudo mstflint -d 86:00.0 q
Image type:            FS3
FW Version:            12.17.2020
FW Release Date:       22.11.2016
Description:           UID                GuidsNumber
Base GUID:             N/A                     4
Base MAC:              00900b65b390            4
Orig Base MAC:         N/A                     4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  LNR3270110033
Security Attributes:   N/A
(I could not update the firmware because of the PSID.)

The backtrace of the crash:

Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721       *__P = __B;
(gdb) bt
#0  0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1  rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2  0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
    pkts_n=32, err=0x7ffff1505978)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3  0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
    pkts=0x7ffff1514a40, pkts_n=32)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4  0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
    rx_pkts=0x7ffff1514a40, nb_pkts=32)
    at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(Our DPDK application's functions follow.)

The crash always reproduces. The same crash occurs with DPDK 19.05.0.

When the crash occurs, frame 1 (rxq_cq_decompress_v()) shows:
(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0

It seems that struct rte_mbuf *elts[] is sometimes not initialized
correctly: elts[pos + 3] is a NULL pointer at the time of the store.
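
One possible reading of the faulting address __P=0x10 in frame 0: a store
into a field at byte offset 16 of a NULL struct pointer lands at address
0x10. The sketch below illustrates only that arithmetic. The struct
mbuf_head and the function rearm_store_address() are hypothetical names,
and the layout (two 8-byte fields followed by the rearm data) is an
assumption about the start of struct rte_mbuf on x86_64, not something
taken from the backtrace.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the first fields of struct rte_mbuf.
 * The layout is an assumption made for illustration only. */
struct mbuf_head {
    uint64_t buf_addr;      /* stands in for void *buf_addr on x86_64 */
    uint64_t buf_iova;      /* assumed 8-byte IO address */
    uint8_t  rearm_data[8]; /* assumed to start at byte offset 16 */
};

/* Address that a store to m->rearm_data would hit when the mbuf
 * pointer m has the given numeric value. */
static uintptr_t rearm_store_address(uintptr_t mbuf_addr)
{
    return mbuf_addr + offsetof(struct mbuf_head, rearm_data);
}
```

Under these assumptions rearm_store_address(0) evaluates to 0x10, which
matches the __P value reported by gdb for a NULL elts[pos + 3].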

(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600, 
  0x1dfcc04c0, 0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0, 
  0x1df7876c0, 0x1df0adfc0, 0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480, 
  0x1ded87800, 0x1e07a64c0, 0x1dec066c0, 0x1dc59d9c0, 0x1de3ae540, 
  0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0, 0x1dffb3ec0, 
  0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80, 
  0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540, 
  0x1de10aa00, 0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00, 
  0x1dff20300, 0x1e05a4dc0, 0x1e0182800, 0x1e0257a00, 0x1e0323100, 
  0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0, 0x1de2c3680, 0x1dfd54080, 
  0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900, 0x1df35b600, 
  0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840, 
  0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0, 
  0x1dae22840, 0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480, 
  0x1dfe9fd40, 0x1db58d440, 0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300, 
  0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0, 0x1df724200, 0x1df9cf180, 
  0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80, 0x1dd81c600, 
  0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40, 
  0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600, 
  0x1e0502980, 0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740, 
  0x1dd749ac0, 0x1e0d1e240, 0x1df86b140, 0x1df9013c0, 0x1dfc31680, 
  0x1dfa15540, 0x1e03694c0, 0x1e06dfb40, 0x1dfdf8b80, 0x1ddd8cf40, 
  0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 
  0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0, 0x0, 0x0, 0x0, 
  0x7ffff7ff487c}
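
In the dump above, entries 111 through 118 all hold the same pointer
(0x1c0da2380) and entries 119 through 122 are 0x0, so pos + 3 = 119 is the
first NULL entry. A debugging aid along the following lines could locate
such an entry before the vectorized path stores through it. This is a
minimal sketch, not part of the mlx5 driver; first_null_entry() is a
hypothetical helper name.

```c
#include <stddef.h>

/* Return the index of the first NULL pointer in elts[0..n-1],
 * or -1 if every entry is non-NULL. */
static long first_null_entry(void *const *elts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (elts[i] == NULL)
            return (long)i;
    }
    return -1;
}
```

Called on the 124 entries shown above, such a scan would be expected to
stop at index 119, the same slot the crashing store dereferences.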

FYI:
The same DPDK application works fine with DPDK 18.11.2
on a Mellanox Technologies MT27800 Family [ConnectX-5]
with the firmware below, even under high load.
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz
in Ubuntu 18.04.2 LTS 4.15.0-54-generic.

# mstflint -d 3b:00.0 q
Image type:            FS4
FW Version:            16.24.1000
FW Release Date:       26.11.2018
Product Version:       16.24.1000
Rom Info:              type=UEFI version=14.17.11 cpu=AMD64
                       type=PXE version=3.5.603 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             506b4b0300086c56        8
Base MAC:              506b4b086c56            8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000008
Security Attributes:   N/A
