From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla@dpdk.org
To: dev@dpdk.org
Date: Mon, 22 Jul 2019 01:56:43 +0000
Subject: [dpdk-dev] [Bug 334] ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
List-Id: DPDK patches and discussions

https://bugs.dpdk.org/show_bug.cgi?id=334

            Bug ID: 334
           Summary: ConnectX-4/mlx5 crashes under high load in
                    rxq_cq_decompress_v()
           Product: DPDK
           Version: 18.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: yasu@nttv6.jp
  Target Milestone: ---

I'm writing my own DPDK application, and it crashes inside an mlx5 driver
function. It does not crash under a 10 Gbps load, but it does under a 50 Gbps
load or higher (90 Gbps was also tested and produced a similar crash). Both
loads are on a 100GbE port. Four cores (4 rxqs, mapped 1-to-1) and 48 txqs
were assigned to the port.

The port's device is:
  Mellanox Technologies MT27700 Family [ConnectX-4]
  MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64 on Ubuntu 18.04.2 LTS
  (4.15.0-50-generic)

$ sudo mstflint -d 86:00.0 q
Image type:            FS3
FW Version:            12.17.2020
FW Release Date:       22.11.2016
Description:           UID                GuidsNumber
Base GUID:             N/A                     4
Base MAC:              00900b65b390            4
Orig Base MAC:         N/A                     4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  LNR3270110033
Security Attributes:   N/A

(I couldn't update the firmware because of the PSID.)

The backtrace of the crash:

Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721       *__P = __B;
(gdb) bt
#0  0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1  rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2  0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
    pkts_n=32, err=0x7ffff1505978)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3  0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
    pkts=0x7ffff1514a40, pkts_n=32)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4  0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
    rx_pkts=0x7ffff1514a40, nb_pkts=32)
    at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(our DPDK application functions follow.)

The crash is always reproducible. The same crash occurs with DPDK 19.05.0.

When the crash occurs, in frame 1, rxq_cq_decompress_v():

(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0

It seems that struct rte_mbuf *elts[] is sometimes not initialized correctly.
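As a hedged illustration of how the fault address relates to the NULL entry:
a store into a member of a struct reached through a NULL pointer faults at the
member's offset, so __P=0x10 would be consistent with a 16-byte store into a
field located 16 bytes into an mbuf whose pointer is 0x0. The sketch below
assumes x86_64 and uses a hypothetical, simplified struct (sketch_mbuf) and
hypothetical helpers (store_address, first_null_slot). It is not the DPDK
definition of struct rte_mbuf, and it is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first bytes of an mbuf. Assumption: two
 * 8-byte fields precede the 16-byte region that the vector store writes.
 * This is NOT the real struct rte_mbuf layout, only an illustration.
 */
struct sketch_mbuf {
    void *buf_addr;           /* offset 0x00 on x86_64 */
    uint64_t buf_iova;        /* offset 0x08 */
    uint8_t store_target[16]; /* offset 0x10: region hit by the store */
};

/* Address a store into store_target touches for a given mbuf address. */
static uintptr_t store_address(uintptr_t mbuf_addr)
{
    return mbuf_addr + offsetof(struct sketch_mbuf, store_target);
}

/* Return the first index in [pos, pos + n) whose slot is NULL, or -1. */
static int first_null_slot(void *const *elts, int pos, int n)
{
    assert(elts != NULL && pos >= 0 && n >= 0);
    for (int i = pos; i < pos + n; i++) {
        if (elts[i] == NULL)
            return i;
    }
    return -1;
}
```

Under the assumed layout, store_address(0) evaluates to 0x10, which matches
the __P value in frame #0. A debug-only check such as
first_null_slot(elts, pos, 4) before the store (the window of 4 is an
assumption based on the inspected index pos + 3) would turn the segmentation
fault into a reportable condition while the root cause is investigated.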

(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600,
  0x1dfcc04c0, 0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0,
  0x1df7876c0, 0x1df0adfc0, 0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480,
  0x1ded87800, 0x1e07a64c0, 0x1dec066c0, 0x1dc59d9c0, 0x1de3ae540,
  0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0, 0x1dffb3ec0,
  0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80,
  0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540,
  0x1de10aa00, 0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00,
  0x1dff20300, 0x1e05a4dc0, 0x1e0182800, 0x1e0257a00, 0x1e0323100,
  0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0, 0x1de2c3680, 0x1dfd54080,
  0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900, 0x1df35b600,
  0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840,
  0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0,
  0x1dae22840, 0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480,
  0x1dfe9fd40, 0x1db58d440, 0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300,
  0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0, 0x1df724200, 0x1df9cf180,
  0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80, 0x1dd81c600,
  0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40,
  0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600,
  0x1e0502980, 0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740,
  0x1dd749ac0, 0x1e0d1e240, 0x1df86b140, 0x1df9013c0, 0x1dfc31680,
  0x1dfa15540, 0x1e03694c0, 0x1e06dfb40, 0x1dfdf8b80, 0x1ddd8cf40,
  0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380,
  0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0, 0x0, 0x0, 0x0,
  0x7ffff7ff487c}

FYI: The same DPDK application works fine on DPDK-18.11.2 with a Mellanox
Technologies MT27800 Family [ConnectX-5] and the firmware below, even under
high load.

MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz on Ubuntu 18.04.2 LTS
(4.15.0-54-generic).

# mstflint -d 3b:00.0 q
Image type:            FS4
FW Version:            16.24.1000
FW Release Date:       26.11.2018
Product Version:       16.24.1000
Rom Info:              type=UEFI version=14.17.11 cpu=AMD64
                       type=PXE version=3.5.603 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             506b4b0300086c56        8
Base MAC:              506b4b086c56            8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000008
Security Attributes:   N/A

-- 
You are receiving this mail because:
You are the assignee for the bug.