Subject: [dpdk-dev] Mellanox ConnectX-5 crashes and mbuf leak
From: Martin Weiser
To: Adrien Mazarguil, Nelio Laranjeiro
Cc: dev@dpdk.org
Date: Tue, 26 Sep 2017 11:23:45 +0200

Hi,

we are currently testing the Mellanox ConnectX-5 100G NIC with DPDK 17.08 as well as dpdk-next-net and are experiencing mbuf leaks and crashes (and in some instances even kernel panics in an mlx5 module) under certain load conditions.

We initially saw these issues only in our own DPDK-based application, and it took some effort to reproduce them in one of the DPDK example applications. With the attached patch to the load-balancer example, however, we can reproduce the issues reliably. The patch may look odd at first, so let me explain why I made these changes:

* The sleep introduced in the worker threads simulates heavy processing, which causes the software rx rings to fill up under load. If the rings are large enough (I increased the ring size with the load-balancer command line option, as you can see in the example call further down), the mbuf pool may run empty, and I believe this leads to a malfunction in the mlx5 driver. As soon as this happens the NIC stops forwarding traffic, probably because the driver cannot allocate mbufs for the packets received by the NIC. Unfortunately, when this happens most of the mbufs never return to the mbuf pool, so even when the traffic stops the pool remains almost empty and the application will not forward traffic even at a very low rate.

* The use of the mbuf reference count, in addition to the situation described above, is what makes the mlx5 DPDK driver crash almost immediately under load. In our application we rely on this feature to forward the packet quickly while still sending it to a worker thread for analysis, and to free the packet only when the analysis is done. Here I simulated this by incrementing the mbuf reference count immediately after receiving the mbuf from the driver and then calling rte_pktmbuf_free in the worker thread, which should only decrement the reference count again and not actually free the mbuf (see the sketch right after this list).
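To make the intended rx/worker interaction explicit, here is a minimal sketch of the pattern the patch emulates. It is not the attached patch itself: the function names, the ring-based hand-off and the burst size are illustrative only, and all of the load-balancer plumbing is omitted.

    #include <unistd.h>

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    /* I/O lcore: take one extra reference per received packet before handing
     * it to a worker, so the packet can be forwarded immediately while the
     * worker still holds a reference. */
    static void
    io_lcore_rx(uint8_t port_id, struct rte_ring *to_worker)
    {
            struct rte_mbuf *pkts[32];
            uint16_t i, n;

            n = rte_eth_rx_burst(port_id, 0, pkts, 32);
            for (i = 0; i < n; i++) {
                    rte_pktmbuf_refcnt_update(pkts[i], 1); /* refcnt 1 -> 2 */
                    rte_ring_enqueue(to_worker, pkts[i]);  /* hand off for analysis */
            }
            /* ... the same mbufs are also queued for tx forwarding here ... */
    }

    /* Worker lcore: the "analysis" is just a sleep; rte_pktmbuf_free() here
     * only drops the extra reference, and whichever free happens last
     * (worker or tx completion) actually returns the mbuf to the pool. */
    static void
    worker_lcore(struct rte_ring *from_io)
    {
            struct rte_mbuf *m;

            while (rte_ring_dequeue(from_io, (void **)&m) == 0) {
                    usleep(20);          /* simulate heavy processing */
                    rte_pktmbuf_free(m); /* refcnt 2 -> 1, no real free yet */
            }
    }

In the attached patch the same effect is achieved directly in app_lcore_io_rx_buffer_to_send() and app_lcore_worker() of the load-balancer example.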
We executed the patched load-balancer application with the following command line:

    ./build/load_balancer -l 3-7 -n 4 -- --rx "(0,0,3),(1,0,3)" --tx "(0,3),(1,3)" --w "4" --lpm "16.0.0.0/8=>0; 48.0.0.0/8=>1;" --pos-lb 29 --rsz "1024, 32768, 1024, 1024"

Then we generated traffic using the t-rex traffic generator and the sfr test case. On our machine the issues start to happen when the traffic exceeds ~6 Gbps, but this may vary depending on how powerful the test machine is (by the way, we were able to reproduce this on different types of hardware).

A typical stacktrace looks like this:

    Thread 1 "load_balancer" received signal SIGSEGV, Segmentation fault.
    0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    716        __builtin_ia32_storedqu ((char *)__P, (__v16qi)__B);
    (gdb) bt
    #0  0x0000000000614475 in _mm_storeu_si128 (__B=..., __P=) at /usr/lib/gcc/x86_64-linux-gnu/5/include/emmintrin.h:716
    #1  rxq_cq_decompress_v (elts=0x7fff3732bef0, cq=0x7ffff7f99380, rxq=0x7fff3732a980) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:679
    #2  rxq_burst_v (pkts_n=, pkts=0xa7c7b0, rxq=0x7fff3732a980) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1242
    #3  mlx5_rx_burst_vec (dpdk_rxq=0x7fff3732a980, pkts=, pkts_n=) at /root/dpdk-next-net/drivers/net/mlx5/mlx5_rxtx_vec_sse.c:1277
    #4  0x000000000043c11d in rte_eth_rx_burst (nb_pkts=3599, rx_pkts=0xa7c7b0, queue_id=0, port_id=0 '\000') at /root/dpdk-next-net//x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2781
    #5  app_lcore_io_rx (lp=lp@entry=0xa7c700, n_workers=n_workers@entry=1, bsz_rd=bsz_rd@entry=144, bsz_wr=bsz_wr@entry=144, pos_lb=pos_lb@entry=29 '\035') at /root/dpdk-next-net/examples/load_balancer/runtime.c:198
    #6  0x0000000000447dc0 in app_lcore_main_loop_io () at /root/dpdk-next-net/examples/load_balancer/runtime.c:485
    #7  app_lcore_main_loop (arg=) at /root/dpdk-next-net/examples/load_balancer/runtime.c:669
    #8  0x0000000000495e8b in rte_eal_mp_remote_launch ()
    #9  0x0000000000441e0d in main (argc=, argv=) at /root/dpdk-next-net/examples/load_balancer/main.c:99

The crash does not always happen at the exact same spot, but in our tests it always occurred in the same function. In a few instances, instead of an application crash, the system froze completely with what appeared to be a kernel panic. The last output looked like a crash in the interrupt handler of an mlx5 module, but unfortunately I cannot provide the exact output right now.

All tests were performed under Ubuntu 16.04 server running a 4.4.0-96-generic kernel, and the latest Mellanox OFED (MLNX_OFED_LINUX-4.1-1.0.2.0-ubuntu16.04-x86_64) was used.

Any help with this issue is greatly appreciated.
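In case it helps with diagnosing the leak: the effect is easy to see by periodically printing the mempool counters while the test runs. Something along the lines of the sketch below is sufficient (illustrative only, not part of the attached patch; the pool pointer is whatever mempool the application created for its rx mbufs). After the traffic is stopped, the available count stays near zero instead of climbing back towards the pool size.

    #include <stdio.h>

    #include <rte_mempool.h>

    /* Print how many mbufs are currently free vs. in flight for the given
     * pool; in the failure case described above the available count stays
     * close to zero even after the traffic has stopped. */
    static void
    dump_pool_usage(const struct rte_mempool *mp)
    {
            printf("%s: %u mbufs available, %u in use\n", mp->name,
                   rte_mempool_avail_count(mp),
                   rte_mempool_in_use_count(mp));
    }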
Best regards,
Martin

Attachment test.patch:

diff --git a/config/common_base b/config/common_base
index 439f3cc..12b71e9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -220,7 +220,7 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
 CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index e54b785..d448100 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -41,6 +41,7 @@
 #include <stdarg.h>
 #include <errno.h>
 #include <getopt.h>
+#include <unistd.h>

 #include <rte_common.h>
 #include <rte_byteorder.h>
@@ -133,6 +134,8 @@ app_lcore_io_rx_buffer_to_send (
 	uint32_t pos;
 	int ret;

+	rte_pktmbuf_refcnt_update(mbuf, 1);
+
 	pos = lp->rx.mbuf_out[worker].n_mbufs;
 	lp->rx.mbuf_out[worker].array[pos ++] = mbuf;
 	if (likely(pos < bsz)) {
@@ -521,6 +524,8 @@ app_lcore_worker(
 		continue;
 #endif

+		usleep(20);
+
 		APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[0], unsigned char *));
 		APP_WORKER_PREFETCH0(lp->mbuf_in.array[1]);

@@ -530,6 +535,8 @@ app_lcore_worker(
 			uint32_t ipv4_dst, pos;
 			uint32_t port;

+			rte_pktmbuf_free(lp->mbuf_in.array[j]);
+
 			if (likely(j < bsz_rd - 1)) {
 				APP_WORKER_PREFETCH1(rte_pktmbuf_mtod(lp->mbuf_in.array[j+1], unsigned char *));
 			}