From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bugzilla@dpdk.org>
Received: by dpdk.org (Postfix, from userid 33)
 id 23CB51B3B5; Tue, 23 Oct 2018 19:48:09 +0200 (CEST)
From: bugzilla@dpdk.org
To: dev@dpdk.org
Date: Tue, 23 Oct 2018 17:48:09 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: DPDK
X-Bugzilla-Component: core
X-Bugzilla-Version: 18.08
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: critical
X-Bugzilla-Who: yskoh@mellanox.com
X-Bugzilla-Status: CONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: dev@dpdk.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
 target_milestone
Message-ID: <bug-97-3@http.bugs.dpdk.org/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
MIME-Version: 1.0
Subject: [dpdk-dev] [Bug 97] rte_memcpy() moves data incorrectly on Ubuntu
 18.04 on Intel Skylake
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Oct 2018 17:48:09 -0000

https://bugs.dpdk.org/show_bug.cgi?id=97

            Bug ID: 97
           Summary: rte_memcpy() moves data incorrectly on Ubuntu 18.04 on
                    Intel Skylake
           Product: DPDK
           Version: 18.08
          Hardware: x86
                OS: Linux
            Status: CONFIRMED
          Severity: critical
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: yskoh@mellanox.com
  Target Milestone: ---

Reported by:
        https://mails.dpdk.org/archives/dev/2018-September/111522.html

We've recently encountered a weird issue with Ubuntu 18.04 on a Skylake
server. I can always reproduce the crash and have narrowed it down. I guess
it could be a GCC issue.


[1] How to reproduce
- ConnectX-4Lx/ConnectX-5 with mlx5 PMD in DPDK 18.02/18.05/18.08
- Ubuntu 18.04 on Intel Skylake server
- gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
- Testpmd crashes when it starts to forward traffic. Easy to reproduce.
- Only happens on the Skylake server.


[2] Failure point

The attached patch gives some insight into why it crashes. The following is the
output of the patch and the GDB commands.

In summary, rte_memcpy() doesn't work as expected. In __mempool_generic_put(),
rte_memcpy() moves the array of objects to the lcore cache. If I run
memcmp() right after rte_memcpy(dst, src, n), the data in dst differs from the
data in src. It looks like some of the data got shifted by a few bytes, as you
can see below.

        [GDB command]
        $dst = 0x7ffff4e09ea8
        $src = 0x7fffce3fb970
        $n = 256
        x/32gx 0x7ffff4e09ea8
        x/32gx 0x7fffce3fb970
        testpmd: /home/mlnxtest/dpdk/build/include/rte_mempool.h:1140: __mempool_generic_put: Assertion `0' failed.

        Thread 4 "lcore-slave-1" received signal SIGABRT, Aborted.
        [Switching to Thread 0x7fffce3ff700 (LWP 69913)]
        (gdb) x/32gx 0x7ffff4e09ea8
        0x7ffff4e09ea8: 0x00007fffaac38ec0      0x00007fffaac38500
        0x7ffff4e09eb8: 0x00007fffaac37b40      0x00007fffaac37180
        0x7ffff4e09ec8: 0x850000007fffaac3      0x7b4000007fffaac3
        0x7ffff4e09ed8: 0x00007fffaac35440      0x00007fffaac34a80
        0x7ffff4e09ee8: 0xaac3850000007fff      0xaac37b4000007fff
        0x7ffff4e09ef8: 0x00007fffaac32d40      0x00007fffaac32380
        0x7ffff4e09f08: 0x7fffaac385000000      0x7fffaac37b400000
        0x7ffff4e09f18: 0x00007fffaac30640      0x00007fffaac2fc80
        0x7ffff4e09f28: 0x00007fffaac2f2c0      0x00007fffaac2e900
        0x7ffff4e09f38: 0x00007fffaac2df40      0x00007fffaac2d580
        0x7ffff4e09f48: 0x00007fffaac2cbc0      0x00007fffaac2c200
        0x7ffff4e09f58: 0x00007fffaac2b840      0x00007fffaac2ae80
        0x7ffff4e09f68: 0x00007fffaac2a4c0      0x00007fffaac29b00
        0x7ffff4e09f78: 0x00007fffaac29140      0x00007fffaac28780
        0x7ffff4e09f88: 0x00007fffaac27dc0      0x00007fffaac27400
        0x7ffff4e09f98: 0x00007fffaac26a40      0x00007fffaac26080
        (gdb) x/32gx 0x7fffce3fb970
        0x7fffce3fb970: 0x00007fffaac38ec0      0x00007fffaac38500
        0x7fffce3fb980: 0x00007fffaac37b40      0x00007fffaac37180
        0x7fffce3fb990: 0x00007fffaac367c0      0x00007fffaac35e00
        0x7fffce3fb9a0: 0x00007fffaac35440      0x00007fffaac34a80
        0x7fffce3fb9b0: 0x00007fffaac340c0      0x00007fffaac33700
        0x7fffce3fb9c0: 0x00007fffaac32d40      0x00007fffaac32380
        0x7fffce3fb9d0: 0x00007fffaac319c0      0x00007fffaac31000
        0x7fffce3fb9e0: 0x00007fffaac30640      0x00007fffaac2fc80
        0x7fffce3fb9f0: 0x00007fffaac2f2c0      0x00007fffaac2e900
        0x7fffce3fba00: 0x00007fffaac2df40      0x00007fffaac2d580
        0x7fffce3fba10: 0x00007fffaac2cbc0      0x00007fffaac2c200
        0x7fffce3fba20: 0x00007fffaac2b840      0x00007fffaac2ae80
        0x7fffce3fba30: 0x00007fffaac2a4c0      0x00007fffaac29b00
        0x7fffce3fba40: 0x00007fffaac29140      0x00007fffaac28780
        0x7fffce3fba50: 0x00007fffaac27dc0      0x00007fffaac27400
        0x7fffce3fba60: 0x00007fffaac26a40      0x00007fffaac26080
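
The two dumps above can also be compared word by word rather than by eye.
The following is only a minimal sketch in plain C, not part of the attached
patch: the helper names first_mismatched_word() and count_mismatched_words()
are made up for illustration, and the arrays hold just the first eight 64-bit
words of each dump.

```c
#include <stddef.h>
#include <stdint.h>

/* First 8 words of the destination dump (starting at 0x7ffff4e09ea8). */
const uint64_t dump_dst[8] = {
    0x00007fffaac38ec0ULL, 0x00007fffaac38500ULL,
    0x00007fffaac37b40ULL, 0x00007fffaac37180ULL,
    0x850000007fffaac3ULL, 0x7b4000007fffaac3ULL,
    0x00007fffaac35440ULL, 0x00007fffaac34a80ULL,
};

/* First 8 words of the source dump (starting at 0x7fffce3fb970). */
const uint64_t dump_src[8] = {
    0x00007fffaac38ec0ULL, 0x00007fffaac38500ULL,
    0x00007fffaac37b40ULL, 0x00007fffaac37180ULL,
    0x00007fffaac367c0ULL, 0x00007fffaac35e00ULL,
    0x00007fffaac35440ULL, 0x00007fffaac34a80ULL,
};

/* Hypothetical helper: index of the first differing 64-bit word,
 * or nwords if the two arrays are identical. */
size_t first_mismatched_word(const uint64_t *a, const uint64_t *b,
                             size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (a[i] != b[i])
            return i;
    }
    return nwords;
}

/* Hypothetical helper: number of 64-bit words that differ. */
size_t count_mismatched_words(const uint64_t *a, const uint64_t *b,
                              size_t nwords)
{
    size_t count = 0;

    for (size_t i = 0; i < nwords; i++) {
        if (a[i] != b[i])
            count++;
    }
    return count;
}
```

For these eight words, first_mismatched_word(dump_dst, dump_src, 8) returns 4,
i.e. byte offset 32, which is the row at 0x7ffff4e09ec8 in the destination
dump, and count_mismatched_words() returns 2.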


AFAIK, AVX512F support is disabled by default in DPDK as it is still
experimental (CONFIG_RTE_ENABLE_AVX512=n). But with gcc optimization, the AVX2
version of rte_memcpy() seems to be compiled to use 512-bit instructions. If I
disable that by adding EXTRA_CFLAGS="-mno-avx512f", it works fine and doesn't
crash.
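
To check whether the compiler is targeting AVX-512F independently of the DPDK
config option, something like the following could be built with the same
CFLAGS as DPDK. This is a rough sketch, not DPDK code: the function names are
hypothetical, it relies on the __AVX512F__ macro that GCC and Clang predefine
when AVX-512F code generation is enabled, and the runtime check assumes a
GCC-compatible compiler on x86.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with AVX-512F code generation
 * enabled (the compiler predefines __AVX512F__), otherwise 0. */
int built_with_avx512f(void)
{
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU reports AVX-512F, 0 if it does not,
 * and -1 if this build has no way to ask (non-x86 or non-GNU compiler). */
int cpu_reports_avx512f(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return -1;
#endif
}

/* Prints both results so they can be compared across build flags. */
void print_avx512f_status(void)
{
    printf("compiled with AVX-512F: %d\n", built_with_avx512f());
    printf("CPU reports AVX-512F:   %d\n", cpu_reports_avx512f());
}
```

If built_with_avx512f() returns 1 while CONFIG_RTE_ENABLE_AVX512=n, the
compiler is still free to emit 512-bit instructions; with -mno-avx512f in
EXTRA_CFLAGS it should return 0.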

Do you have any idea regarding this issue or are you already aware of it?


Thanks,
Yongseok


$ git diff
diff --git a/config/common_base b/config/common_base
index ad03cf433..f512b5a88 100644
--- a/config/common_base
+++ b/config/common_base
@@ -275,8 +275,8 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
-CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=y
 CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

@@ -597,7 +597,7 @@ CONFIG_RTE_RING_USE_C11_MEM_MODEL=n
 #
 CONFIG_RTE_LIBRTE_MEMPOOL=y
 CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
-CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
+CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y

 #
 # Compile Mempool drivers
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..9f48028d9 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -39,6 +39,7 @@
 #include <errno.h>
 #include <inttypes.h>
 #include <sys/queue.h>
+#include <assert.h>

 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -1123,6 +1124,22 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
        /* Add elements back into the cache */
        rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);

+       if (memcmp(&cache_objs[0], obj_table, sizeof(void *) * n)) {
+               printf("[GDB command] \n"
+                      "$dst = %p\n"
+                      "$src = %p\n"
+                      "$n = %zu\n"
+                      "x/%zugx %p\n"
+                      "x/%zugx %p\n",
+                      (void *)&cache_objs[0],
+                      (const void *)obj_table,
+                      sizeof(void *) * n,
+                      sizeof(void *) * n / 8, (void *)&cache_objs[0],
+                      sizeof(void *) * n / 8, (const void *)obj_table
+                      );
+               assert(0);
+       }
+
        cache->len += n;

        if (cache->len >= cache->flushthresh) {

-- 
You are receiving this mail because:
You are the assignee for the bug.