DPDK patches and discussions
From: Marc Sune <marc.sune@bisdn.de>
To: "<dev@dpdk.org>" <dev@dpdk.org>
Subject: [dpdk-dev] Memory corruption in librte_ether?
Date: Fri, 17 Oct 2014 23:16:47 +0200
Message-ID: <5441873F.90500@bisdn.de>

Hi all,

I was rebasing the KNI mempool v4 patch (I have it finalised, but wanted 
to check) onto the latest master HEAD 
(075e064089e1c2b6899db58c69be1a387eb5ffa7) when I ran into problems with 
the current KNI example using em interfaces in a VM. I then switched to 
plain master HEAD and retried (so without the KNI mempool patch!) and got 
the *same behaviour*. Everything listed here is with master HEAD, so it 
has nothing to do with the patch I am working on.

The *VM*, emulated with qemu, has 4 e1000 interfaces attached to several 
bridges (qemu version 1.1.2, running on Debian 7 64-bit). With this setup 
I get the error:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 
-n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 19279)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using 
unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 19280)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 
hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 
hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00      8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): drop_en functionality not supported by device
EAL: Error - exiting with code: 1
   Cause: Could not setup up RX queue for port1 (-22)
[Thread 0x7fffcb7fc700 (LWP 19280) exited]
[Thread 0x7ffff7ff0800 (LWP 19278) exited]

The default rx_conf in librte_pmd_e1000/igb_ethdev.c looks OK: it sets 
drop_en to 0.

Debugging the e1000 PMD (all 4 NICs emulate exactly the same device):

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
  #
  LIB = librte_pmd_e1000.a

-CFLAGS += -O3
+CFLAGS += -g -O0
  CFLAGS += $(WERROR_FLAGS)

it seems something is wrong.

First iface (PCI 0:6.0):

(gdb) print dev->data->name
$4 = "0:6.0", '\000' <repeats 26 times>
(gdb) print *rx_conf
$5 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 
'\000'}, rx_free_thresh = 0, rx_drop_en = 0 '\000', rx_deferred_start = 
0 '\000'}
(gdb)

Second iface (PCI 0:7.0):

(gdb) print dev->data->name
$6 = "0:7.0", '\000' <repeats 26 times>
(gdb) print *rx_conf
$7 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 
'\000'}, rx_free_thresh = 33088, rx_drop_en = 176 '\260', 
rx_deferred_start = 44 ','}

Note that rx_free_thresh, rx_drop_en and rx_deferred_start of the second 
interface hold garbage values.
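
The non-zero rx_drop_en would be enough to explain the "drop_en 
functionality not supported by device" error and the -22 return code seen 
above. A minimal sketch of such a check, assuming the driver simply 
rejects any non-zero rx_drop_en (the struct and function names below are 
hypothetical and only mirror the fields printed by gdb; this is not the 
PMD source):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the RX queue configuration printed by gdb. */
struct rxconf_sketch {
    uint16_t rx_free_thresh;
    uint8_t  rx_drop_en;
    uint8_t  rx_deferred_start;
};

/*
 * Assumed shape of the check behind the "drop_en functionality not
 * supported by device" message: any non-zero rx_drop_en is rejected.
 */
static int check_rx_drop_en(const struct rxconf_sketch *conf)
{
    if (conf->rx_drop_en != 0)
        return -EINVAL; /* EINVAL is 22 on Linux, hence "(-22)" */
    return 0;
}
```

With the values from the two gdb sessions, the first interface 
(rx_drop_en = 0) would pass and the second (rx_drop_en = 176) would fail 
with -EINVAL.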

However, when librte_ether is also built with -g -O0:

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..ec385ef 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
  #
  LIB = libethdev.a

-CFLAGS += -O3
+CFLAGS += -g -O0
  CFLAGS += $(WERROR_FLAGS)

  SRCS-y += rte_ethdev.c
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
  #
  LIB = librte_pmd_e1000.a

-CFLAGS += -O3
+CFLAGS += -g -O0
  CFLAGS += $(WERROR_FLAGS)

  ifeq ($(CC), icc)


With both libraries built at -O0, the RX queue of port 1 is now set up 
correctly and rx_conf appears to be OK (behaviour that changes with the 
optimisation level smells like memory corruption!), but now tx_conf seems 
wrong:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 
-n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 22143)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using 
unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 22144)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL:   probe driver: 8086:100e rte_em_pmd
EAL:   PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 
hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 
hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00      8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 
hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0
PMD: eth_em_tx_queue_setup(): tx_free_thresh must be less than the 
number of TX descriptors minus 3. (tx_free_thresh=65535 port=1 queue=0)
EAL: Error - exiting with code: 1
   Cause: Could not setup up TX queue for port1 (-22)
[Thread 0x7fffcbfff700 (LWP 22143) exited]
[Thread 0x7ffff7ff0800 (LWP 22140) exited]
[Inferior 1 (process 22140) exited with code 01]
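
The error message states the rule being enforced: tx_free_thresh must be 
smaller than the number of TX descriptors minus 3. A rough sketch of that 
rule, with a hypothetical function name and no claim to match the actual 
code in em_rxtx.c:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the rule quoted in the log:
 * tx_free_thresh must be less than (nb_desc - 3).
 * The arithmetic is done in int so a small nb_desc cannot wrap around.
 */
static int check_tx_free_thresh(uint16_t nb_desc, uint16_t tx_free_thresh)
{
    if ((int)tx_free_thresh >= (int)nb_desc - 3)
        return -EINVAL; /* reported by the example as (-22) */
    return 0;
}
```

For nb_desc=512 the limit is 509, so the tx_free_thresh=65535 from the 
log is rejected, while a zeroed configuration passes this particular 
check.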

Debugging it:

PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 
hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796420, queue_idx=0, 
nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c)
     at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$1 = "0:6.0", '\000' <repeats 26 times>
(gdb) print tx_conf
$2 = (const struct rte_eth_txconf *) 0x7fffffffe39c
(gdb) print *tx_conf
$3 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 
'\000'}, tx_rs_thresh = 0, tx_free_thresh = 0, txq_flags = 0, 
tx_deferred_start = 0 '\000'}
(gdb) c
Continuing.
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 
hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00      8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 
hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796460, queue_idx=0, 
nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c)
     at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$4 = "0:7.0", '\000' <repeats 26 times>
(gdb) print *tx_conf
$5 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 
'\000'}, tx_rs_thresh = 58608, tx_free_thresh = 65535, txq_flags = 32767,
   tx_deferred_start = 0 '\000'}
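
One possible reading of these values, offered as a hint rather than a 
diagnosis: assuming a little-endian x86_64 target and that tx_rs_thresh, 
tx_free_thresh and txq_flags lie back to back in memory (which is what 
natural alignment would give for the field order gdb prints, but is not 
checked against the headers here), the three fields can be read as one 
64-bit word. The helper below is hypothetical and only does that 
arithmetic:

```c
#include <stdint.h>

/*
 * Reassemble three adjacent little-endian fields into the 64-bit word
 * that would occupy the same eight bytes. The adjacency of the fields
 * is an assumption about the struct layout, not a verified fact.
 */
static uint64_t reassemble_le(uint16_t tx_rs_thresh,
                              uint16_t tx_free_thresh,
                              uint32_t txq_flags)
{
    return (uint64_t)tx_rs_thresh
         | ((uint64_t)tx_free_thresh << 16)
         | ((uint64_t)txq_flags << 32);
}
```

For 58608 (0xe4f0), 65535 (0xffff) and 32767 (0x7fff) the result is 
0x7fffffffe4f0, which looks like a stack address close to tx_conf itself 
(0x7fffffffe39c). That would be consistent with the structure holding 
stale stack contents, although it proves nothing on its own.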

The KNI example runs *perfectly* in the VM with the same launch 
parameters on v1.7.1, and seems to work fine up to commit 
27b31ee33fa5e7cc9a086c690b98ed8e1a153c6a. So the commit that breaks the 
example (which is not necessarily the commit that is actually wrong) 
seems to be:

commit 81f7ecd934372fc9f592d1322f8eff86350fa4f5
Author: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Date:   Wed Oct 1 10:49:05 2014 +0100

     examples: use factorized default Rx/Tx configuration

     For apps that were using default rte_eth_rxconf and rte_eth_txconf
     structures, these have been removed and now they are obtained by
     calling rte_eth_dev_info_get, just before setting up RX/TX queues.

     Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
     Acked-by: David Marchand <david.marchand@6wind.com>


This seems to indicate that rte_eth_dev_info_get() is somehow corrupting 
memory(?). I haven't figured out the problem (yet), but I suspect one of:

commit fbde27f19ab8f1d386868275bd8c016e693cf073
Author: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Date:   Wed Oct 1 10:49:04 2014 +0100

     ethdev: get default Rx/Tx configuration from dev info

     Many sample apps use duplicated code to set rte_eth_txconf and rte_eth_rxconf
     structures. This patch allows the user to get a default optimal RX/TX configuration
     through rte_eth_dev_info get, and still any parameters may be tweaked as wished,
     before setting up queues.

     Besides, if a NULL pointer is passed to rte_eth_rx_queue_setup or
     rte_eth_tx_queue_setup, these functions get internally the default RX/TX
     configuration for the user.

     Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
     Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
     Acked-by: David Marchand <david.marchand@6wind.com>
     [Thomas: split patch]

commit a30268e9a2d0618902e8cf96b90b27db4fb02d54
Author: Pablo de Lara <pablo.de.lara.guarch@intel.com>
Date:   Wed Oct 1 10:49:03 2014 +0100

     ethdev: reset whole dev info structure before filling

     To guarantee that RX/TX configuration structures are reseted
     before modifying them, plus the other dev info fields,
     dev info structure is zeroed beforehand.

     Signed-off-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
     Acked-by: David Marchand <david.marchand@6wind.com>
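
To make the suspected failure mode concrete, here is a purely 
illustrative model, not DPDK code: a driver callback that fills only some 
fields of an info structure, plus a wrapper that zeroes the structure 
first, which is what the second commit message describes. All names are 
hypothetical, and whether any path in master actually skips the zeroing 
is exactly what remains to be confirmed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real structures. */
struct conf_sketch {
    uint16_t free_thresh;
    uint8_t  drop_en;
};

struct info_sketch {
    uint32_t max_rx_queues;
    struct conf_sketch default_rxconf;
};

/* Models a driver callback that never touches default_rxconf. */
static void driver_info_get(struct info_sketch *info)
{
    info->max_rx_queues = 1;
}

/* Models a wrapper that resets the whole structure before filling it. */
static void info_get_zeroed(struct info_sketch *info)
{
    memset(info, 0, sizeof(*info));
    driver_info_get(info);
}
```

If the caller's structure previously held other data, calling 
driver_info_get() directly leaves that data in default_rxconf, whereas 
info_get_zeroed() always yields zeroed defaults. A queue setup path that 
used the first form on a stack variable would see values like the ones 
printed by gdb.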


Can anyone confirm it?

Marc

P.S. Has anyone managed to run a DPDK app under valgrind?

