Message-ID: <5441873F.90500@bisdn.de>
Date: Fri, 17 Oct 2014 23:16:47 +0200
From: Marc Sune
Subject: [dpdk-dev] Memory corruption in librte_ether?

Hi all,

I was rebasing the KNI mempool v4 patch (I have it finalised, but wanted
to check) onto the latest master HEAD
(075e064089e1c2b6899db58c69be1a387eb5ffa7) when I ran into problems with
the current KNI example with em interfaces in a VM. I then switched to
plain master HEAD and retried (so without the KNI mempool patch!) and got
the *same behaviour*. The behaviour described here is with master HEAD,
so it has nothing to do with the patch I am working on.

The *VM*, emulated with qemu, has 4 e1000 interfaces attached to several
bridges. The qemu version is 1.1.2, running on Debian 7 64-bit.

With this setup I get the error:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 -n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 19279)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 19280)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: 0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): drop_en functionality not supported by device
EAL: Error - exiting with code: 1
Cause: Could not setup up RX queue for port1 (-22)
[Thread 0x7fffcb7fc700 (LWP 19280) exited]
[Thread 0x7ffff7ff0800 (LWP 19278) exited]

The default rx_conf in librte_pmd_e1000/igb_ethdev.c seems OK, setting
drop_en to 0.
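For readers following along: -22 is -EINVAL on Linux. A minimal sketch of the kind of check that would produce the message above, assuming the em PMD simply rejects any non-zero rx_drop_en. The struct and function names are hypothetical stand-ins for illustration, not the DPDK definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the RX queue configuration (not rte_eth_rxconf). */
struct rxconf_sketch {
    uint16_t rx_free_thresh;
    uint8_t  rx_drop_en;
};

/*
 * Assumed validation: the device cannot drop packets when the ring is
 * full, so any non-zero rx_drop_en is refused with -EINVAL.
 */
static int rx_queue_setup_sketch(const struct rxconf_sketch *conf)
{
    assert(conf != NULL);

    if (conf->rx_drop_en != 0) {
        fprintf(stderr, "drop_en functionality not supported by device\n");
        return -EINVAL; /* -22 on Linux */
    }
    return 0;
}
```

With a check of this shape, a configuration whose rx_drop_en byte holds anything other than 0 fails queue setup, whether the value was set deliberately or is leftover garbage.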
Debugging the e1000 PMD (the 4 NICs emulate exactly the same device):

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_e1000.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)

Something seems to be wrong.

First iface (PCI 0:6.0):

(gdb) print dev->data->name
$4 = "0:6.0", '\000'
(gdb) print *rx_conf
$5 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, rx_free_thresh = 0, rx_drop_en = 0 '\000', rx_deferred_start = 0 '\000'}
(gdb)

Second iface (PCI 0:7.0):

(gdb) print dev->data->name
$6 = "0:7.0", '\000'
(gdb) print *rx_conf
$7 = {rx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, rx_free_thresh = 33088, rx_drop_en = 176 '\260', rx_deferred_start = 44 ','}

Note that rx_free_thresh, rx_drop_en and rx_deferred_start now hold
garbage values.

However, when also building ethdev with -g -O0:

marc@dpdk:~/dpdk/lib$ git diff
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index b310f8b..ec385ef 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = libethdev.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)
 SRCS-y += rte_ethdev.c
diff --git a/lib/librte_pmd_e1000/Makefile b/lib/librte_pmd_e1000/Makefile
index 14bc4a2..e50b715 100644
--- a/lib/librte_pmd_e1000/Makefile
+++ b/lib/librte_pmd_e1000/Makefile
@@ -36,7 +36,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 #
 LIB = librte_pmd_e1000.a
-CFLAGS += -O3
+CFLAGS += -g -O0
 CFLAGS += $(WERROR_FLAGS)
 ifeq ($(CC), icc)

Now the rx queue is set up correctly (memory corruption!),
so the rx_conf appears to be OK, although now tx_conf seems wrong:

(gdb) r
Starting program: /home/marc/dpdk_vanilla/examples/kni/build/kni -c 0x3 -n 2 -- -p 0x3 -P --config=\(0,1,1,1\),\(1,0,0,0\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Support maximum 64 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7ffff6e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7ffff6400000 (size = 0x800000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7ffff5e00000 (size = 0x400000)
EAL: Ask a virtual area of 0x17000000 bytes
EAL: Virtual area found at 0x7fffdec00000 (size = 0x17000000)
EAL: Ask a virtual area of 0x1e00000 bytes
EAL: Virtual area found at 0x7fffdcc00000 (size = 0x1e00000)
EAL: Ask a virtual area of 0x1400000 bytes
EAL: Virtual area found at 0x7fffdb600000 (size = 0x1400000)
EAL: Ask a virtual area of 0x800000 bytes
EAL: Virtual area found at 0x7fffdac00000 (size = 0x800000)
EAL: Ask a virtual area of 0x2000000 bytes
EAL: Virtual area found at 0x7fffd8a00000 (size = 0x2000000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7fffd5c00000 (size = 0x2c00000)
EAL: Ask a virtual area of 0x7c00000 bytes
EAL: Virtual area found at 0x7fffcde00000 (size = 0x7c00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcd800000 (size = 0x400000)
EAL: Ask a virtual area of 0xc00000 bytes
EAL: Virtual area found at 0x7fffcca00000 (size = 0xc00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7fffcc400000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7fffcc000000 (size = 0x200000)
EAL: Requesting 331 pages of size 2MB from socket 0
[New Thread 0x7fffcbfff700 (LWP 22143)]
EAL: TSC frequency is ~2494343 KHz
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: Master core 0 is ready (tid=f7ff0800)
[New Thread 0x7fffcb7fc700 (LWP 22144)]
EAL: Core 1 is ready (tid=cb7fc700)
EAL: PCI device 0000:00:03.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: 0000:00:03.0 not managed by UIO driver, skipping
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f9a000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:07.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f7a000
PMD: eth_em_dev_init(): port_id 1 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:08.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f5a000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
EAL: PCI device 0000:00:09.0 on NUMA socket -1
EAL: probe driver: 8086:100e rte_em_pmd
EAL: PCI memory mapped at 0x7ffff7f3a000
PMD: eth_em_dev_init(): port_id 3 vendorID=0x8086 deviceID=0x100e
APP: Port ID: 0
APP: Rx lcore ID: 1, Tx lcore ID: 1
APP: Kernel thread lcore ID: 1
APP: Port ID: 1
APP: Rx lcore ID: 0, Tx lcore ID: 0
APP: Kernel thread lcore ID: 0
APP: Initialising port 0 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0
PMD: eth_em_tx_queue_setup(): tx_free_thresh must be less than the number of TX descriptors minus 3. (tx_free_thresh=65535 port=1 queue=0)
EAL: Error - exiting with code: 1
Cause: Could not setup up TX queue for port1 (-22)
[Thread 0x7fffcbfff700 (LWP 22143) exited]
[Thread 0x7ffff7ff0800 (LWP 22140) exited]
[Inferior 1 (process 22140) exited with code 01]

Debugging it:

PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e7d00 hw_ring=0x7ffff6fdaac0 dma_addr=0x5daac0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796420, queue_idx=0, nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c) at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$1 = "0:6.0", '\000'
(gdb) print tx_conf
$2 = (const struct rte_eth_txconf *) 0x7fffffffe39c
(gdb) print *tx_conf
$3 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, tx_rs_thresh = 0, tx_free_thresh = 0, txq_flags = 0, tx_deferred_start = 0 '\000'}
(gdb) c
Continuing.
PMD: eth_em_tx_queue_setup(): sw_ring=0x7fffcd4e5c00 hw_ring=0x7ffff6feaac0 dma_addr=0x5eaac0
PMD: eth_em_start(): <<
KNI: pci: 00:06:00 8086:100e
APP: Initialising port 1 ...
PMD: eth_em_rx_queue_setup(): sw_ring=0x7fffcd4e5600 hw_ring=0x7fffcd50c1c0 dma_addr=0x2cb0c1c0

Breakpoint 1, eth_em_tx_queue_setup (dev=0x796460, queue_idx=0, nb_desc=512, socket_id=4294967295, tx_conf=0x7fffffffe39c) at /home/marc/dpdk_vanilla/lib/librte_pmd_e1000/em_rxtx.c:1208
1208        hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
(gdb) print dev->data->name
$4 = "0:7.0", '\000'
(gdb) print *tx_conf
$5 = {tx_thresh = {pthresh = 0 '\000', hthresh = 0 '\000', wthresh = 0 '\000'}, tx_rs_thresh = 58608, tx_free_thresh = 65535, txq_flags = 32767, tx_deferred_start = 0 '\000'}

The KNI example runs *perfectly* in the VM with the same launch
parameters on v1.7.1, and seems to work fine up to
27b31ee33fa5e7cc9a086c690b98ed8e1a153c6a.
So the commit that breaks the example (which is not necessarily the
commit that is itself wrong) seems to be:

commit 81f7ecd934372fc9f592d1322f8eff86350fa4f5
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:05 2014 +0100

    examples: use factorized default Rx/Tx configuration

    For apps that were using default rte_eth_rxconf and rte_eth_txconf
    structures, these have been removed and now they are obtained by
    calling rte_eth_dev_info_get, just before setting up RX/TX queues.

    Signed-off-by: Pablo de Lara
    Acked-by: David Marchand

This seems to indicate that rte_eth_dev_info_get() is somehow corrupting
memory(?). But I haven't figured out the problem (yet). I suspect:

commit fbde27f19ab8f1d386868275bd8c016e693cf073
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:04 2014 +0100

    ethdev: get default Rx/Tx configuration from dev info

    Many sample apps use duplicated code to set rte_eth_txconf and
    rte_eth_rxconf structures. This patch allows the user to get a
    default optimal RX/TX configuration through rte_eth_dev_info get,
    and still any parameters may be tweaked as wished, before setting
    up queues. Besides, if a NULL pointer is passed to
    rte_eth_rx_queue_setup or rte_eth_tx_queue_setup, these functions
    get internally the default RX/TX configuration for the user.

    Signed-off-by: Pablo de Lara
    Reviewed-by: Bruce Richardson
    Acked-by: David Marchand
    [Thomas: split patch]

commit a30268e9a2d0618902e8cf96b90b27db4fb02d54
Author: Pablo de Lara
Date:   Wed Oct 1 10:49:03 2014 +0100

    ethdev: reset whole dev info structure before filling

    To guarantee that RX/TX configuration structures are reseted before
    modifying them, plus the other dev info fields, dev info structure
    is zeroed beforehand.

    Signed-off-by: Pablo de Lara
    Acked-by: David Marchand

Can anyone confirm it?

Marc

P.S. Has anyone managed to run a DPDK app under valgrind?
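To make the behaviour described in the two ethdev commit messages concrete, here is a minimal sketch of the pattern: the info structure is zeroed before the driver fills it, and queue setup falls back to the defaults from that structure when the caller passes NULL. Every name below is a hypothetical stand-in rather than DPDK code, and the threshold check is only modelled on the log message ("tx_free_thresh must be less than the number of TX descriptors minus 3"). In a pattern like this, the structure holding the defaults has to stay alive for as long as the pointer into it is used, which may be worth checking since tx_conf in the gdb session is a stack address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; not the real rte_eth_txconf / rte_eth_dev_info. */
struct txconf_sketch {
    uint16_t tx_rs_thresh;
    uint16_t tx_free_thresh;
};

struct dev_info_sketch {
    uint16_t max_tx_queues;
    struct txconf_sketch default_txconf;
};

/* A driver callback that fills only the fields it knows about. */
static void driver_infos_get_sketch(struct dev_info_sketch *info)
{
    info->max_tx_queues = 1;
}

/* Zero the whole structure first, so unfilled fields read as 0, not garbage. */
static void dev_info_get_sketch(struct dev_info_sketch *info)
{
    memset(info, 0, sizeof(*info));
    driver_infos_get_sketch(info);
}

static int tx_queue_setup_sketch(uint16_t nb_desc,
                                 const struct txconf_sketch *conf)
{
    /* Function scope: this storage outlives every use of conf below. */
    struct dev_info_sketch info;

    if (conf == NULL) {
        dev_info_get_sketch(&info);
        conf = &info.default_txconf;
    }

    /* Assumed check, modelled on the error text in the log. */
    if (conf->tx_free_thresh >= nb_desc - 3)
        return -EINVAL;

    return 0;
}
```

With 512 descriptors, the zeroed defaults pass this check, while a structure carrying tx_free_thresh = 65535 (as printed for the second port) is rejected with -EINVAL.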