From: Gilad Berman <giladb@mellanox.com>
To: Xiaozhou Li <xl@CS.Princeton.EDU>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
Date: Sun, 16 Aug 2015 06:38:09 +0000
Message-ID: <AM2PR05MB0675075BA1D5887AEFB11B02A07A0@AM2PR05MB0675.eurprd05.prod.outlook.com>
In-Reply-To: <CAL8CsmSsEA=siuspUSnDNE8PpXJmOfHkpqradAY+QF7vp+u4KQ@mail.gmail.com>
Xiaozhou,
I will take this thread offline and mail you directly. I promise to post the solution back to the list for future reference.
I do not want to spam everyone.
Thx!
From: Xiaozhou Li [mailto:xl@CS.Princeton.EDU]
Sent: Friday, August 14, 2015 7:11 AM
To: Gilad Berman <giladb@mellanox.com>
Cc: Xu, Qian Q <qian.q.xu@intel.com>; dev@dpdk.org
Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
Hi Qian and Gilad,
Thanks for your reply. We are using dpdk-2.0.0 and mlnx-en-2.4-1.0.0.1 on a Mellanox Connectx-3 EN with a single 40G port.
I ran testpmd on the server with the following command: sudo ./testpmd -c 0xff -n 4 -- -i --portmask=0x1 --port-topology=chained --rxq=4 --txq=4 --nb-cores=4, and then ran "set fwd macswap" at the testpmd prompt.
I have multiple clients sending packets and receiving replies. The server throughput is still only about 2 Mpps. Testpmd shows no RX-dropped packets, but "ifconfig port" shows many dropped packets.
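As a rough illustration (not from the original message), drops like these can sometimes be located by reading the generic ethdev counters from inside the application; the minimal sketch below assumes DPDK 2.0-era types (uint8_t port ids), and whether the mlx4 PMD of that generation fills every field, such as imissed, is an assumption to verify.

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Hypothetical helper, not part of the thread: print port-level and
 * per-queue RX counters for one port so drops can be located. */
static void
dump_rx_stats(uint8_t port_id, uint16_t nb_rxq)
{
    struct rte_eth_stats st;

    rte_eth_stats_get(port_id, &st);
    printf("port %u: ipackets=%" PRIu64 " imissed=%" PRIu64
           " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
           port_id, st.ipackets, st.imissed, st.ierrors, st.rx_nombuf);

    for (uint16_t q = 0; q < nb_rxq && q < RTE_ETHDEV_QUEUE_STAT_CNTRS; q++)
        printf("  rxq %u: ipackets=%" PRIu64 " errors=%" PRIu64 "\n",
               q, st.q_ipackets[q], st.q_errors[q]);
}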
Please let me know if I am doing anything wrong and what else I should check. I am also copying the output from starting testpmd at the end of this email; not sure if there is any useful information in it.
Thanks!
Xiaozhou
EAL: Detected lcore 0 as core 0 on socket 0
... (omit) ...
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
... (omit) ...
EAL: Ask a virtual area of 0xa00000 bytes
EAL: Virtual area found at 0x7f2d2fe00000 (size = 0xa00000)
EAL: Requesting 8192 pages of size 2MB from socket 0
EAL: Requesting 8192 pages of size 2MB from socket 1
EAL: TSC frequency is ~2199994 KHz
EAL: Master lcore 0 is ready (tid=39add900;cpuset=[0])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: lcore 4 is ready (tid=3676b700;cpuset=[4])
EAL: lcore 6 is ready (tid=35769700;cpuset=[6])
EAL: lcore 5 is ready (tid=35f6a700;cpuset=[5])
EAL: lcore 2 is ready (tid=3776d700;cpuset=[2])
EAL: lcore 1 is ready (tid=37f6e700;cpuset=[1])
EAL: lcore 3 is ready (tid=36f6c700;cpuset=[3])
EAL: lcore 7 is ready (tid=34f68700;cpuset=[7])
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:5a:8f:70
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL: probe driver: 8086:1528 rte_ixgbe_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:81:00.1 on NUMA socket 1
EAL: probe driver: 8086:1528 rte_ixgbe_pmd
EAL: Not managed by a supported kernel driver, skipped
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: librte_pmd_mlx4: 0x884360: TX queues number update: 0 -> 4
PMD: librte_pmd_mlx4: 0x884360: RX queues number update: 0 -> 4
Port 0: F4:52:14:5A:8F:70
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Done
testpmd> show config rxtx
macswap packet forwarding - CRC stripping disabled - packets/burst=32
nb forwarding cores=4 - nb forwarding ports=1
RX queues=4 - RX desc=128 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX queues=4 - TX desc=512 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX RS bit threshold=0 - TXQ flags=0x0
testpmd> show config fwd
macswap packet forwarding - ports=1 - cores=4 - streams=4 - NUMA support disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
Logical Core 2 (socket 0) forwards packets on 1 streams:
RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
Logical Core 3 (socket 0) forwards packets on 1 streams:
RX P=0/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
Logical Core 4 (socket 0) forwards packets on 1 streams:
RX P=0/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00
On Thu, Aug 13, 2015 at 6:13 AM, Gilad Berman <giladb@mellanox.com> wrote:
Xiaozhou,
Following Qian's answer - 2Mpps is VERY (VERY) low and far below what we see even with a single core.
Which version of DPDK and PMD are you using? Are you using MLNX optimized libs for PMD? Can you provide more details on the exact setup?
Can you run a simple test with testpmd and see if you are getting the same results?
Just to be clear - it does not matter which version you are using, 2Mpps is very far from what you should get :)
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xu, Qian Q
Sent: Thursday, August 13, 2015 6:25 AM
To: Xiaozhou Li <xl@CS.Princeton.EDU>; dev@dpdk.org
Subject: Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
Xiaozhou,
So it seems the performance bottleneck is not at the cores; have you checked the Mellanox NIC's configuration? How many queues per port are you using? Could you try the l3fwd example with the Mellanox NIC to check whether the performance is good enough? I'm not familiar with Mellanox NICs, but if you have tried the Intel Fortville 40G NIC, I can give more suggestions about the NIC's configuration.
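For reference, the multi-queue setup that testpmd's --rxq/--txq options (and a multi-queue l3fwd run) correspond to looks roughly like the sketch below in application code; the port id, descriptor counts, mempool and the ETH_RSS_IP hash selection are illustrative assumptions rather than the poster's actual configuration.

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Rough sketch: spread incoming flows over nb_q RX queues with RSS and
 * create one TX queue per forwarding core; values are illustrative. */
static int
configure_port_multiqueue(uint8_t port_id, uint16_t nb_q,
                          struct rte_mempool *mb_pool)
{
    struct rte_eth_conf conf = {
        .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
        .rx_adv_conf.rss_conf = { .rss_hf = ETH_RSS_IP }, /* hash on IP fields */
    };
    int socket = rte_eth_dev_socket_id(port_id);
    int ret = rte_eth_dev_configure(port_id, nb_q, nb_q, &conf);

    if (ret < 0)
        return ret;
    for (uint16_t q = 0; q < nb_q; q++) {
        ret = rte_eth_rx_queue_setup(port_id, q, 128, socket, NULL, mb_pool);
        if (ret < 0)
            return ret;
        ret = rte_eth_tx_queue_setup(port_id, q, 512, socket, NULL);
        if (ret < 0)
            return ret;
    }
    return rte_eth_dev_start(port_id);
}

With a setup like this, throughput can only scale if the incoming traffic actually spreads across the RX queues, which for RSS means enough distinct flows (source/destination addresses or ports).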
Thanks
Qian
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xiaozhou Li
Sent: Thursday, August 13, 2015 7:20 AM
To: dev@dpdk.org
Subject: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
Hi folks,
I am getting performance scalability issues with DPDK on a Mellanox Connectx-3.
Each of our machines has 16 cores and a single-port 40G Mellanox Connectx-3 EN. We find that the server throughput *does not scale* with the number of cores. With a single thread on one core, we can get about 2 Mpps with a simple echo server implementation. However, the throughput does not increase as we use more cores. Our implementation is based on the l2fwd example.
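As a hypothetical sketch of how such a server is usually scaled (not the actual implementation discussed here), each forwarding lcore would own exactly one RX queue and one TX queue of the port, with RSS enabled so the NIC spreads flows across those queues:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical per-lcore loop: lcore N polls RX queue N and sends on TX
 * queue N of port 0, so queues are never shared and no locking is needed. */
static int
echo_loop(void *arg)
{
    const uint8_t port = 0;
    const uint16_t queue = (uint16_t)(uintptr_t)arg; /* one queue per lcore */
    struct rte_mbuf *pkts[BURST_SIZE];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);
        if (nb_rx == 0)
            continue;
        /* ... rewrite headers in place, as the echo server would ... */
        uint16_t nb_tx = rte_eth_tx_burst(port, queue, pkts, nb_rx);
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(pkts[i]); /* drop what could not be sent */
    }
    return 0; /* not reached */
}

Each instance would be started with rte_eal_remote_launch(), passing the queue index as the argument; if all cores instead poll a single shared queue, throughput typically stays flat regardless of the core count.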
I'd greatly appreciate it if anyone could provide some insights on what might be the problem and how we can improve the performance with the Mellanox Connectx-3 EN. Thanks!
Best,
Xiaozhou