* [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
From: Xiaozhou Li @ 2015-08-12 23:20 UTC
To: dev

Hi folks,

I am running into performance scalability issues with DPDK on a Mellanox Connectx-3 EN.

Each of our machines has 16 cores and a single-port 40G Mellanox Connectx-3 EN. We find that the server throughput *does not scale* with the number of cores: with a single thread on one core we get about 2 Mpps with a simple echo-server implementation, but the throughput does not increase as we use more cores. Our implementation is based on the l2fwd example.

I'd greatly appreciate any insight into what might be the problem and how we can improve performance with the Mellanox Connectx-3 EN. Thanks!

Best,
Xiaozhou
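One common reason an l2fwd-derived application stays flat as cores are added is that all traffic lands in a single RX queue, so only one core ever does receive work. The sketch below shows how such an application might ask for several RX/TX queues with RSS so the NIC hashes flows across them. It is a minimal illustration, not the poster's code: the queue counts, ring sizes, and the ETH_RSS_* hash flags are assumptions to be checked against the DPDK release in use (the port id is uint8_t as in the DPDK 2.0-era API).

    #include <rte_ethdev.h>
    #include <rte_mempool.h>

    #define NB_RX_QUEUES 4   /* assumption: one RX queue per forwarding core */
    #define NB_TX_QUEUES 4
    #define RX_RING_SIZE 512
    #define TX_RING_SIZE 512

    /* Configure one port with several RX queues and RSS so that incoming
     * flows are hashed across the queues and each forwarding core can poll
     * its own queue instead of all cores sharing a single one. */
    static int
    configure_port_rss(uint8_t port_id, struct rte_mempool *mbuf_pool)
    {
        struct rte_eth_conf port_conf = {
            .rxmode = { .mq_mode = ETH_MQ_RX_RSS },
            .rx_adv_conf = {
                .rss_conf = {
                    .rss_key = NULL, /* let the PMD pick its default key */
                    .rss_hf  = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP,
                },
            },
        };
        uint16_t q;
        int ret;

        ret = rte_eth_dev_configure(port_id, NB_RX_QUEUES, NB_TX_QUEUES, &port_conf);
        if (ret < 0)
            return ret;

        for (q = 0; q < NB_RX_QUEUES; q++) {
            ret = rte_eth_rx_queue_setup(port_id, q, RX_RING_SIZE,
                    rte_eth_dev_socket_id(port_id), NULL, mbuf_pool);
            if (ret < 0)
                return ret;
        }
        for (q = 0; q < NB_TX_QUEUES; q++) {
            ret = rte_eth_tx_queue_setup(port_id, q, TX_RING_SIZE,
                    rte_eth_dev_socket_id(port_id), NULL);
            if (ret < 0)
                return ret;
        }
        return rte_eth_dev_start(port_id);
    }

Each forwarding lcore would then call rte_eth_rx_burst() on its own queue index, the same way testpmd does when started with --rxq/--txq.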
* Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
From: Xu, Qian Q @ 2015-08-13 3:24 UTC
To: Xiaozhou Li, dev

Xiaozhou,

It seems the performance bottleneck is not the cores, then. Have you checked the Mellanox NIC's configuration? How many queues per port are you using? Could you try the l3fwd example with the Mellanox NIC to check whether the performance is good enough? I'm not familiar with Mellanox NICs, but if you have tried an Intel Fortville 40G NIC, I can give more suggestions about the NIC configuration.

Thanks
Qian
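To answer the how-many-queues question from inside the application, a small diagnostic along the lines below prints what the PMD reports for a port: driver name, maximum queue counts, and link state. It relies only on generic ethdev calls, nothing mlx4-specific; the function name and where it would be called from are illustrative.

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Print the PMD's capabilities for a port so the application's queue
     * counts can be compared against what the hardware/driver supports. */
    static void
    dump_port_caps(uint8_t port_id)
    {
        struct rte_eth_dev_info info;
        struct rte_eth_link link;

        rte_eth_dev_info_get(port_id, &info);
        rte_eth_link_get_nowait(port_id, &link);

        printf("port %u: driver=%s max_rx_queues=%u max_tx_queues=%u\n",
               port_id, info.driver_name, info.max_rx_queues, info.max_tx_queues);
        printf("port %u: link %s, %u Mbps\n",
               port_id, link.link_status ? "up" : "down",
               (unsigned int)link.link_speed);
    }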
* Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
From: Gilad Berman @ 2015-08-13 11:13 UTC
To: Xu, Qian Q, Xiaozhou Li, dev

Xiaozhou,

Following Qian's answer - 2 Mpps is VERY (VERY) low, far below what we see even with a single core.
Which version of DPDK and of the PMD are you using? Are you using the MLNX-optimized libs for the PMD? Can you provide more details on the exact setup?
Can you run a simple test with testpmd and see whether you get the same results?

Just to be clear - it does not matter which version you are using, 2 Mpps is very far from what you should get :)
* Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
From: Xiaozhou Li @ 2015-08-14 4:10 UTC
To: Gilad Berman; +Cc: dev

Hi Qian and Gilad,

Thanks for your replies. We are using dpdk-2.0.0 and mlnx-en-2.4-1.0.0.1 on a Mellanox Connectx-3 EN with a single 40G port.

I ran testpmd on the server as follows:

  sudo ./testpmd -c 0xff -n 4 -- -i --portmask=0x1 --port-topology=chained --rxq=4 --txq=4 --nb-cores=4
  testpmd> set fwd macswap

I have multiple clients sending packets and receiving replies. The server throughput is still only about 2 Mpps. Testpmd shows no RX-dropped packets, but "ifconfig <port>" shows many dropped packets. Please let me know if I am doing anything wrong and what else I should check.

I am also copying the output from starting testpmd at the end of this email, in case it contains anything useful. Thanks!

Xiaozhou

EAL: Detected lcore 0 as core 0 on socket 0
... (omit) ...
EAL: Detected 32 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
... (omit) ...
EAL: Ask a virtual area of 0xa00000 bytes
EAL: Virtual area found at 0x7f2d2fe00000 (size = 0xa00000)
EAL: Requesting 8192 pages of size 2MB from socket 0
EAL: Requesting 8192 pages of size 2MB from socket 1
EAL: TSC frequency is ~2199994 KHz
EAL: Master lcore 0 is ready (tid=39add900;cpuset=[0])
PMD: ENICPMD trace: rte_enic_pmd_init
EAL: lcore 4 is ready (tid=3676b700;cpuset=[4])
EAL: lcore 6 is ready (tid=35769700;cpuset=[6])
EAL: lcore 5 is ready (tid=35f6a700;cpuset=[5])
EAL: lcore 2 is ready (tid=3776d700;cpuset=[2])
EAL: lcore 1 is ready (tid=37f6e700;cpuset=[1])
EAL: lcore 3 is ready (tid=36f6c700;cpuset=[3])
EAL: lcore 7 is ready (tid=34f68700;cpuset=[7])
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:06:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:5a:8f:70
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device 0000:81:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1528 rte_ixgbe_pmd
EAL:   Not managed by a supported kernel driver, skipped
Interactive-mode selected
Configuring Port 0 (socket 0)
PMD: librte_pmd_mlx4: 0x884360: TX queues number update: 0 -> 4
PMD: librte_pmd_mlx4: 0x884360: RX queues number update: 0 -> 4
Port 0: F4:52:14:5A:8F:70
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Done

testpmd> show config rxtx
  macswap packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=4 - nb forwarding ports=1
  RX queues=4 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=4 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0

testpmd> show config fwd
  macswap packet forwarding - ports=1 - cores=4 - streams=4 - NUMA support disabled, MP over anonymous pages disabled
  Logical Core 1 (socket 0) forwards packets on 1 streams:
    RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  Logical Core 2 (socket 0) forwards packets on 1 streams:
    RX P=0/Q=1 (socket 0) -> TX P=0/Q=1 (socket 0) peer=02:00:00:00:00:00
  Logical Core 3 (socket 0) forwards packets on 1 streams:
    RX P=0/Q=2 (socket 0) -> TX P=0/Q=2 (socket 0) peer=02:00:00:00:00:00
  Logical Core 4 (socket 0) forwards packets on 1 streams:
    RX P=0/Q=3 (socket 0) -> TX P=0/Q=3 (socket 0) peer=02:00:00:00:00:00
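Two things may be worth checking here. Testpmd's RX-dropped counter and the counters ifconfig shows come from different places (the PMD versus the kernel netdevice that mlx4 keeps alongside it), so reading the ethdev statistics directly may help narrow down where packets are being lost; within testpmd, "show port stats 0" prints them, and an application can read the same counters as sketched below (the function name is illustrative; only long-standing ethdev calls are used). Also, RX desc=128 per queue is a fairly small ring for 40G traffic, so it may be worth retrying with larger rings (testpmd's --rxd/--txd options) and confirming the mbuf pool is large enough.

    #include <stdio.h>
    #include <string.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    /* Read the counters the PMD keeps for a port. rx_nombuf counts receives
     * that failed because no mbuf was available; ierrors/oerrors count
     * packets the port could not receive/send. (Depending on the DPDK
     * release, struct rte_eth_stats may also have an imissed field for
     * packets the NIC dropped because the RX rings were full.) */
    static void
    print_port_stats(uint8_t port_id)
    {
        struct rte_eth_stats stats;

        memset(&stats, 0, sizeof(stats));
        rte_eth_stats_get(port_id, &stats);

        printf("port %u: ipackets=%" PRIu64 " opackets=%" PRIu64
               " ierrors=%" PRIu64 " oerrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
               port_id, stats.ipackets, stats.opackets,
               stats.ierrors, stats.oerrors, stats.rx_nombuf);
    }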
* Re: [dpdk-dev] Performance issues with Mellanox Connectx-3 EN
From: Gilad Berman @ 2015-08-16 6:38 UTC
To: Xiaozhou Li; +Cc: dev

Xiaozhou,

I will take this thread offline and mail you directly. I promise to post the solution back to the list for future reference - I just do not want to spam everyone.

Thx!
Thread overview: 5 messages
  2015-08-12 23:20 [dpdk-dev] Performance issues with Mellanox Connectx-3 EN Xiaozhou Li
  2015-08-13  3:24 ` Xu, Qian Q
  2015-08-13 11:13 ` Gilad Berman
  2015-08-14  4:10 ` Xiaozhou Li
  2015-08-16  6:38 ` Gilad Berman