Date: Sat, 31 Oct 2015 09:54:04 +0100
From: Jesper Wramberg
To: users@dpdk.org
Subject: [dpdk-users] Low TX performance on Mellanox ConnectX-3 NIC

Hi all,

I am experiencing some performance issues in a somewhat custom setup with two Mellanox ConnectX-3 NICs. I realize these issues might be due to the setup itself, but I was hoping someone might be able to pinpoint possible problems or bottlenecks.

The server:
I have a Dell PowerEdge R630 with two Mellanox ConnectX-3 NICs (one on each socket). It runs a minimal CentOS 7.1.1503 install with kernel-3.10.0-229. Note that this kernel is rebuilt with most things disabled to minimize its size. It does have InfiniBand enabled, however, and mlx4_core built as a module (since nothing works otherwise). Finally, I have connected the two NICs from port 2 to port 2.

The firmware:
I have installed the latest firmware for the NICs from Dell, which is 2.34.5060.

The drivers, modules, etc.:
I have downloaded the Mellanox OFED 3.1 package for CentOS 7.1 and used its rebuild feature to build it against the custom kernel. I installed it with the --basic option since I only want libibverbs, libmlx4, the kernel modules and the openibd service. mlx4_core.conf is set for Ethernet on all ports. Moreover, it is configured for flow steering mode -7 and a few VFs. I can restart the openibd service successfully and everything seems to be working: ibdev2netdev reports the NICs and their VFs, etc. The only problem I have encountered at this stage is that the links don't always come up unless I unplug and re-plug the cables.

DPDK setup:
I have built DPDK with the mlx4 PMD using the .h/.a files from the OFED package, leaving everything at its default values. Running the simple hello world example I can see that everything is initialized correctly.

Test setup:
To test the performance of the NICs I use the following setup: two processes, P1 and P2, running on NIC A, and two other processes, P3 and P4, running on NIC B. All processes use virtual functions on their respective NICs. Depending on the test, each process either transmits or receives data.
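In case the initialization matters, each process brings up its VF roughly like this (a simplified sketch: the PCI whitelist address, pool/queue sizes and names below are placeholders, not the exact values from my programs):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_lcore.h>

int main(int argc, char **argv)
{
    /* Each process is restricted to "its" VF with a PCI whitelist on the
     * command line, e.g. -w 0000:03:00.1 (example address only). */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    if (rte_eth_dev_count() == 0)
        rte_exit(EXIT_FAILURE, "no port found (mlx4 PMD / VF not probed?)\n");

    uint8_t port = 0;
    struct rte_eth_conf conf;
    memset(&conf, 0, sizeof(conf));  /* default port configuration */

    struct rte_mempool *pool = rte_pktmbuf_pool_create("pool", 65535, 256, 0,
                                      RTE_MBUF_DEFAULT_BUF_SIZE,
                                      rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    /* One RX and one TX queue, 512 descriptors each, default thresholds. */
    if (rte_eth_dev_configure(port, 1, 1, &conf) < 0 ||
        rte_eth_rx_queue_setup(port, 0, 512, rte_socket_id(), NULL, pool) < 0 ||
        rte_eth_tx_queue_setup(port, 0, 512, rte_socket_id(), NULL) < 0 ||
        rte_eth_dev_start(port) < 0)
        rte_exit(EXIT_FAILURE, "port setup failed\n");

    /* ... the transmit or receive loop runs here ... */
    return 0;
}

After this setup, the process runs either the transmit loop or the receive loop described below.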
To transmit, I use a simple DPDK program which generates 32000 packets and transmits them over and over until it has sent 640 million packets. To receive, I use a simple DPDK program which is basically the layer 2 forwarding example without re-transmission.

First test:
In my first test, P1 transmits data to P3 while the other processes are idle.
Packet size: 1480 bytes
Flow control: on/off, it doesn't matter, I get the same result.
Result: P3 receives all packets, but it takes 192.52 seconds ~ 3.32 Mpps ~ 4.9 Gbit/s.

Second test:
In my second test, I attempt to increase the amount of data transmitted over NIC A. As such, P1 transmits data to P3 while P2 transmits data to P4.
Packet size: 1480 bytes
Flow control: on/off, it doesn't matter, I get the same result.
Result: P3 and P4 receive all packets, but it takes 364.40 seconds ~ 1.75 Mpps ~ 2.6 Gbit/s for a single process to get its data transmitted.

Does anyone have any idea what I am doing wrong here? In the second test I would expect P1 to transmit at the same speed as in the first test, but it seems there is a bottleneck somewhere. I have left most things at their default values, but I have also tried tweaking queue sizes, number of queues, interrupts, etc. with no luck.

Best Regards,
Jesper
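P.S. In case it helps, the transmit side is essentially the loop below (simplified: the real program pre-builds its 32000 packets and resends them, whereas this sketch just allocates a fresh burst each round; it assumes "port" and "pool" were set up as in the initialization sketch above, and the constants are illustrative):

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define PKT_SIZE   1480
#define BURST      32
#define TOTAL_PKTS 640000000ULL

static void tx_loop(uint8_t port, struct rte_mempool *pool)
{
    uint64_t sent = 0;

    while (sent < TOTAL_PKTS) {
        struct rte_mbuf *burst[BURST];
        uint16_t n = 0;

        /* Build a burst of PKT_SIZE-byte frames; the real program also
         * writes an Ethernet header and payload at this point. */
        while (n < BURST) {
            struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
            if (m == NULL)
                break;
            m->data_len = PKT_SIZE;
            m->pkt_len  = PKT_SIZE;
            burst[n++] = m;
        }

        uint16_t tx = rte_eth_tx_burst(port, 0, burst, n);
        sent += tx;

        /* Free whatever the PMD did not accept and retry next round. */
        for (uint16_t i = tx; i < n; i++)
            rte_pktmbuf_free(burst[i]);
    }
}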