From: Jesper Wramberg <jesper.wramberg@gmail.com>
To: Olga Shern <olgas@mellanox.com>
Cc: "users@dpdk.org" <users@dpdk.org>
Subject: Re: [dpdk-users] Low TX performance on Mellanox ConnectX-3 NIC
Date: Mon, 2 Nov 2015 14:57:12 +0100 [thread overview]
Message-ID: <CALhSPovmw9rTw_J7F0p7rGOPK-tWBy_RXspb61UOVFB8wMZ6Tw@mail.gmail.com> (raw)
In-Reply-To: <CALhSPovZaLoFn-f34D+wYGpgmbswdzg5EzANW7HKBQwFktaDTQ@mail.gmail.com>
Hi again,
Sorry I missed your first email. Wow, I can't believe I missed that. I read
the output from raw_ethernet_bw as Mbit/s instead of MB/s :-( That's kind of
embarrassing.
You are right. My calculations are wrong. Sorry for bothering you with my
bad math. For what it's worth, I have spent quite some time wondering what
was wrong.
I still have some way to go though, since my original problems started in a
much larger, more complicated setup. But I'm glad this basic Tx/Rx setup
works as expected.
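For the record, here is the corrected arithmetic as a quick sketch. I'm assuming the tool's "MB/sec" column is really MiB/sec, since that is the only reading under which it agrees with the MsgRate column:

```python
# Sanity-check the raw_ethernet_bw numbers for the 1480-byte run.
# Assumption: the "BW average[MB/sec]" column is MiB/sec.
msg_rate_pps = 3.323974e6     # MsgRate[Mpps] from the output
pkt_bytes = 1480

gbps_from_rate = msg_rate_pps * pkt_bytes * 8 / 1e9   # rate-based estimate
gbps_from_bw = 4691.58 * 2**20 * 8 / 1e9              # BW-column estimate

print(f"{gbps_from_rate:.2f} Gbit/s")  # ≈ 39.36 Gbit/s
print(f"{gbps_from_bw:.2f} Gbit/s")    # ≈ 39.36 Gbit/s
```

Both columns work out to roughly 39 Gbit/s, i.e. close to 40GbE line rate rather than the ~5 Gbit/s I thought I was seeing.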
Thank you, best regards
Jesper
2015-11-02 13:57 GMT+01:00 Jesper Wramberg <jesper.wramberg@gmail.com>:
> Hey,
>
> As a follow-up, I tried moving interrupts around, without any change in
> the achieved speed.
> Finally, after some iperf testing with 10 threads, it seems impossible to
> exceed 10G of bandwidth.
>
> I did get some interesting output from "perf top -p <pid>", however, while
> running the raw_ethernet_bw tool:
>
> 37.22% libpthread-2.17.so [.] pthread_spin_lock
> 10.00% libmlx4-rdmav2.so [.] 0x000000000000b05a
> 1.20% libmlx4-rdmav2.so [.] 0x000000000000b3ec
> 1.07% libmlx4-rdmav2.so [.] 0x000000000000b06c
> 1.07% libmlx4-rdmav2.so [.] 0x000000000000afc0
> 1.06% raw_ethernet_bw [.] 0x000000000001484f
> 1.06% raw_ethernet_bw [.] 0x0000000000014869
> 1.06% raw_ethernet_bw [.] 0x00000000000142ec
> 1.05% libmlx4-rdmav2.so [.] 0x000000000000b41c
> 1.05% libmlx4-rdmav2.so [.] 0x000000000000aff6
> 1.05% raw_ethernet_bw [.] 0x0000000000014f09
> 1.03% libmlx4-rdmav2.so [.] 0x0000000000005a60
> 1.03% libmlx4-rdmav2.so [.] 0x000000000000be51
> 1.03% libpthread-2.17.so [.] pthread_spin_unlock
> 1.01% libmlx4-rdmav2.so [.] 0x000000000000afdc
> 1.00% libmlx4-rdmav2.so [.] 0x000000000000b042
> 1.00% raw_ethernet_bw [.] 0x0000000000014314
> 0.98% libmlx4-rdmav2.so [.] 0x000000000000bf38
> 0.97% libmlx4-rdmav2.so [.] 0x000000000000b3d2
> 0.97% raw_ethernet_bw [.] 0x00000000000142a4
> 0.96% raw_ethernet_bw [.] 0x0000000000014282
> 0.96% libmlx4-rdmav2.so [.] 0x000000000000b415
> 0.96% raw_ethernet_bw [.] 0x000000000001425e
>
> I wonder whether the tool is supposed to spend so much time in
> pthread_spin_lock...
>
> Best regards,
> Jesper
>
> 2015-11-02 11:59 GMT+01:00 Jesper Wramberg <jesper.wramberg@gmail.com>:
>
>> Hi again,
>>
>> Thank you for your input. I have now switched to using the
>> raw_ethernet_bw tool as the transmitter and testpmd as the receiver. An
>> immediate result: the raw_ethernet_bw tool achieves TX performance very
>> similar to my DPDK transmitter.
>>
>>
>> (Note: both CPU 10 and mlx4_0 are on the same NUMA node, as intended.)
>> taskset -c 10 raw_ethernet_bw --client -d mlx4_0 -i 2 -l 3 --duration 20
>> -s 1480 --dest_mac F4:52:14:7A:59:80
>>
>> ---------------------------------------------------------------------------------------
>> Post List requested - CQ moderation will be the size of the post list
>>
>> ---------------------------------------------------------------------------------------
>> Send Post List BW Test
>> Dual-port : OFF Device : mlx4_0
>> Number of qps : 1 Transport type : IB
>> Connection type : RawEth Using SRQ : OFF
>> TX depth : 128
>> Post List : 3
>> CQ Moderation : 3
>> Mtu : 1518[B]
>> Link type : Ethernet
>> Gid index : 0
>> Max inline data : 0[B]
>> rdma_cm QPs : OFF
>> Data ex. method : Ethernet
>>
>> ---------------------------------------------------------------------------------------
>> **raw ethernet header****************************************
>>
>> --------------------------------------------------------------
>> | Dest MAC | Src MAC | Packet Type |
>> |------------------------------------------------------------|
>> | F4:52:14:7A:59:80| E6:1D:2D:11:FF:41|DEFAULT |
>> |------------------------------------------------------------|
>>
>>
>> ---------------------------------------------------------------------------------------
>> #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
>> 1480       33242748       0.00               4691.58               3.323974
>>
>> ---------------------------------------------------------------------------------------
>>
>>
>> Running it with the 64-byte packets Olga specified gives me the following
>> result:
>>
>>
>> ---------------------------------------------------------------------------------------
>> #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]    MsgRate[Mpps]
>> 64         166585650      0.00               1016.67               16.657163
>>
>> ---------------------------------------------------------------------------------------
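Converting this message rate to bandwidth, as a rough check (taking the MsgRate column at face value):

```python
# Convert the 64-byte raw_ethernet_bw result to Gbit/s.
msg_rate_pps = 16.657163e6   # MsgRate[Mpps] from the 64-byte run
pkt_bytes = 64

gbps = msg_rate_pps * pkt_bytes * 8 / 1e9
print(f"{gbps:.2f} Gbit/s")  # ≈ 8.53 Gbit/s
```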
>>
>>
>> The results are the same with and without flow control. I have followed
>> the Mellanox DPDK QSG and done everything in the performance section
>> (except the things regarding interrupts).
>>
>> So to answer Olga's questions :-)
>>
>> 1: Unfortunately I can't. If I try, the FW update tool complains because
>> the cards came with a Dell configuration (PSID: DEL0A70000023).
>>
>> 2: In my final setup I need jumbo frames, but just for the sake of testing
>> I tried changing CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N to 1 in the DPDK config.
>> This did not really change anything, neither in my initial setup nor in
>> the one described above.
>>
>> 3: In the final setup, I plan to share the NICs between multiple
>> independent processes. For this reason, I wanted to use SR-IOV and
>> whitelist a single VF to each process. Anyway, for the tests above I have
>> used the PFs for simplicity.
>> (Side note: I discovered that multiple DPDK instances can use the same
>> PCI address which might eliminate the need for SR-IOV. I wonder how that
>> works :-))
>>
>> So, to conclude: isn't the raw_ethernet_bw tool supposed to achieve higher
>> output bandwidth with 1480-byte packets?
>>
>> I have a sysinfo dump using the Mellanox sysinfo-snapshot.py script. I
>> can mail this to anyone who has the time to look further into it.
>>
>> Thank you for your help, best regards
>> Jesper
>>
>> 2015-11-01 11:05 GMT+01:00 Olga Shern <olgas@mellanox.com>:
>>
>>> Hi Jesper,
>>>
>>> Several suggestions,
>>> 1. Any chance you can install the latest FW from the Mellanox web
>>> site, or the one included in the OFED 3.1 version that you downloaded? The
>>> latest version is 2.35.5100.
>>> 2. Please configure SGE_NUM=1 in the DPDK config file in case you
>>> don't need jumbo frames. This will improve performance.
>>> 3. It is not clear from your description: are you running DPDK on a
>>> VM? Are you using SR-IOV?
>>> 4. I suggest you run the testpmd application first. The traffic
>>> generator can be the raw_ethernet_bw application that comes with MLNX_OFED;
>>> it can generate L2, IPv4 and TCP/UDP packets.
>>> For example: taskset -c 10 raw_ethernet_bw --client -d mlx4_0
>>> -i 1 -l 3 --duration 10 -s 64 --dest_mac F4:52:14:7A:59:80 &
>>> This will send L2 packets via mlx4_0 NIC port 1, packet size =
>>> 64, for 10 sec, batch = 3 (-l).
>>> You can then check the performance via the testpmd counters.
>>>
>>> Please check the Mellanox community posts; I think they can help you.
>>> https://community.mellanox.com/docs/DOC-1502
>>>
>>> We also have performance suggestions in our QSG:
>>>
>>> http://www.mellanox.com/related-docs/prod_software/MLNX_DPDK_Quick_Start_Guide_v2%201_1%201.pdf
>>>
>>> Best Regards,
>>> Olga
>>>
>>>
>>> Subject: [dpdk-users] Low TX performance on Mellanox ConnectX-3 NIC Date:
>>> Saturday, 31 October 2015, 09:54:04 From: Jesper Wramberg <
>>> jesper.wramberg@gmail.com> To: users@dpdk.org
>>>
>>> Hi all,
>>>
>>>
>>>
>>> I am experiencing some performance issues in a somewhat custom setup
>>> with two Mellanox ConnectX-3 NICs. I realize these issues might be due to
>>> the setup, but I was hoping someone might be able to pinpoint some possible
>>> problems/bottlenecks.
>>>
>>>
>>>
>>>
>>> The server:
>>>
>>> I have a Dell PowerEdge R630 with two Mellanox ConnectX-3 NICs (one on
>>> each socket). I have a minimal CentOS 7.1.1503 install with kernel
>>> 3.10.0-229.
>>> Note that this kernel is rebuilt with most things disabled to minimize
>>> size, etc. It does have InfiniBand enabled, however, and mlx4_core as a
>>> module (since nothing works otherwise). Finally, I have connected the two
>>> NICs from port 2 to port 2.
>>>
>>>
>>>
>>> The firmware:
>>>
>>> I have installed the latest firmware for the NICs from Dell, which is
>>> 2.34.5060.
>>>
>>>
>>>
>>> The drivers, modules, etc.:
>>>
>>> I have downloaded the Mellanox OFED package 3.1 for Centos 7.1 and used
>>> its rebuild feature to build it against the custom kernel. I have installed
>>> it using the --basic option since I just want libibverbs, libmlx4, kernel
>>> modules and openibd service stuff. The mlx4_core.conf is set for ethernet
>>> on all ports. Moreover, it is configured for flow steering mode -7 and a
>>> few VFs. I can restart the openibd service successfully and everything
>>> seems to be working. ibdev2netdev reports the NICs and their VFs, etc. The
>>> only problem I have encountered at this stage is that the links don't
>>> always come up unless I unplug and re-plug the cables.
>>>
>>>
>>>
>>> DPDK setup:
>>>
>>> I have built DPDK with the mlx4 pmd using the .h/.a files from the OFED
>>> package. I build it using the default values for everything. Running the
>>> simple hello world example I can see that everything is initialized
>>> correctly, etc.
>>>
>>>
>>>
>>> Test setup:
>>>
>>> To test the performance of the NICs I have the following setup. Two
>>> processes, P1 and P2, running on NIC A. Two other processes, P3 and P4,
>>> running on NIC B. All processes use virtual functions on their respective
>>> NICs. Depending on the test, the processes can either transmit or receive
>>> data. To transmit, I use a simple DPDK program which generates 32000
>>> packets and transmits them over and over until it has sent 640 million
>>> packets. Similarly, I use a simple DPDK program to receive which is
>>> basically the layer 2 forwarding example without re-transmission.
>>>
>>>
>>>
>>> First test:
>>>
>>> In my first test, P1 transmits data to P3 while the other processes are
>>> idle.
>>>
>>> Packet size: 1480 byte packets
>>>
>>> Flow control: On/Off, doesn’t matter I get same result.
>>>
>>> Result: P3 receives all packets, but it takes 192.52 seconds ~ 3.32 Mpps ~
>>> 4.9 Gbit/s.
>>>
>>>
>>>
>>> Second test:
>>>
>>> In my second test, I attempt to increase the amount of data transmitted
>>> over NIC A. As such, P1 transmits data to P3 while P2 transmits data to P4.
>>>
>>> Packet size: 1480 byte packets
>>>
>>> Flow control: On/Off, doesn’t matter I get same result.
>>>
>>> Results: P3 and P4 receive all packets, but it takes 364.40 seconds ~
>>> 1.75 Mpps ~ 2.6 Gbit/s for a single process to get its data transmitted.
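The per-process rates in the two tests can be backed out from the raw numbers (640 million packets of 1480 bytes each). Note that the "Gbit/s" figures quoted above appear to actually be GB/s, which would be the same bytes-versus-bits mix-up acknowledged at the top of this thread:

```python
# Back out per-process rates from the two tests:
# 640 million packets of 1480 bytes each, over the measured durations.
pkts = 640e6
pkt_bytes = 1480

for test, seconds in (("first", 192.52), ("second", 364.40)):
    pps = pkts / seconds
    gbps = pps * pkt_bytes * 8 / 1e9
    print(f"{test}: {pps / 1e6:.2f} Mpps, {gbps:.2f} Gbit/s")
    # first:  ≈ 3.32 Mpps, ≈ 39.36 Gbit/s
    # second: ≈ 1.76 Mpps, ≈ 20.79 Gbit/s
```

Read this way, the two concurrent streams in the second test total roughly 41.6 Gbit/s on one NIC, i.e. they are already saturating a 40GbE link rather than hitting a software bottleneck.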
>>>
>>>
>>>
>>>
>>>
>>> Does anyone have any idea what I am doing wrong here? In the second test
>>> I would expect P1 to transmit at the same speed as in the first test. It
>>> seems that there is a bottleneck somewhere, however. I have left most
>>> things at their default values, but have also tried tweaking queue sizes,
>>> number of queues, interrupts, etc., with no luck.
>>>
>>>
>>>
>>>
>>>
>>> Best Regards,
>>>
>>> Jesper
>>>
>>
>>
>
Thread overview: 7+ messages
[not found] <36170571.QMO0L8HZgB@xps13>
2015-11-01 10:05 ` Olga Shern
2015-11-02 10:59 ` Jesper Wramberg
2015-11-02 12:31 ` Olga Shern
2015-11-02 12:57 ` Jesper Wramberg
2015-11-02 13:57 ` Jesper Wramberg [this message]
2015-10-31 8:54 Jesper Wramberg
2015-10-31 16:26 ` Wiles, Keith