RE: Mellanox Connectx-6 Dx dual port performance

DPDK usage discussions

From: Asaf Penso <asafp@nvidia.com>
To: "Дмитрий Степанов" <stepanov.dmit@gmail.com>,
	"users@dpdk.org" <users@dpdk.org>
Subject: RE: Mellanox Connectx-6 Dx dual port performance
Date: Sun, 10 Apr 2022 07:30:48 +0000
Message-ID: <DM5PR1201MB2555BA8921CFDFA73936A50ACDEB9@DM5PR1201MB2555.namprd12.prod.outlook.com>
In-Reply-To: <CA+-SuJ01pcMz_Y6H1=Z-Q9PGM6i8fpkfbqYT2JeB8PEHWoktBQ@mail.gmail.com>

Hello,

Thanks for your mail and analysis.
The results below, a maximum packet rate of 214 Mpps for a dual-port ConnectX-6 Dx, are expected and are aligned with the NIC capabilities.

Regards,
Asaf Penso

From: Дмитрий Степанов <stepanov.dmit@gmail.com>
Sent: Tuesday, March 22, 2022 11:04 AM
To: users@dpdk.org
Subject: Mellanox Connectx-6 Dx dual port performance

Hi!

I'm testing overall dual-port performance of a ConnectX-6 Dx EN adapter card (100GbE; dual-port QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on Ubuntu 20.04.
I have two dual-port NICs installed in the same server (but on different NUMA nodes), which I use as a generator and a receiver, respectively.
First, I started a custom packet generator on port 0 and got 148 Mpps TX (64-byte TCP packets with zero payload length), which is the maximum packet rate at the 100 Gbps line rate. Then I launched the same generator with the same parameters simultaneously on port 1.
Performance on both ports decreased to 105-106 Mpps per port (210-212 Mpps in total). If I use 512-byte TCP packets, running generators on both ports gives me 23 Mpps per port (46 Mpps in total, which is the maximum line rate for this packet size).
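The line-rate figures above follow from the Ethernet per-frame overhead. A minimal sketch, assuming "64 bytes" and "512 bytes" denote the full frame size including the 4-byte FCS, and that each frame costs another 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte minimum inter-frame gap); the function name `line_rate_mpps` is hypothetical:

```python
# Theoretical Ethernet packet rate for one port at a given link speed.
# Assumption: frame_bytes includes the FCS; preamble + SFD + minimum
# inter-frame gap add 20 more bytes per frame on the wire.
WIRE_OVERHEAD_BYTES = 20


def line_rate_mpps(frame_bytes: int, link_gbps: float = 100.0) -> float:
    """Maximum frames per second, in millions, for a single port."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame / 1e6


for size in (64, 512):
    one_port = line_rate_mpps(size)
    print(f"{size:>4} B: {one_port:6.2f} Mpps per port, "
          f"{2 * one_port:6.2f} Mpps for two ports")
```

For 64-byte frames this gives about 148.81 Mpps per port (297.62 Mpps for two ports), so the 210-212 Mpps measured with both ports active is roughly 71% of the dual-port line rate. For 512-byte frames it gives about 23.50 Mpps per port (46.99 Mpps for two), which matches the 46 Mpps observed.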

The Mellanox performance report http://fast.dpdk.org/doc/perf/DPDK_21_08_Mellanox_NIC_performance_report.pdf doesn't contain measurements for the TX path, only for RX.
Its Test #11, "Mellanox ConnectX-6 Dx 100GbE PCIe Gen4 Throughput at Zero Packet Loss (2x 100GbE)", shows nearly the same results for the RX path that I got for the TX path (214 Mpps for 64-byte packets, 47 Mpps for 512-byte packets). My questions are: should my TX results coincide with the published RX results? Why can't I get 2 x 148 Mpps for small packets when using both ports? What is the bottleneck here: PCIe, RAM, or the NIC itself?

To test the RX path I used the testpmd and l3fwd utilities (slightly modified to print RX stats).

./dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0,mprq_en=1,mprq_log_stride_num=9 -a 0000:c1:00.1,mprq_en=1,mprq_log_stride_num=9 -- --stats-period 1 --nb-cores=16 --rxq=16 --txq=16 --rxd=4096 --txd=4096 --burst=64 --mbcache=512
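A quick tally of what the testpmd command line above asks for. This is a rough sketch under stated assumptions: testpmd creates one forwarding stream per (port, RX queue) pair and spreads the streams evenly over the `--nb-cores` forwarding lcores, and the mlx5 devarg `mprq_log_stride_num` is the log2 of the number of strides per Multi-Packet RQ WQE. The helper names are hypothetical:

```python
# Rough resource arithmetic for the testpmd invocation (hypothetical helpers).


def lcore_count(first: int, last: int) -> int:
    """Number of lcores in an inclusive EAL range such as -l 64-127."""
    return last - first + 1


def mprq_strides(log_stride_num: int) -> int:
    """mprq_log_stride_num is a log2 value, so 9 means 2**9 strides per WQE."""
    return 1 << log_stride_num


ports = 2          # -a 0000:c1:00.0 and -a 0000:c1:00.1
rxq_per_port = 16  # --rxq=16
nb_cores = 16      # --nb-cores=16

total_rxq = ports * rxq_per_port
print("lcores in -l 64-127:", lcore_count(64, 127))
print("forwarding lcores:", nb_cores)
print("RX queues in total:", total_rxq)
# Assumes streams are spread evenly over the forwarding lcores.
print("RX queues per forwarding lcore:", total_rxq // nb_cores)
print("strides per MPRQ WQE:", mprq_strides(9))
```

Under these assumptions, only 16 of the 64 listed lcores do forwarding work, and each of them polls two RX queues.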

./build/examples/dpdk-l3fwd -l 96-111 -n 4 --socket-mem=0,4096 -a 0000:c1:00.0,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9,txq_inline_mpw=128,rxq_pkt_pad_en=1 -a 0000:c1:00.1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9,txq_inline_mpw=128,rxq_pkt_pad_en=1 -- -p 0x3 -P --config='(0,0,111),(0,1,110),(0,2,109),(0,3,108),(0,4,107),(0,5,106),(0,6,105),(0,7,104),(1,0,103),(1,1,102),(1,2,101),(1,3,100),(1,4,99),(1,5,98),(1,6,97),(1,7,96)' --eth-dest=0,00:15:77:1f:eb:fb --eth-dest=1,00:15:77:1f:eb:fb
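The long `--config` value in the l3fwd command maps each (port, queue) pair to its own lcore, handing out lcores 111 down to 96. A small sketch that regenerates the same string, which makes the pattern easier to check or change; `l3fwd_config` is a hypothetical helper, not part of DPDK:

```python
def l3fwd_config(ports: int, queues_per_port: int, top_lcore: int) -> str:
    """Build an l3fwd --config value made of (port,queue,lcore) tuples.

    Queues are numbered 0..queues_per_port-1 on each port, and lcores are
    assigned one per queue in descending order starting at top_lcore.
    """
    tuples = []
    lcore = top_lcore
    for port in range(ports):
        for queue in range(queues_per_port):
            tuples.append(f"({port},{queue},{lcore})")
            lcore -= 1
    return ",".join(tuples)


cfg = l3fwd_config(ports=2, queues_per_port=8, top_lcore=111)
print(f"--config='{cfg}'")
```

Eight queues on each of two ports use exactly the 16 lcores in `-l 96-111`, one queue per lcore.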

Then I sent 105 Mpps of 64-byte TCP packets from the other dual-port NIC to each port (210 Mpps in total). As described above, I can't get more than 210 Mpps in total from the generator. With both testpmd and l3fwd I was not able to receive more than 75-85 Mpps per port (150-170 Mpps in total) on the RX path. This contradicts the results in the Mellanox performance report (214 Mpps for both ports, 112 Mpps per port on the RX path).

Running only a single generator gives me 148 Mpps on both the TX and RX sides. But after starting the generator on the second port, TX performance decreased to 105 Mpps per port (210 Mpps in total) and RX performance decreased to 75-85 Mpps per port (150-170 Mpps in total). Could these poor RX results be caused by the generator not being fully utilized, or should I receive the full 210 Mpps provided by the generator across both ports? I applied all the system tuning suggestions described in the Mellanox performance report.
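The RX shortfall can be expressed as a fraction of the offered load, and the report's dual-port figure as a fraction of line rate. A minimal sketch using only the numbers quoted in this thread; it assumes 64-byte frames occupy 84 bytes on the wire (20 bytes of preamble, SFD and inter-frame gap), and the helper name `fraction` is hypothetical:

```python
def fraction(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


offered_total = 210.0  # two generator ports at ~105 Mpps each
for rx_total in (150.0, 170.0):  # measured RX range, summed over both ports
    print(f"RX {rx_total:.0f} of {offered_total:.0f} Mpps offered: "
          f"{fraction(rx_total, offered_total):.1%}")

# Dual-port 100 Gbps line rate for 64-byte frames (84 bytes on the wire each).
line_rate_2x = 2 * 100e9 / ((64 + 20) * 8) / 1e6
print(f"Report's 214 Mpps vs. dual-port line rate: "
      f"{fraction(214.0, line_rate_2x):.1%}")
```

By this arithmetic the receiver delivers roughly 71-81% of the offered 210 Mpps, while the 214 Mpps from the report (which the reply above confirms as expected) is itself about 72% of the theoretical 2 x 148.8 Mpps.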
I would be grateful for any advice.

Thanks in advance!
