Hello Dmitry,

Could you please paste the testpmd command you used for each experiment?

Also, have you looked into the dpdk.org performance reports to see how to tune for best results?
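The ConnectX-6 Dx reports there typically pin testpmd to NUMA-local cores and enable Multi-Packet RQ on the mlx5 PMD. A sketch of that kind of invocation, adapted to your PCI address (the exact devargs and values in the published report may differ):

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-76 -n 4 \
    -a 0000:c1:00.0,mprq_en=1,rxqs_min_mprq=1 \
    -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 \
       --rxd=1024 --txd=1024 --burst=64 --mbcache=512 --forward-mode=rxonly

mprq_en and rxqs_min_mprq are mlx5 devargs (see the mlx5 PMD guide); --burst and --mbcache are standard testpmd options.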

Regards,
Asaf Penso

From: Дмитрий Степанов <stepanov.dmit@gmail.com>
Sent: Friday, February 18, 2022 9:32:59 AM
To: users@dpdk.org <users@dpdk.org>
Subject: Mellanox performance degradation with more than 12 lcores
 
Hi folks!

I'm using a Mellanox ConnectX-6 Dx EN adapter card (100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on a server with an AMD EPYC 7702 64-Core Processor (a NUMA system with 2 sockets). Hyperthreading is turned off.
I'm testing the maximum receive throughput I can get from a single port using the testpmd utility shipped with DPDK. My traffic generator produces random UDP packets with zero payload length.

I get the maximum performance using 8-12 lcores (overall 120-125 Mpps on the receive path of a single port):

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 --rxd=512

With more than 12 lcores, overall receive performance decreases. With 16-32 lcores I get 100-110 Mpps, and there is a significant drop at 33 lcores: 84 Mpps. With 63 lcores I get only 35 Mpps overall.
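For context, with zero UDP payload each frame gets padded to the 64-byte Ethernet minimum, so the theoretical line rate on 100GbE is:

  (64 B frame + 20 B preamble/IFG) * 8 = 672 bits per packet
  100 Gbit/s / 672 bits ≈ 148.8 Mpps

so the 120-125 Mpps peak is roughly 80-84% of line rate, and every configuration above 12 lcores moves further away from it.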

Are there any limitations on the total number of receive queues (and thus lcores) that can serve a single port on this NIC?
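If it helps, testpmd itself reports the limits the PMD advertises, e.g. in interactive mode:

/opt/dpdk-21.11/build/app/dpdk-testpmd -a 0000:c1:00.0 -- -i
testpmd> show port info 0

which prints, among other fields, the "Max possible RX queues" value for the port.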

Thanks,
Dmitriy Stepanov