I get 125 Mpps from a single port using 12 lcores:

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12 --rxd=512

With 63 cores I get 35 Mpps:

numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4 -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=63 --rxq=63 --txq=63 --rxd=512

I'm using this guide as a reference:
https://fast.dpdk.org/doc/perf/DPDK_20_11_Mellanox_NIC_performance_report.pdf

It gives examples of how to get the best performance, but all of them use at most 12 lcores.
125 Mpps with 12 lcores is close to the maximum I can get from a single 100GbE port (the theoretical maximum is 148 Mpps for 64-byte packets).
I just want to understand: why do I get good performance with 12 lcores and bad performance with 63 cores?

On Fri, Feb 18, 2022 at 16:30, Asaf Penso wrote:

> Hello Dmitry,
>
> Could you please paste the testpmd commands per each experiment?
>
> Also, have you looked into dpdk.org performance report to see how to tune
> for best results?
>
> Regards,
> Asaf Penso
> ------------------------------
> *From:* Дмитрий Степанов
> *Sent:* Friday, February 18, 2022 9:32:59 AM
> *To:* users@dpdk.org
> *Subject:* Mellanox performance degradation with more than 12 lcores
>
> Hi folks!
>
> I'm using Mellanox ConnectX-6 Dx EN adapter card (100GbE; Dual-port
> QSFP56; PCIe 4.0/3.0 x16) with DPDK 21.11 on a server with AMD EPYC 7702
> 64-Core Processor (NUMA system with 2 sockets). Hyperthreading is turned
> off.
> I'm testing the maximum receive throughput I can get from a single port
> using testpmd utility (shipped with dpdk). My generator produces random UDP
> packets with zero payload length.
>
> I get the maximum performance using 8-12 lcores (overall 120-125Mpps on
> receive path of single port):
>
> numactl -N 1 -m 1 /opt/dpdk-21.11/build/app/dpdk-testpmd -l 64-127 -n 4
> -a 0000:c1:00.0 -- --stats-period 1 --nb-cores=12 --rxq=12 --txq=12
> --rxd=512
>
> With more than 12 lcores overall receive performance reduces. With 16-32
> lcores I get 100-110 Mpps, and I get a significant performance fall with 33
> lcores - 84Mpps. With 63 cores I get even 35Mpps overall receive
> performance.
>
> Are there any limitations on the total number of receive queues (total
> lcores) that can handle a single port on a given NIC?
>
> Thanks,
> Dmitriy Stepanov
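
For reference, the 148 Mpps figure and the per-lcore rates in this thread can be reproduced with a quick calculation. This is only a rough sketch: it assumes the standard Ethernet per-frame wire overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap) and that the generator's zero-payload UDP packets go out as minimum-size 64-byte frames. The function names are made up for illustration; the Mpps numbers are the ones reported above.

```python
# Standard Ethernet per-frame overhead on the wire, in bytes (assumed here).
PREAMBLE_SFD = 8      # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP = 12   # minimum idle time between frames, expressed in bytes


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second for a link speed and frame size."""
    wire_bits = (frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP) * 8
    return link_bps / wire_bits


def per_lcore_mpps(total_mpps: float, lcores: int) -> float:
    """Average rate handled by each lcore, in Mpps."""
    return total_mpps / lcores


if __name__ == "__main__":
    max_mpps = line_rate_pps(100e9, 64) / 1e6
    print(f"64-byte line rate at 100 Gb/s: {max_mpps:.1f} Mpps")

    # (lcores, measured Mpps) pairs taken from the numbers in this thread.
    for lcores, mpps in [(12, 125), (33, 84), (63, 35)]:
        share = 100 * mpps / max_mpps
        print(f"{lcores:2d} lcores: {mpps:3d} Mpps total, "
              f"{per_lcore_mpps(mpps, lcores):.2f} Mpps per lcore, "
              f"{share:.0f}% of line rate")
```

Seen per lcore, the rate drops from roughly 10 Mpps with 12 lcores to well under 1 Mpps with 63, so the extra queues reduce the total rather than merely failing to add to it.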