Thanks for the clarification!
I was able to get 148 Mpps with 12 lcores after some BIOS tuning.
It looks like, due to these HW limitations, I have to use a ring buffer
as you suggested to support more than 32 lcores!

On Fri, 18 Feb 2022 at 16:40, Dmitry Kozlyuk wrote:
> Hi,
>
> > With more than 12 lcores overall receive performance reduces.
> > With 16-32 lcores I get 100-110 Mpps,
>
> It is more about the number of queues than the number of cores:
> 12 queues are the threshold when Multi-Packet Receive Queue (MPRQ)
> is automatically enabled in mlx5 PMD.
> Try increasing --rxd and check out mprq_en device argument.
> Please see mlx5 PMD user guide for details about MPRQ.
> You should be able to get full 148 Mpps with your HW.
>
> > and I get a significant performance fall with 33 lcores - 84Mpps.
> > With 63 cores I get even 35Mpps overall receive performance.
> >
> > Are there any limitations on the total number of receive queues (total
> > lcores) that can handle a single port on a given NIC?
>
> This is a hardware limitation.
> The limit on the number of queues you can create is very high (16M),
> but performance can perfectly scale only up to 32 queues
> at high packet rates (as opposed to bit rates).
> Using more queues can even degrade it, just as you observe.
> One way to overcome this (not specific to mlx5)
> is to use a ring buffer for incoming packets,
> from which any number of processing cores can take packets.
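
For reference, here is a minimal sketch of the ring-buffer idea from the
quoted reply: one RX core fills a ring, and any number of worker cores
pull packets from it, so the worker count no longer depends on the number
of hardware queues. Everything named here (pkt_t, pkt_ring, ring_enqueue,
ring_dequeue, run_demo, the ring size of 1024 and the burst of 64) is a
hypothetical illustration, not code from this thread or from DPDK. A real
application would most likely use DPDK's rte_ring (for example
rte_ring_enqueue_burst and rte_ring_dequeue_burst) instead of a
hand-written ring. The demo drives the ring from a single thread to stay
deterministic.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u             /* assumed size; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

/* Hypothetical stand-in for a packet pointer: just a sequence number. */
typedef uint32_t pkt_t;

struct pkt_ring {
    _Atomic uint32_t head;          /* next slot to dequeue (workers) */
    _Atomic uint32_t tail;          /* next slot to fill (RX core)    */
    _Atomic pkt_t slots[RING_SIZE];
};

static void ring_init(struct pkt_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Single producer. Returns 1 on success, 0 if the ring is full. */
static int ring_enqueue(struct pkt_ring *r, pkt_t pkt)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_SIZE)
        return 0;
    atomic_store_explicit(&r->slots[tail & RING_MASK], pkt,
                          memory_order_relaxed);
    /* Publish the slot to consumers. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Intended for multiple consumers: a slot is claimed with a CAS on head.
 * Returns 1 and stores the packet in *out, or 0 if the ring is empty. */
static int ring_dequeue(struct pkt_ring *r, pkt_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

    for (;;) {
        uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (head == tail)
            return 0;
        pkt_t pkt = atomic_load_explicit(&r->slots[head & RING_MASK],
                                         memory_order_relaxed);
        /* On failure, head is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak_explicit(&r->head, &head, head + 1,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *out = pkt;
            return 1;
        }
    }
}

/* Pass n_pkts packets through the ring to n_workers workers that are
 * polled round-robin. counts[w] receives the number of packets taken by
 * worker w. Returns the total number of packets dequeued. */
uint32_t run_demo(uint32_t n_pkts, uint32_t n_workers, uint32_t *counts)
{
    static struct pkt_ring ring;
    uint32_t sent = 0, done = 0;

    ring_init(&ring);
    for (uint32_t w = 0; w < n_workers; w++)
        counts[w] = 0;

    while (done < n_pkts) {
        /* "RX core": enqueue a burst of up to 64 packets. */
        for (int b = 0; b < 64 && sent < n_pkts; b++) {
            if (!ring_enqueue(&ring, sent))
                break;              /* ring full, let workers drain it */
            sent++;
        }
        /* "Workers": each takes at most one packet per pass. */
        for (uint32_t w = 0; w < n_workers; w++) {
            pkt_t pkt;
            if (ring_dequeue(&ring, &pkt)) {
                counts[w]++;
                done++;
            }
        }
    }
    return done;
}
```

Calling run_demo(4800, 48, counts) hands 100 packets to each of the 48
workers, more workers than the 32 queues mentioned above. In a real setup
the RX core would also need enough headroom to keep up with the full
packet rate, which this sketch does not model.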