Hi Slava and Erez and thanks for your answers,

 

Regarding the firmware, I’ve also deployed the application in a different OpenShift cluster where I see the exact same issue, but with a different Mellanox NIC:

 

Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter

driver: mlx5_core

version: 5.0-0
firmware-version: 22.36.1010 (DEL0000000027)

 

From what I can see, the firmware is relatively new on that one?

 

I tried setting dv_flow_en=0 (and saw that it was propagated to config->dv_flow_en) but it didn’t seem to help.

 

Erez, I’m not sure what you mean by shared or non-shared mode in this case. However, it seems it could be related to the container running in a separate network namespace, because the hw_counters directory is available on the host (cluster node) but not in the pod container.
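
For reference, here is a minimal sketch (plain C, not taken from the application itself) of the check I do from inside the pod container; the device name mlx5_99 is the one I see in GDB and may differ on other nodes:

/* Minimal sketch: check whether the sysfs counter file the mlx5 PMD reads
 * is visible from the current (pod) mount namespace.
 * The device name mlx5_99 is an example and may differ per node/VF. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path =
		"/sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer";

	printf("%s: %s\n", path,
	       access(path, R_OK) == 0 ? "readable" : "not visible");
	return 0;
}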

 

Best regards,

Daniel

 

From: Erez Ferber <erezferber@gmail.com>
Sent: Monday, 5 June 2023 12:29
To: Slava Ovsiienko <viacheslavo@nvidia.com>
Cc: Daniel Östman <daniel.ostman@ericsson.com>; users@dpdk.org; Matan Azrad <matan@nvidia.com>; maxime.coquelin@redhat.com; david.marchand@redhat.com
Subject: Re: mlx5: imissed / out_of_buffer counter always 0

 

Hi Daniel,

 

Is the container running in shared or non-shared mode?

For shared mode, I assume the kernel sysfs counters which DPDK relies on for imissed/out_of_buffer are not exposed.

 

Best regards,

Erez

 

On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <viacheslavo@nvidia.com> wrote:

Hi, Daniel

 

I would recommend taking the following actions:

- update the firmware; 16.33.xxxx looks to be a little outdated. Please try 16.35.1012 or later.
  mlx5_glue->devx_obj_create might succeed with the newer FW.

- try to specify the dv_flow_en=0 devarg; it forces the mlx5 PMD to use the rdma_core library for queue management,
  so the kernel driver will be aware of the Rx queues being created and will attach them to the kernel counter set
  (see the sketch just below).
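
As an illustration only (a sketch, not your application's code), the devarg can be passed through the EAL device allow-list option; the PCI address below is a placeholder for your VF:

/* Minimal sketch: pass the dv_flow_en=0 devarg via the EAL allow-list.
 * 0000:3b:00.2 is a placeholder PCI address - use your VF's address. */
#include <rte_eal.h>

int main(int argc, char **argv)
{
	char *eal_args[] = {
		argv[0],
		"-a", "0000:3b:00.2,dv_flow_en=0",	/* per-device devarg */
	};

	(void)argc;
	if (rte_eal_init(3, eal_args) < 0)
		return 1;
	/* ... rest of the application initialization ... */
	return 0;
}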

 

With best regards,
Slava

 

From: Daniel Östman <daniel.ostman@ericsson.com>
Sent: Friday, June 2, 2023 3:59 PM
To: users@dpdk.org
Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; maxime.coquelin@redhat.com; david.marchand@redhat.com
Subject: mlx5: imissed / out_of_buffer counter always 0

 

Hi,

 

I’m deploying a containerized DPDK application in an OpenShift Kubernetes environment using DPDK 21.11.3.

The application uses a Mellanox ConnectX-5 100G NIC through VFs.

 

The problem I have is that the ETH stats counter imissed (which seems to be mapped to “out_of_buffer” internally in the mlx5 PMD) is 0 when I don’t expect it to be, i.e. when the application doesn’t read the packets fast enough.
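
For reference, the application reads the counter through the standard ethdev stats API, roughly as in this minimal sketch (port_id 0 is just an example value):

/* Minimal sketch: read the imissed counter through the ethdev stats API. */
#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void print_imissed(uint16_t port_id)
{
	struct rte_eth_stats stats;

	if (rte_eth_stats_get(port_id, &stats) == 0)
		printf("port %u: imissed=%" PRIu64 "\n", port_id, stats.imissed);
}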

 

Using GDB I can see that it tries to access the counter through /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but the hw_counters directory is missing, so it just returns a zero value. I don’t know why it is missing.

Looking at mlx5_os_read_dev_stat() I can see that there is an alternative way of reading the counter, through mlx5_devx_cmd_queue_counter_query(), but only under the condition that priv->q_counters is set.

It doesn’t get set in my case because mlx5_glue->devx_obj_create() fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().

 

Have I missed something?

 

NIC info:

Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port QSFP28 MCX516A-CCHT
driver: mlx5_core
version: 5.0-0
firmware-version: 16.33.1048 (MT_0000000417)

 

Please let me know if I need to provide more information.

 

Best regards,

Daniel