* mlx5: imissed / out_of_buffer counter always 0
@ 2023-06-02 12:59 Daniel Östman
2023-06-02 15:07 ` Slava Ovsiienko
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Östman @ 2023-06-02 12:59 UTC (permalink / raw)
To: users; +Cc: matan, viacheslavo, maxime.coquelin, david.marchand
Hi,
I’m deploying a containerized DPDK application in an OpenShift Kubernetes environment using DPDK 21.11.3.
The application uses a Mellanox ConnectX-5 100G NIC through VFs.
The problem is that the ETH stats counter imissed (which seems to be mapped to “out_of_buffer” inside the mlx5 PMD) stays at 0 when I don’t expect it to, i.e. when the application doesn’t read the packets fast enough.
Using GDB I can see that it tries to read the counter from /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer, but the hw_counters directory is missing, so the read just returns zero. I don’t know why it is missing.
Looking at mlx5_os_read_dev_stat(), I can see that there is an alternative way of reading the counter, through mlx5_devx_cmd_queue_counter_query(), but only if priv->q_counters is set.
It doesn’t get set in my case because mlx5_glue->devx_obj_create() fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
Have I missed something?
NIC info:
Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port QSFP28 MCX516A-CCHT
driver: mlx5_core
version: 5.0-0
firmware-version: 16.33.1048 (MT_0000000417)
Please let me know if I need to provide more information.
Best regards,
Daniel
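For reference, the sysfs read path described in this message can be approximated outside the PMD. The sketch below is not the mlx5 PMD code; it assumes only the path layout quoted above (/sys/class/infiniband/<dev>/ports/<port>/hw_counters/out_of_buffer), and read_out_of_buffer is a hypothetical helper name. Unlike the behaviour observed in GDB, it returns None when the file is missing, so a counter that is not exposed is not mistaken for a real zero.

```python
from pathlib import Path
from typing import Optional


def read_out_of_buffer(ibdev: str, port: int = 1,
                       sysfs_root: str = "/sys/class/infiniband") -> Optional[int]:
    """Return the out_of_buffer counter, or None if it is not exposed.

    Hypothetical helper: only the path layout comes from the thread.
    """
    counter = (Path(sysfs_root) / ibdev / "ports" / str(port)
               / "hw_counters" / "out_of_buffer")
    try:
        return int(counter.read_text().strip())
    except (FileNotFoundError, NotADirectoryError, PermissionError, ValueError):
        # Missing hw_counters directory, unreadable file, or unexpected content.
        return None


if __name__ == "__main__":
    # "mlx5_99" is the device name from the report; adjust for your system.
    value = read_out_of_buffer("mlx5_99")
    print("out_of_buffer:", "not exposed" if value is None else value)
```

Running this both on the cluster node and inside the pod would show whether the counter file is visible in each place.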
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-06-02 12:59 mlx5: imissed / out_of_buffer counter always 0 Daniel Östman
@ 2023-06-02 15:07 ` Slava Ovsiienko
2023-06-05 10:29 ` Erez Ferber
0 siblings, 1 reply; 11+ messages in thread
From: Slava Ovsiienko @ 2023-06-02 15:07 UTC (permalink / raw)
To: Daniel Östman, users; +Cc: Matan Azrad, maxime.coquelin, david.marchand
Hi, Daniel
I would recommend the following actions:
- update the firmware; 16.33.xxxx looks a little outdated. Please try 16.35.1012 or later.
mlx5_glue->devx_obj_create might succeed with the newer FW.
- try specifying the dv_flow_en=0 devarg. It forces the mlx5 PMD to use the rdma_core library for queue management,
so the kernel driver will be aware of the Rx queues being created and will attach them to the kernel counter set.
With best regards,
Slava
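As an illustration of the second suggestion, mlx5 devargs are appended to the PCI address given to the EAL allow option (-a) as comma-separated key=value pairs. The sketch below only builds such an argument string; the PCI address and the helper name allow_arg are assumptions made for the example and do not come from the thread.

```python
from typing import Dict, List


def allow_arg(pci_addr: str, devargs: Dict[str, int]) -> List[str]:
    """Build an EAL '-a <pci>,key=value,...' argument pair."""
    suffix = "".join(f",{key}={value}" for key, value in devargs.items())
    return ["-a", f"{pci_addr}{suffix}"]


# 0000:3b:00.2 is a placeholder VF address, not taken from the thread.
eal_args = allow_arg("0000:3b:00.2", {"dv_flow_en": 0})
print(" ".join(eal_args))
```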
* Re: mlx5: imissed / out_of_buffer counter always 0
2023-06-02 15:07 ` Slava Ovsiienko
@ 2023-06-05 10:29 ` Erez Ferber
2023-06-05 14:00 ` Daniel Östman
0 siblings, 1 reply; 11+ messages in thread
From: Erez Ferber @ 2023-06-05 10:29 UTC (permalink / raw)
To: Slava Ovsiienko
Cc: Daniel Östman, users, Matan Azrad, maxime.coquelin, david.marchand
Hi Daniel,
Is the container running in shared or non-shared mode?
For shared mode, I assume the kernel sysfs counters which DPDK relies on
for imissed/out_of_buffer are not exposed.
Best regards,
Erez
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-06-05 10:29 ` Erez Ferber
@ 2023-06-05 14:00 ` Daniel Östman
2023-06-21 20:22 ` Maxime Coquelin
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Östman @ 2023-06-05 14:00 UTC (permalink / raw)
To: Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, maxime.coquelin, david.marchand
Hi Slava and Erez, and thanks for your answers,
Regarding the firmware, I’ve also deployed in a different OpenShift cluster where I see the exact same issue but with a different Mellanox NIC:
Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56 PCIe Adapter
driver: mlx5_core
version: 5.0-0
firmware-version: 22.36.1010 (DEL0000000027)
From what I can see, the firmware on that one is relatively new?
I tried setting dv_flow_en=0 (and saw that it was propagated to config->dv_flow_en), but it didn’t seem to help.
Erez, I’m not sure what you mean by shared or non-shared mode in this case. However, it seems it could be related to the fact that the container is running in a separate network namespace, because the hw_counters directory is available on the host (cluster node) but not in the pod container.
Best regards,
Daniel
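The host-versus-pod difference described above can be checked with a small script run in both places. This is a sketch that assumes the usual /sys/class/infiniband/<dev>/ports/<port>/ layout; find_hw_counters is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def find_hw_counters(sysfs_root: str = "/sys/class/infiniband") -> Dict[str, bool]:
    """Map each '<dev>/ports/<port>' to whether hw_counters exists there."""
    result: Dict[str, bool] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            key = f"{dev.name}/ports/{port.name}"
            result[key] = (port / "hw_counters").is_dir()
    return result


if __name__ == "__main__":
    for port, present in find_hw_counters().items():
        print(f"{port}: hw_counters {'present' if present else 'MISSING'}")
```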
* Re: mlx5: imissed / out_of_buffer counter always 0
2023-06-05 14:00 ` Daniel Östman
@ 2023-06-21 20:22 ` Maxime Coquelin
2023-06-22 15:47 ` Maxime Coquelin
0 siblings, 1 reply; 11+ messages in thread
From: Maxime Coquelin @ 2023-06-21 20:22 UTC (permalink / raw)
To: Daniel Östman, Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, david.marchand
Hi Daniel, all,
On 6/5/23 16:00, Daniel Östman wrote:
> Hi Slava and Erez and thanks for your answers,
>
> Regarding the firmware, I’ve also deployed in a different OpenShift
> cluster were I see the exact same issue but with a different Mellanox NIC:
>
> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE QSFP56
> PCIe Adapter
>
> driver: mlx5_core
>
> version: 5.0-0
> firmware-version: 22.36.1010 (DEL0000000027)
>
> From what I can see the firmware is relatively new on that one?
With the configuration below:
- ConnectX-6 Dx MT2892
- Kernel: 6.4.0-rc6
- FW version: 22.35.1012 (MT_0000000528)
The out-of-buffer counter is fetched via
mlx5_devx_cmd_queue_counter_query():
[pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
[pid 2942] write(1, "\n ######################## NIC "..., 80) = 80
[pid 2942] write(1, " RX-packets: 630997736 RX-miss"..., 70) = 70
[pid 2942] write(1, " RX-errors: 0\n", 15) = 15
[pid 2942] write(1, " RX-nombuf: 0 \n", 25) = 25
[pid 2942] write(1, " TX-packets: 0 TX-erro"..., 60) = 60
[pid 2942] write(1, "\n", 1) = 1
[pid 2942] write(1, " Throughput (since last show)\n", 31) = 31
[pid 2942] write(1, " Rx-pps: 0 "..., 106) = 106
[pid 2942] write(1, " ##############################"..., 79) = 79
It looks like we may be missing some mlx5 kernel patches needed to use
mlx5_devx_cmd_queue_counter_query() on RHEL?
Erez, Slava, any idea on the patches that could be missing?
Regards,
Maxime
* Re: mlx5: imissed / out_of_buffer counter always 0
2023-06-21 20:22 ` Maxime Coquelin
@ 2023-06-22 15:47 ` Maxime Coquelin
2023-08-18 12:04 ` Daniel Östman
0 siblings, 1 reply; 11+ messages in thread
From: Maxime Coquelin @ 2023-06-22 15:47 UTC (permalink / raw)
To: Daniel Östman, Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, david.marchand
Hi,
On 6/21/23 22:22, Maxime Coquelin wrote:
> Hi Daniel, all,
>
> With below configuration:
> - ConnectX-6 Dx MT2892
> - Kernel: 6.4.0-rc6
> - FW version: 22.35.1012 (MT_0000000528)
>
> The out-of-buffer counter is fetched via
> mlx5_devx_cmd_queue_counter_query():
>
> [pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> [pid 2942] write(1, "\n ######################## NIC "..., 80) = 80
> [pid 2942] write(1, " RX-packets: 630997736 RX-miss"..., 70) = 70
> [pid 2942] write(1, " RX-errors: 0\n", 15) = 15
> [pid 2942] write(1, " RX-nombuf: 0 \n", 25) = 25
> [pid 2942] write(1, " TX-packets: 0 TX-erro"..., 60) = 60
> [pid 2942] write(1, "\n", 1) = 1
> [pid 2942] write(1, " Throughput (since last show)\n", 31) = 31
> [pid 2942] write(1, " Rx-pps: 0 "..., 106) = 106
> [pid 2942] write(1, " ##############################"..., 79) = 79
>
> It looks like we may miss some mlx5 kernel patches so that we can use
> mlx5_devx_cmd_queue_counter_query() with RHEL?
>
> Erez, Slava, any idea on the patches that could be missing?
The above test was on bare metal as root; I get the same "working" behaviour
on RHEL as root.
We managed to reproduce Daniel's issue by running the same test within a
container. With debug logs enabled, we get this warning:
mlx5_common: DevX create q counter set failed errno=121 status=0x2 syndrome=0x8975f1
mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back to use the kernel driver global queue counter.
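The fields in this warning can be pulled apart mechanically, which helps when comparing logs from several pods. The regular expression below is written against the exact message shown above (joined onto one line) and is an assumption about its format, not something taken from the DPDK sources. The syndrome value is firmware-specific and is left undecoded.

```python
import errno
import re

# Matches the "errno=... status=... syndrome=..." tail of the warning above.
FIELDS = re.compile(
    r"errno=(?P<errno>\d+)\s+status=(?P<status>0x[0-9a-fA-F]+)"
    r"\s+syndrome=(?P<syndrome>0x[0-9a-fA-F]+)"
)


def parse_devx_failure(line: str):
    """Return (errno, errno_name, status, syndrome) or None if no match."""
    match = FIELDS.search(line)
    if match is None:
        return None
    err = int(match["errno"])
    # errno names are platform-dependent; fall back to "unknown".
    name = errno.errorcode.get(err, "unknown")
    return err, name, int(match["status"], 16), int(match["syndrome"], 16)


line = ("mlx5_common: DevX create q counter set failed "
        "errno=121 status=0x2 syndrome=0x8975f1")
print(parse_devx_failure(line))
```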
Running the container as privileged solves the issue, and so does
adding the SYS_RAWIO capability to the container.
Erez, Slava, is it expected that SYS_RAWIO is required just to read a stats
counter?
Daniel, could you try adding SYS_RAWIO to your pod to confirm you face
the same issue?
Thanks in advance,
Maxime
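Whether a process inside the pod actually holds the capability can be checked from /proc/self/status. The sketch below assumes a Linux system, where CAP_SYS_RAWIO is capability number 17 and the effective set is reported as a hexadecimal mask on the "CapEff:" line; the helper names are made up for this example.

```python
from typing import Optional

CAP_SYS_RAWIO = 17  # value from <linux/capability.h>


def effective_caps(status_text: str) -> Optional[int]:
    """Extract the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    return None


def has_sys_rawio(status_text: str) -> bool:
    """True if the CAP_SYS_RAWIO bit is set in the effective capability mask."""
    mask = effective_caps(status_text)
    return mask is not None and bool(mask & (1 << CAP_SYS_RAWIO))


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print("CAP_SYS_RAWIO effective:", has_sys_rawio(f.read()))
    except FileNotFoundError:
        print("/proc/self/status not available (not Linux?)")
```

Running it inside the DPDK container before starting the application gives a quick yes/no answer without having to inspect the pod specification.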
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-06-22 15:47 ` Maxime Coquelin
@ 2023-08-18 12:04 ` Daniel Östman
2023-10-04 13:49 ` Maxime Coquelin
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Östman @ 2023-08-18 12:04 UTC (permalink / raw)
To: Maxime Coquelin, Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, david.marchand
Hi Maxime,
Sorry for the late reply, I've been on vacation.
Please see my answer below.
/ Daniel
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Thursday, 22 June 2023 17:48
> To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> david.marchand@redhat.com
> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
>
> Hi,
>
> On 6/21/23 22:22, Maxime Coquelin wrote:
> > Hi Daniel, all,
> >
> > With below configuration:
> > - ConnectX-6 Dx MT2892
> > - Kernel: 6.4.0-rc6
> > - FW version: 22.35.1012 (MT_0000000528)
> >
> > The out-of-buffer counter is fetched via
> > mlx5_devx_cmd_queue_counter_query():
> >
> > [pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> > [pid 2942] write(1, "\n ######################## NIC "..., 80) = 80
> > [pid 2942] write(1, " RX-packets: 630997736 RX-miss"..., 70) = 70
> > [pid 2942] write(1, " RX-errors: 0\n", 15) = 15
> > [pid 2942] write(1, " RX-nombuf: 0 \n", 25) = 25
> > [pid 2942] write(1, " TX-packets: 0 TX-erro"..., 60) = 60
> > [pid 2942] write(1, "\n", 1) = 1
> > [pid 2942] write(1, " Throughput (since last show)\n", 31) = 31
> > [pid 2942] write(1, " Rx-pps: 0 "..., 106) = 106
> > [pid 2942] write(1, " ##############################"..., 79) = 79
> >
> > It looks like we may miss some mlx5 kernel patches so that we can use
> > mlx5_devx_cmd_queue_counter_query() with RHEL?
> >
> > Erez, Slava, any idea on the patches that could be missing?
>
> Above test was on baremetal as root, I get the same "working" behaviour on
> RHEL as root.
>
> We managed to reproduce Daniel's with running the same within a container,
> enabling debug logs we have this warning:
>
> mlx5_common: DevX create q counter set failed errno=121 status=0x2
> syndrome=0x8975f1
> mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back
> to use the kernel driver global queue counter.
>
> Running the container as privileged solves the issue, and so does when
> adding SYS_RAWIO capability to the container.
>
> Erez, Slava, is that expected to require SYS_RAWIO just to get a stat counter?
>
> Daniel, could you try adding SYS_RAWIO to your pod to confirm you face the
> same issue?
Yes, I can confirm what you are seeing when running in a cluster with OpenShift 4.12 (RHEL 8.6), either with SYS_RAWIO or running as privileged.
With a privileged container I also need to run with UID 0 for it to work; is that what you are doing as well?
In both these cases the counter can be successfully retrieved through the DevX interface.
However, when running in a cluster with OpenShift 4.10 (RHEL 8.4) I cannot get it to work with either of these two approaches.
> Thanks in advance,
> Maxime
* Re: mlx5: imissed / out_of_buffer counter always 0
2023-08-18 12:04 ` Daniel Östman
@ 2023-10-04 13:49 ` Maxime Coquelin
2023-11-08 12:55 ` Daniel Östman
0 siblings, 1 reply; 11+ messages in thread
From: Maxime Coquelin @ 2023-10-04 13:49 UTC (permalink / raw)
To: Daniel Östman, Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, david.marchand
Hi Daniel, Erez & Slava,
My turn to apologize: I missed this email when coming back from vacation.
On 8/18/23 14:04, Daniel Östman wrote:
> Hi Maxime,
>
> Sorry for the late reply, I've been on vacation.
> Please see my answer below.
>
> / Daniel
>
>> -----Original Message-----
>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Sent: Thursday, 22 June 2023 17:48
>> To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
>> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
>> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
>> david.marchand@redhat.com
>> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
>>
>> Hi,
>>
>> On 6/21/23 22:22, Maxime Coquelin wrote:
>>> Hi Daniel, all,
>>>
>>> With below configuration:
>>> - ConnectX-6 Dx MT2892
>>> - Kernel: 6.4.0-rc6
>>> - FW version: 22.35.1012 (MT_0000000528)
>>>
>>> The out-of-buffer counter is fetched via
>>> mlx5_devx_cmd_queue_counter_query():
>>>
>>> [pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
>>> [pid 2942] write(1, "\n ######################## NIC "..., 80) = 80
>>> [pid 2942] write(1, " RX-packets: 630997736 RX-miss"..., 70) = 70
>>> [pid 2942] write(1, " RX-errors: 0\n", 15) = 15
>>> [pid 2942] write(1, " RX-nombuf: 0 \n", 25) = 25
>>> [pid 2942] write(1, " TX-packets: 0 TX-erro"..., 60) = 60
>>> [pid 2942] write(1, "\n", 1) = 1
>>> [pid 2942] write(1, " Throughput (since last show)\n", 31) = 31
>>> [pid 2942] write(1, " Rx-pps: 0 "..., 106) = 106
>>> [pid 2942] write(1, " ##############################"..., 79) = 79
>>>
>>> It looks like we may miss some mlx5 kernel patches so that we can use
>>> mlx5_devx_cmd_queue_counter_query() with RHEL?
>>>
>>> Erez, Slava, any idea on the patches that could be missing?
>>
>> Above test was on baremetal as root, I get the same "working" behaviour on
>> RHEL as root.
>>
>> We managed to reproduce Daniel's with running the same within a container,
>> enabling debug logs we have this warning:
>>
>> mlx5_common: DevX create q counter set failed errno=121 status=0x2
>> syndrome=0x8975f1
>> mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back
>> to use the kernel driver global queue counter.
>>
>> Running the container as privileged solves the issue, and so does when
>> adding SYS_RAWIO capability to the container.
>>
>> Erez, Slava, is that expected to require SYS_RAWIO just to get a stat counter?
Erez & Slava, would it be possible to get the stats counters via DevX
without requiring SYS_RAWIO?
>>
>> Daniel, could you try adding SYS_RAWIO to your pod to confirm you face the
>> same issue?
>
> Yes I can confirm what you are seeing when running in a cluster with Openshift 4.12 (RHEL 8.6) and with SYS_RAWIO or running as privileged.
> But with privileged container I also need to run with UID 0 for it to work, is that what you are doing as well?
I don't have an OCP setup at hand right now to test it, but IIRC yes we
ran it with UID 0.
> In both these cases the counter can be successfully retrieved through the DevX interface.
Ok.
> However, when running in a cluster with Openshift 4.10 (RHEL 8.4) I can not get it to work with any of these two approaches.
I'm not sure this is kernel related: I tested on both RHEL-8.4.0 and the
latest RHEL-8.4, and I can get the q counters via ioctl().
Maxime
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-10-04 13:49 ` Maxime Coquelin
@ 2023-11-08 12:55 ` Daniel Östman
2023-11-09 14:51 ` Slava Ovsiienko
0 siblings, 1 reply; 11+ messages in thread
From: Daniel Östman @ 2023-11-08 12:55 UTC (permalink / raw)
To: Maxime Coquelin, Erez Ferber, Slava Ovsiienko
Cc: users, Matan Azrad, david.marchand
Hi,
Any input from Nvidia on this? Matan perhaps?
The question is whether the SYS_RAWIO capability is expected to be required just to read the out-of-buffer counter.
If so, are there any plans to change that?
Best regards,
Daniel
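
For anyone reproducing this, one quick way to see whether the process actually holds SYS_RAWIO is to look at the effective capability mask the kernel reports on the "CapEff:" line of /proc/self/status. What follows is a minimal sketch in C and is not part of DPDK or the mlx5 PMD: the helper names cap_is_set() and read_effective_caps() and the macro CAP_SYS_RAWIO_BIT are made up for illustration, and 17 is the value of CAP_SYS_RAWIO in <linux/capability.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical name; 17 is CAP_SYS_RAWIO in <linux/capability.h>. */
#define CAP_SYS_RAWIO_BIT 17

/* Return true if capability number `cap` is set in the bitmask `mask`. */
bool cap_is_set(uint64_t mask, unsigned int cap)
{
    assert(cap < 64);
    return ((mask >> cap) & 1u) != 0;
}

/*
 * Scan a /proc/<pid>/status style file for the "CapEff:" line and parse
 * its hexadecimal value. Returns 0 and stores the mask in *mask on
 * success, or -1 if the file cannot be opened or has no such line.
 */
int read_effective_caps(const char *status_path, uint64_t *mask)
{
    char line[256];
    int ret = -1;
    FILE *f;

    assert(status_path != NULL && mask != NULL);
    f = fopen(status_path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Whitespace in the format also matches the tab after the colon. */
        if (sscanf(line, "CapEff: %" SCNx64, mask) == 1) {
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Calling read_effective_caps("/proc/self/status", &mask) and then cap_is_set(mask, CAP_SYS_RAWIO_BIT) from inside the container shows whether the capability actually reached the DPDK process. It says nothing about why the DevX call needs it.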
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-11-08 12:55 ` Daniel Östman
@ 2023-11-09 14:51 ` Slava Ovsiienko
2023-12-06 12:40 ` Daniel Östman
0 siblings, 1 reply; 11+ messages in thread
From: Slava Ovsiienko @ 2023-11-09 14:51 UTC (permalink / raw)
To: Daniel Östman, Maxime Coquelin, Erez Ferber
Cc: users, Matan Azrad, david.marchand, Maayan Kashani, Bing Zhao
Hi,
Sorry for the late response.
The missed packets are counted by an internal NIC counter attached to the Rx queues.
mlx5 can create an RxQ in two different ways: either via the rdma-core API (also known as "Verbs"),
or using direct FW calls, via kernel thunks of course (also known as "DevX").
Which way the PMD chooses mostly depends on the dv_flow_en devarg; with dv_flow_en=0, the
legacy "Verbs" way is used.
When an RxQ is created with Verbs, the kernel can automatically attach the internal "out-of-buf"
counter to the queue being created. With the DevX approach, the PMD has to take care of this
explicitly: allocate the counter and attach the queues to it.
If you are OK with mlx5_devx_cmd_queue_counter_query() (used for DevX), try to force
this path with dv_flow_en=1 (IIRC, this has been the default since 20.02). I'm not sure whether
the kernel requires the RAWIO permission for this call.
As for the sysfs entry access (used for Verbs), it is unlikely to be changed.
With best regards,
Slava
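
The sysfs entry used on the Verbs side is a plain text file, so it can be read with ordinary file I/O. The sketch below is only an illustration and is not the PMD's actual code; read_sysfs_counter() is a hypothetical helper. As reported at the start of this thread, a missing hw_counters directory currently shows up as a counter value of 0. This sketch returns an error instead, so that a caller can tell "counter unavailable" apart from "no packets missed".

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a sysfs-style text file, e.g.
 * /sys/class/infiniband/<dev>/ports/1/hw_counters/out_of_buffer.
 * Returns 0 and stores the value in *out on success, or a negative
 * errno value if the file cannot be opened or does not hold a number.
 */
int read_sysfs_counter(const char *path, uint64_t *out)
{
    uint64_t value;
    FILE *f;
    int n;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL) {
        /* Typically ENOENT when the hw_counters directory is absent. */
        int err = errno != 0 ? errno : EIO;
        return -err;
    }
    n = fscanf(f, "%" SCNu64, &value);
    fclose(f);
    if (n != 1)
        return -EINVAL;
    *out = value;
    return 0;
}
```

A caller that gets a negative return here could log a warning or skip the imissed field rather than report 0.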
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: mlx5: imissed / out_of_buffer counter always 0
2023-11-09 14:51 ` Slava Ovsiienko
@ 2023-12-06 12:40 ` Daniel Östman
0 siblings, 0 replies; 11+ messages in thread
From: Daniel Östman @ 2023-12-06 12:40 UTC (permalink / raw)
To: Slava Ovsiienko, Maxime Coquelin, Erez Ferber
Cc: users, Matan Azrad, david.marchand, Maayan Kashani, Bing Zhao
Hi Slava,
From our previous tests it seems that RAWIO is indeed required for the DevX approach.
Do you have any MLX kernel driver developers you could ask why it is required, and whether there are
any plans to change that behaviour for retrieving this particular counter?
We can get this counter from Intel NICs without elevated permissions, so it's a bit sad
that the technical solution used here does not allow this.
Best regards,
Daniel
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Thursday, 9 November 2023 15:51
> To: Daniel Östman <daniel.ostman@ericsson.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Erez Ferber <erezferber@gmail.com>
> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> david.marchand@redhat.com; Maayan Kashani <mkashani@nvidia.com>;
> Bing Zhao <bingz@nvidia.com>
> Subject: RE: mlx5: imissed / out_of_buffer counter always 0
>
> Hi,
>
> Sorry for the late response.
>
> The missed packets are counter by internal NIC counter, attached to Rx
> queues.
> mlx5 can use 2 different approaches to create RxQ - either via rdma-core API
> (also known as "Verbs"), or using direct FW calls (sure, via kernel thunks)
> (also known as "DevX").
> The way PMD chooses mostly depends on the dv_flow_en devargs, for the
> dv_flow_en=0, the legacy "Verbs" way is engaged.
>
> Once RxQ is being created with Verbs, the kernel can automatically attach
> the "out-of-buf" internal counter to the queue being created. For the DevX
> approach PMD should take explicit care about - allocate the counter and
> attach queues to it.
>
> If you are OK with mlx5_devx_cmd_queue_counter_query() (used for DevX)
> - try to force this way with dv_flow_en=1 (IIRC, this is a default since 20.02).
> I'm no so familiar if kernel requires RAWIO permission for this call.
>
> As for the sysfs entry access (used for Verbs) - it is unlikely will be changed.
>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Daniel Östman <daniel.ostman@ericsson.com>
> > Sent: Wednesday, November 8, 2023 2:56 PM
> > To: Maxime Coquelin <maxime.coquelin@redhat.com>; Erez Ferber
> > <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > david.marchand@redhat.com
> > Subject: RE: mlx5: imissed / out_of_buffer counter always 0
> >
> > Hi,
> >
> > Any input from Nvidia on this? Matan perhaps?
> > The question here is if it's expected to require capability SYS_RAWIO
> > just to get the out of buffer counter?
> > If so, any plans on changing that?
> >
> > Best regards,
> > Daniel
> >
> > > -----Original Message-----
> > > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > Sent: Wednesday, 4 October 2023 15:49
> > > To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > > <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > > Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > > david.marchand@redhat.com
> > > Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > >
> > > Hi Daniel, Erez & Slava,
> > >
> > > My time to be sorry, I missed this email when coming back from vacation.
> > >
> > > On 8/18/23 14:04, Daniel Östman wrote:
> > > > Hi Maxime,
> > > >
> > > > Sorry for the late reply, I've been on vacation.
> > > > Please see my answer below.
> > > >
> > > > / Daniel
> > > >
> > > >> -----Original Message-----
> > > >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > >> Sent: Thursday, 22 June 2023 17:48
> > > >> To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > > >> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > > >> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > > >> david.marchand@redhat.com
> > > >> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > > >>
> > > >> Hi,
> > > >>
> > > >> On 6/21/23 22:22, Maxime Coquelin wrote:
> > > >>> Hi Daniel, all,
> > > >>>
> > > >>> On 6/5/23 16:00, Daniel Östman wrote:
> > > >>>> Hi Slava and Erez and thanks for your answers,
> > > >>>>
> > > >>>> Regarding the firmware, I’ve also deployed in a different
> > > >>>> OpenShift cluster were I see the exact same issue but with a
> > > >>>> different Mellanox
> > > >>>> NIC:
> > > >>>>
> > > >>>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port
> > 100GbE
> > > >>>> QSFP56 PCIe Adapter
> > > >>>>
> > > >>>> driver: mlx5_core
> > > >>>>
> > > >>>> version: 5.0-0
> > > >>>> firmware-version: 22.36.1010 (DEL0000000027)
> > > >>>>
> > > >>>> From what I can see the firmware is relatively new on that one?
> > > >>>
> > > >>> With below configuration:
> > > >>> - ConnectX-6 Dx MT2892
> > > >>> - Kernel: 6.4.0-rc6
> > > >>> - FW version: 22.35.1012 (MT_0000000528)
> > > >>>
> > > >>> The out-of-buffer counter is fetched via
> > > >>> mlx5_devx_cmd_queue_counter_query():
> > > >>>
> > > > >>> [pid 2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> > > > >>> [pid 2942] write(1, "\n ######################## NIC "..., 80) = 80
> > > > >>> [pid 2942] write(1, " RX-packets: 630997736 RX-miss"..., 70) = 70
> > > > >>> [pid 2942] write(1, " RX-errors: 0\n", 15) = 15
> > > > >>> [pid 2942] write(1, " RX-nombuf: 0 \n", 25) = 25
> > > > >>> [pid 2942] write(1, " TX-packets: 0 TX-erro"..., 60) = 60
> > > > >>> [pid 2942] write(1, "\n", 1) = 1
> > > > >>> [pid 2942] write(1, " Throughput (since last show)\n", 31) = 31
> > > > >>> [pid 2942] write(1, " Rx-pps: 0 "..., 106) = 106
> > > > >>> [pid 2942] write(1, " ##############################"..., 79) = 79
> > > >>>
> > > > >>> It looks like we may be missing some mlx5 kernel patches that
> > > > >>> are needed to use mlx5_devx_cmd_queue_counter_query() on RHEL?
> > > >>>
> > > >>> Erez, Slava, any idea on the patches that could be missing?
> > > >>
> > > > >> The above test was on bare metal as root; I get the same "working"
> > > > >> behaviour on RHEL as root.
> > > > >>
> > > > >> We managed to reproduce Daniel's issue by running the same test
> > > > >> within a container. With debug logs enabled, we get this warning:
> > > >>
> > > >> mlx5_common: DevX create q counter set failed errno=121
> > > >> status=0x2
> > > >> syndrome=0x8975f1
> > > >> mlx5_net: Port 0 queue counter object cannot be created by DevX -
> > > >> fall-back to use the kernel driver global queue counter.
> > > >>
> > > > >> Running the container as privileged solves the issue, and so does
> > > > >> adding the SYS_RAWIO capability to the container.
> > > > >>
> > > > >> Erez, Slava, is it expected that SYS_RAWIO is required just to get
> > > > >> a stat counter?
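
[Editor's sketch] Since the thread ties the DevX queue counter failure to the container lacking SYS_RAWIO, it can help to check from inside the pod whether the capability is actually effective before debugging the PMD. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names mask_has_sys_rawio() and process_has_sys_rawio() are hypothetical and not part of DPDK or rdma-core. It only reads the "CapEff:" line of /proc/self/status and tests bit 17, which is the number assigned to CAP_SYS_RAWIO in <linux/capability.h>.

```c
#include <stdio.h>
#include <string.h>

/* CAP_SYS_RAWIO is capability number 17 in <linux/capability.h>. */
#define CAP_SYS_RAWIO_BIT 17

/*
 * Hypothetical helper: given a hexadecimal capability mask as printed in
 * the "CapEff:" line of /proc/<pid>/status, return 1 if CAP_SYS_RAWIO is
 * set, 0 if it is not, and -1 if the mask cannot be parsed.
 */
int mask_has_sys_rawio(const char *hex_mask)
{
    unsigned long long mask;

    /* %llx skips leading whitespace, so the tab after "CapEff:" is fine. */
    if (sscanf(hex_mask, "%llx", &mask) != 1)
        return -1;
    return (int)((mask >> CAP_SYS_RAWIO_BIT) & 1ULL);
}

/*
 * Hypothetical helper: look up the effective capability set of the
 * calling process. Returns 1 or 0 as above, or -1 if /proc/self/status
 * cannot be read or has no "CapEff:" line.
 */
int process_has_sys_rawio(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    int result = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            result = mask_has_sys_rawio(line + 7);
            break;
        }
    }
    fclose(f);
    return result;
}
```

An application could call process_has_sys_rawio() at startup and log a hint when it returns 0, so that a zero imissed value is not silently mistaken for "no drops". This says nothing about whether the capability should be required; it only makes the current state visible.
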
> > >
> > > Erez & Slava, could it be possible to get the stats counters via
> > > devx without requiring SYS_RAWIO?
> > >
> > > >>
> > > >> Daniel, could you try adding SYS_RAWIO to your pod to confirm you
> > > >> face the same issue?
> > > >
> > > > > Yes, I can confirm what you are seeing when running in a cluster
> > > > > with OpenShift 4.12 (RHEL 8.6) and with SYS_RAWIO or running as
> > > > > privileged.
> > > > > But with a privileged container I also need to run with UID 0 for
> > > > > it to work. Is that what you are doing as well?
> > >
> > > I don't have an OCP setup at hand right now to test it, but IIRC yes
> > > we ran it with UID 0.
> > >
> > > > > In both these cases the counter can be successfully retrieved
> > > > > through the DevX interface.
> > >
> > > Ok.
> > >
> > > > > However, when running in a cluster with OpenShift 4.10 (RHEL 8.4)
> > > > > I cannot get it to work with either of these two approaches.
> > >
> > > > I'm not sure this is kernel related, as I tested on both RHEL-8.4.0
> > > > and the latest RHEL-8.4, and I can get the q counters via ioctl().
> > >
> > > Maxime
> > >
> > > >> Thanks in advance,
> > > >> Maxime
> > > >>> Regards,
> > > >>> Maxime
> > > >>>
> > > >>>>
> > > >>>> I tried setting dv_flow_en=0 (and saw that it was propagated to
> > > >>>> config->dv_flow_en) but it didn’t seem to help.
> > > >>>>
> > > >>>> Erez, I’m not sure what you mean by shared or non-shared mode
> > > >>>> in this case, however it seems it could be related to the fact
> > > >>>> that the container is running in a separate network namespace.
> > > > >>>> Because the hw_counters directory is available on the host
> > > >>>> (cluster node), but not in the pod container.
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Daniel
> > > >>>>
> > > >>>> *From:*Erez Ferber <erezferber@gmail.com>
> > > >>>> *Sent:* Monday, 5 June 2023 12:29
> > > >>>> *To:* Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > >>>> *Cc:* Daniel Östman <daniel.ostman@ericsson.com>; users@dpdk.org;
> > > > >>>> Matan Azrad <matan@nvidia.com>; maxime.coquelin@redhat.com;
> > > > >>>> david.marchand@redhat.com
> > > >>>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
> > > >>>>
> > > >>>> Hi Daniel,
> > > >>>>
> > > >>>> is the container running in shared or non-shared mode ?
> > > >>>>
> > > >>>> For shared mode, I assume the kernel sysfs counters which DPDK
> > > >>>> relies on for imissed/out_of_buffer are not exposed.
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Erez
> > > >>>>
> > > > >>>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko
> > > > >>>> <viacheslavo@nvidia.com <mailto:viacheslavo@nvidia.com>> wrote:
> > > >>>>
> > > >>>> Hi, Daniel
> > > >>>>
> > > > >>>> I would recommend taking the following actions:
> > > > >>>>
> > > > >>>> - update the firmware; 16.33.xxxx looks a little outdated.
> > > > >>>> Please try 16.35.1012 or later.
> > > > >>>> mlx5_glue->devx_obj_create might succeed with the newer FW.
> > > > >>>>
> > > > >>>> - try specifying the dv_flow_en=0 devarg; it forces the mlx5 PMD
> > > > >>>> to use the rdma_core library for queue management,
> > > > >>>> and the kernel driver will be aware of the Rx queues being
> > > > >>>> created and attach them to the kernel counter set
> > > >>>>
> > > >>>> With best regards,
> > > >>>> Slava
> > > >>>>
> > > >>>> *From:*Daniel Östman <daniel.ostman@ericsson.com
> > > >>>> <mailto:daniel.ostman@ericsson.com>>
> > > >>>> *Sent:* Friday, June 2, 2023 3:59 PM
> > > >>>> *To:* users@dpdk.org <mailto:users@dpdk.org>
> > > > >>>> *Cc:* Matan Azrad <matan@nvidia.com <mailto:matan@nvidia.com>>;
> > > > >>>> Slava Ovsiienko <viacheslavo@nvidia.com
> > > > >>>> <mailto:viacheslavo@nvidia.com>>; maxime.coquelin@redhat.com
> > > > >>>> <mailto:maxime.coquelin@redhat.com>; david.marchand@redhat.com
> > > > >>>> <mailto:david.marchand@redhat.com>
> > > >>>> *Subject:* mlx5: imissed / out_of_buffer counter always 0
> > > >>>>
> > > >>>> Hi,
> > > >>>>
> > > > >>>> I’m deploying a containerized DPDK application in an OpenShift
> > > > >>>> Kubernetes environment using DPDK 21.11.3.
> > > > >>>>
> > > > >>>> The application uses a Mellanox ConnectX-5 100G NIC through VFs.
> > > > >>>>
> > > > >>>> The problem I have is that the ETH stats counter imissed (which
> > > > >>>> seems to be mapped to “out_of_buffer” internally in mlx5 PMD
> > > > >>>> driver) is 0 when I don’t expect it to be, i.e. when the
> > > > >>>> application doesn’t read the packets fast enough.
> > > >>>>
> > > > >>>> Using GDB I can see that it tries to access the counter through
> > > > >>>> /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer
> > > > >>>> but the hw_counters directory is missing so it will just
> > > > >>>> return a zero value. I don’t know why it is missing.
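
[Editor's sketch] The symptom described above is that a missing hw_counters directory looks the same as a counter value of zero. A small standalone probe can tell the two cases apart from inside the container. This is a minimal sketch and not the mlx5 PMD's own code: out_of_buffer_path() and read_counter_file() are hypothetical helper names, the path layout is the one quoted in the message, and the device name and port number are whatever applies on the system being checked.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the sysfs path of the out_of_buffer counter
 * for an InfiniBand device name (for example "mlx5_99") and a port number.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int out_of_buffer_path(char *buf, size_t len, const char *ibdev,
                       unsigned int port)
{
    int n = snprintf(buf, len,
                     "/sys/class/infiniband/%s/ports/%u/hw_counters/out_of_buffer",
                     ibdev, port);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: read one unsigned decimal counter from a file.
 * Returns 0 and stores the value on success. Returns -1 if the file is
 * missing or unparseable, so the caller can tell "counter not exposed"
 * apart from "counter is zero".
 */
int read_counter_file(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%" SCNu64, value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Running the two helpers on the host and again inside the pod shows directly whether the sysfs file is visible in the container, which is the difference reported later in the thread.
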
> > > >>>>
> > > >>>> When looking at mlx5_os_read_dev_stat() I can see that
> > > >>>> there is an
> > > >>>> alternative way of reading the counter, through
> > > >>>> mlx5_devx_cmd_queue_counter_query() but under the
> > > >>>> condition that
> > > >>>> priv->q_counters are set.
> > > >>>>
> > > >>>> It doesn’t get set in my case because
> > > >>>> mlx5_glue->devx_obj_create()
> > > >>>> fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
> > > >>>>
> > > >>>> Have I missed something?
> > > >>>>
> > > >>>> NIC info:
> > > >>>>
> > > >>>> Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb
> > > >>>> 2-port
> > > >>>> QSFP28 MCX516A-CCHT
> > > >>>> driver: mlx5_core
> > > >>>> version: 5.0-0
> > > >>>> firmware-version: 16.33.1048 (MT_0000000417)
> > > >>>>
> > > >>>> Please let me know if I need to provide more information.
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Daniel
> > > >>>>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-12-06 12:40 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 12:59 mlx5: imissed / out_of_buffer counter always 0 Daniel Östman
2023-06-02 15:07 ` Slava Ovsiienko
2023-06-05 10:29 ` Erez Ferber
2023-06-05 14:00 ` Daniel Östman
2023-06-21 20:22 ` Maxime Coquelin
2023-06-22 15:47 ` Maxime Coquelin
2023-08-18 12:04 ` Daniel Östman
2023-10-04 13:49 ` Maxime Coquelin
2023-11-08 12:55 ` Daniel Östman
2023-11-09 14:51 ` Slava Ovsiienko
2023-12-06 12:40 ` Daniel Östman