DPDK usage discussions
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Daniel Östman" <daniel.ostman@ericsson.com>,
	"Erez Ferber" <erezferber@gmail.com>,
	"Slava Ovsiienko" <viacheslavo@nvidia.com>
Cc: "users@dpdk.org" <users@dpdk.org>, Matan Azrad <matan@nvidia.com>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>
Subject: Re: mlx5: imissed / out_of_buffer counter always 0
Date: Thu, 22 Jun 2023 17:47:39 +0200
Message-ID: <e8138048-5065-96ce-c6ea-1d72121e2b8f@redhat.com>
In-Reply-To: <5d9ae8ec-450a-c411-c044-577f00b127f5@redhat.com>

Hi,

On 6/21/23 22:22, Maxime Coquelin wrote:
> Hi Daniel, all,
> 
> On 6/5/23 16:00, Daniel Östman wrote:
>> Hi Slava and Erez and thanks for your answers,
>>
>> Regarding the firmware, I’ve also deployed in a different OpenShift
>> cluster where I see the exact same issue but with a different Mellanox
>> NIC:
>>
>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE 
>> QSFP56 PCIe Adapter
>>
>> driver: mlx5_core
>>
>> version: 5.0-0
>> firmware-version: 22.36.1010 (DEL0000000027)
>>
>> From what I can see, the firmware is relatively new on that one?
> 
> With the configuration below:
> - ConnectX-6 Dx MT2892
> - Kernel: 6.4.0-rc6
> - FW version: 22.35.1012 (MT_0000000528)
> 
> The out-of-buffer counter is fetched via 
> mlx5_devx_cmd_queue_counter_query():
> 
> [pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> [pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
> [pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
> [pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
> [pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
> [pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
> [pid  2942] write(1, "\n", 1)           = 1
> [pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
> [pid  2942] write(1, "  Rx-pps:            0          "..., 106) = 106
> [pid  2942] write(1, "  ##############################"..., 79) = 79
> 
> It looks like we may be missing some mlx5 kernel patches needed to use
> mlx5_devx_cmd_queue_counter_query() with RHEL?
> 
> Erez, Slava, any idea on the patches that could be missing?

The above test was run on bare metal as root; I get the same "working"
behaviour on RHEL as root.

We managed to reproduce Daniel's issue by running the same test within a
container. With debug logs enabled, we get this warning:

mlx5_common: DevX create q counter set failed errno=121 status=0x2 syndrome=0x8975f1
mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back to use the kernel driver global queue counter.
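
For readers decoding the error numbers quoted in this thread, here is a minimal sketch that maps them to their symbolic names. It assumes Linux errno numbering, where 121 (from the log above) is EREMOTEIO and 22 (from Daniel's original report) is EINVAL. The helper names errno_name() and describe_errno() are hypothetical and are not part of DPDK or rdma-core.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Return the symbolic name of the two errno values quoted in this thread,
 * or "other" for anything else. The numeric values are Linux-specific. */
const char *errno_name(int err)
{
    switch (err) {
    case EINVAL:    return "EINVAL";    /* 22 on Linux */
    case EREMOTEIO: return "EREMOTEIO"; /* 121 on Linux */
    default:        return "other";
    }
}

/* Print a one-line description of an errno value taken from a log line. */
void describe_errno(int err)
{
    printf("errno=%d (%s): %s\n", err, errno_name(err), strerror(err));
}
```

Calling describe_errno(121) prints the name together with the C library's description of the error. The status and syndrome fields in the log are device-specific and are not decoded here.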

Running the container as privileged solves the issue, and so does
adding the SYS_RAWIO capability to the container.
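
One way to check from inside the container whether the capability is actually present is to look at the effective capability mask of the process. The following is a rough sketch, assuming a Linux /proc filesystem and that CAP_SYS_RAWIO is bit 17 as defined in <linux/capability.h>. The function names are hypothetical and this is not something the mlx5 PMD does itself.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_RAWIO_BIT 17 /* value of CAP_SYS_RAWIO in <linux/capability.h> */

/* Test one capability bit (cap < 64) in a mask such as the CapEff value. */
int mask_has_cap(uint64_t mask, unsigned int cap)
{
    return (int)((mask >> cap) & 1u);
}

/* Parse the hexadecimal mask from a "CapEff:\t<hex>" status line.
 * Returns 0 on success, -1 if the line is not a CapEff line. */
int parse_capeff_line(const char *line, uint64_t *mask)
{
    if (strncmp(line, "CapEff:", 7) != 0)
        return -1;
    *mask = strtoull(line + 7, NULL, 16);
    return 0;
}

/* Returns 1 if the calling process holds CAP_SYS_RAWIO in its effective set,
 * 0 if it does not, -1 if /proc/self/status could not be read or parsed. */
int have_sys_rawio(void)
{
    char line[256];
    uint64_t mask;
    int found = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (parse_capeff_line(line, &mask) == 0) {
            found = mask_has_cap(mask, CAP_SYS_RAWIO_BIT);
            break;
        }
    }
    fclose(f);
    return found;
}
```

An application could call have_sys_rawio() at startup and log a warning when it returns 0, which would make a silently zero imissed counter easier to explain. In a Kubernetes pod the capability is normally granted through the container's securityContext capabilities list.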

Erez, Slava, is it expected that SYS_RAWIO is required just to read a
stat counter?

Daniel, could you try adding SYS_RAWIO to your pod to confirm you face
the same issue?

Thanks in advance,
Maxime
> Regards,
> Maxime
> 
>>
>> I tried setting dv_flow_en=0 (and saw that it was propagated to 
>> config->dv_flow_en) but it didn’t seem to help.
>>
>> Erez, I’m not sure what you mean by shared or non-shared mode in this 
>> case, however it seems it could be related to the fact that the 
>> container is running in a separate network namespace. Because the 
>> hw_counter directory is available on the host (cluster node), but not 
>> in the pod container.
>>
>> Best regards,
>>
>> Daniel
>>
>> *From:*Erez Ferber <erezferber@gmail.com>
>> *Sent:* Monday, 5 June 2023 12:29
>> *To:* Slava Ovsiienko <viacheslavo@nvidia.com>
>> *Cc:* Daniel Östman <daniel.ostman@ericsson.com>; users@dpdk.org; 
>> Matan Azrad <matan@nvidia.com>; maxime.coquelin@redhat.com; 
>> david.marchand@redhat.com
>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
>>
>> Hi Daniel,
>>
>> Is the container running in shared or non-shared mode?
>>
>> For shared mode, I assume the kernel sysfs counters which DPDK relies 
>> on for imissed/out_of_buffer are not exposed.
>>
>> Best regards,
>>
>> Erez
>>
>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko <viacheslavo@nvidia.com> wrote:
>>
>>     Hi, Daniel
>>
>>     I would recommend taking the following actions:
>>
>>     - update the firmware; 16.33.xxxx looks a little outdated.
>>       Please try 16.35.1012 or later.
>>       mlx5_glue->devx_obj_create might succeed with the newer FW.
>>
>>     - try specifying the dv_flow_en=0 devarg; it forces the mlx5 PMD to
>>       use the rdma_core library for queue management, so the kernel
>>       driver will be aware of the Rx queues being created and attach
>>       them to the kernel counter set
>>
>>     With best regards,
>>     Slava
>>
>>     *From:* Daniel Östman <daniel.ostman@ericsson.com>
>>     *Sent:* Friday, June 2, 2023 3:59 PM
>>     *To:* users@dpdk.org
>>     *Cc:* Matan Azrad <matan@nvidia.com>;
>>     Slava Ovsiienko <viacheslavo@nvidia.com>; maxime.coquelin@redhat.com;
>>     david.marchand@redhat.com
>>     *Subject:* mlx5: imissed / out_of_buffer counter always 0
>>
>>     Hi,
>>
>>     I’m deploying a containerized DPDK application in an OpenShift
>>     Kubernetes environment using DPDK 21.11.3.
>>
>>     The application uses a Mellanox ConnectX-5 100G NIC through VFs.
>>
>>     The problem I have is that the ETH stats counter imissed (which
>>     seems to be mapped to “out_of_buffer” internally in mlx5 PMD driver)
>>     is 0 when I don’t expect it to be, i.e. when the application doesn’t
>>     read the packets fast enough.
>>
>>     Using GDB I can see that it tries to access the counter through
>>     /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer but
>>     the hw_counters directory is missing so it will just return a zero
>>     value. I don’t know why it is missing.
>>
>>     When looking at mlx5_os_read_dev_stat() I can see that there is an
>>     alternative way of reading the counter, through
>>     mlx5_devx_cmd_queue_counter_query() but under the condition that
>>     priv->q_counters are set.
>>
>>     It doesn’t get set in my case because mlx5_glue->devx_obj_create()
>>     fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
>>
>>     Have I missed something?
>>
>>     NIC info:
>>
>>     Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
>>     QSFP28 MCX516A-CCHT
>>     driver: mlx5_core
>>     version: 5.0-0
>>     firmware-version: 16.33.1048 (MT_0000000417)
>>
>>     Please let me know if I need to provide more information.
>>
>>     Best regards,
>>
>>     Daniel
>>
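
The sysfs read described in the original report returns zero when the hw_counters directory is missing, which makes "counter unavailable" look the same as "no packets missed". As an illustration only (this is not the mlx5 PMD code, and read_sysfs_counter() is a hypothetical name), a reader that keeps the two cases apart could look like this:

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read a single unsigned decimal counter from a sysfs-style file.
 * Returns 0 and stores the value on success. Returns -errno if the file
 * cannot be opened (for example -ENOENT when hw_counters is absent) and
 * -EIO if the content cannot be parsed as a number. */
int read_sysfs_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    int ret = 0;

    if (f == NULL)
        return -errno;
    if (fscanf(f, "%" SCNu64, value) != 1)
        ret = -EIO;
    fclose(f);
    return ret;
}
```

Passing the out_of_buffer path quoted above would return -ENOENT inside a container where hw_counters is not exposed, so the caller can report the counter as unavailable instead of reporting 0.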



Thread overview: 11+ messages
2023-06-02 12:59 Daniel Östman
2023-06-02 15:07 ` Slava Ovsiienko
2023-06-05 10:29   ` Erez Ferber
2023-06-05 14:00     ` Daniel Östman
2023-06-21 20:22       ` Maxime Coquelin
2023-06-22 15:47         ` Maxime Coquelin [this message]
2023-08-18 12:04           ` Daniel Östman
2023-10-04 13:49             ` Maxime Coquelin
2023-11-08 12:55               ` Daniel Östman
2023-11-09 14:51                 ` Slava Ovsiienko
2023-12-06 12:40                   ` Daniel Östman
