DPDK usage discussions
From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: "Daniel Östman" <daniel.ostman@ericsson.com>,
	"Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Erez Ferber" <erezferber@gmail.com>
Cc: "users@dpdk.org" <users@dpdk.org>, Matan Azrad <matan@nvidia.com>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	Maayan Kashani <mkashani@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>
Subject: RE: mlx5: imissed / out_of_buffer counter always 0
Date: Thu, 9 Nov 2023 14:51:18 +0000
Message-ID: <DM6PR12MB375322D7BDE947333FCF8842DFAFA@DM6PR12MB3753.namprd12.prod.outlook.com>
In-Reply-To: <PAVPR07MB9310674B4A4E37F9AFCCE40886A8A@PAVPR07MB9310.eurprd07.prod.outlook.com>

Hi,

Sorry for the late response.

The missed packets are counted by an internal NIC counter attached to the Rx queues.
mlx5 can create an RxQ in two different ways: either via the rdma-core API (also known as "Verbs"),
or using direct FW calls (via kernel thunks, of course), also known as "DevX".
Which way the PMD chooses mostly depends on the dv_flow_en devarg; with dv_flow_en=0 the
legacy "Verbs" way is engaged.
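
A minimal sketch of how the devarg could be attached to a port on the EAL command line.
The helper name build_mlx5_devargs() and the PCI address in the usage note are
placeholders for illustration, not part of DPDK; only the two dv_flow_en values discussed
in this thread are accepted.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format an EAL allow-list entry of the form "<pci>,dv_flow_en=<0|1>".
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 on invalid input or if the buffer is too small.
 */
static int
build_mlx5_devargs(char *buf, size_t len, const char *pci_addr, int dv_flow_en)
{
    int n;

    if (buf == NULL || pci_addr == NULL)
        return -1;
    if (dv_flow_en != 0 && dv_flow_en != 1)
        return -1;
    n = snprintf(buf, len, "%s,dv_flow_en=%d", pci_addr, dv_flow_en);
    /* snprintf() reports the length it wanted; detect truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be passed after the EAL "-a" option, for example
"-a 0000:03:00.0,dv_flow_en=1" (the address is an assumed example).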

When an RxQ is created with Verbs, the kernel can automatically attach the internal "out-of-buf"
counter to the queue being created. With the DevX approach the PMD has to take care of this
explicitly: allocate the counter and attach the queues to it.

If you are OK with mlx5_devx_cmd_queue_counter_query() (used for DevX), try to force
this way with dv_flow_en=1 (IIRC, this has been the default since 20.02). I am not sure whether
the kernel requires the SYS_RAWIO capability for this call.
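
Since the thread reports that adding SYS_RAWIO to the container makes the DevX counter
work, it may help to check from inside the pod whether the process actually holds that
capability. The sketch below parses the "CapEff:" line of /proc/self/status and tests
bit 17, which is CAP_SYS_RAWIO in <linux/capability.h>. The function names are
hypothetical, and this says nothing about what the kernel checks for the DevX call itself.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_RAWIO_BIT 17 /* value of CAP_SYS_RAWIO in <linux/capability.h> */

/*
 * Extract the effective capability mask from the text of /proc/<pid>/status.
 * Returns 0 and fills *mask on success, -1 if no usable "CapEff:" line exists.
 */
static int
parse_cap_eff(const char *status_text, uint64_t *mask)
{
    const char *p = strstr(status_text, "CapEff:");
    char *end;

    if (p == NULL)
        return -1;
    p += strlen("CapEff:");
    *mask = strtoull(p, &end, 16); /* the mask is printed as hex digits */
    return (end == p) ? -1 : 0;
}

/* Return 1 if the mask contains CAP_SYS_RAWIO, 0 otherwise. */
static int
has_sys_rawio(uint64_t cap_eff)
{
    return (int)((cap_eff >> CAP_SYS_RAWIO_BIT) & 1);
}

/* Check the calling process: returns 1, 0, or -1 on error. */
static int
self_has_sys_rawio(void)
{
    char buf[8192];
    uint64_t mask;
    size_t n;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    if (parse_cap_eff(buf, &mask) != 0)
        return -1;
    return has_sys_rawio(mask);
}
```

Calling self_has_sys_rawio() early in the application and logging the result would show
whether the container was really started with the capability.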

As for the sysfs entry access (used for Verbs), it is unlikely to be changed.
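
To tell "the counter really is 0" apart from "the sysfs entry is not visible in the
container", the file can be read directly. This is only a sketch with a hypothetical
helper name; it is not the PMD's code. The path to pass in would be the one quoted below
in this thread, /sys/class/infiniband/<ibdev>/ports/1/hw_counters/out_of_buffer.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one unsigned decimal counter from a sysfs-style text file.
 * Returns 0 and fills *value on success, or a negative errno value,
 * for example -ENOENT when the hw_counters directory is not exposed.
 */
static int
read_sysfs_counter(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    int rc;

    if (f == NULL)
        return -errno;
    rc = (fscanf(f, "%" SCNu64, value) == 1) ? 0 : -EINVAL;
    fclose(f);
    return rc;
}
```

A return value of -ENOENT here would match the missing hw_counters directory reported
in the original message, as opposed to a genuine zero reading.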

With best regards,
Slava

> -----Original Message-----
> From: Daniel Östman <daniel.ostman@ericsson.com>
> Sent: Wednesday, November 8, 2023 2:56 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Erez Ferber
> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> david.marchand@redhat.com
> Subject: RE: mlx5: imissed / out_of_buffer counter always 0
> 
> Hi,
> 
> Any input from Nvidia on this? Matan perhaps?
> The question here is if it's expected to require capability SYS_RAWIO just to
> get the out of buffer counter?
> If so, any plans on changing that?
> 
> Best regards,
> Daniel
> 
> > -----Original Message-----
> > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Sent: Wednesday, 4 October 2023 15:49
> > To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > david.marchand@redhat.com
> > Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> >
> > Hi Daniel, Erez & Slava,
> >
> > My turn to apologize: I missed this email when coming back from vacation.
> >
> > On 8/18/23 14:04, Daniel Östman wrote:
> > > Hi Maxime,
> > >
> > > Sorry for the late reply, I've been on vacation.
> > > Please see my answer below.
> > >
> > > / Daniel
> > >
> > >> -----Original Message-----
> > >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > >> Sent: Thursday, 22 June 2023 17:48
> > >> To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > >> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > >> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > >> david.marchand@redhat.com
> > >> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > >>
> > >> Hi,
> > >>
> > >> On 6/21/23 22:22, Maxime Coquelin wrote:
> > >>> Hi Daniel, all,
> > >>>
> > >>> On 6/5/23 16:00, Daniel Östman wrote:
> > >>>> Hi Slava and Erez and thanks for your answers,
> > >>>>
> > >>>> Regarding the firmware, I’ve also deployed in a different
> > >>>> OpenShift cluster where I see the exact same issue but with a
> > >>>> different Mellanox NIC:
> > >>>>
> > >>>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE
> > >>>> QSFP56 PCIe Adapter
> > >>>>
> > >>>> driver: mlx5_core
> > >>>>
> > >>>> version: 5.0-0
> > >>>> firmware-version: 22.36.1010 (DEL0000000027)
> > >>>>
> > >>>>   From what I can see the firmware is relatively new on that one?
> > >>>
> > >>> With below configuration:
> > >>> - ConnectX-6 Dx MT2892
> > >>> - Kernel: 6.4.0-rc6
> > >>> - FW version: 22.35.1012 (MT_0000000528)
> > >>>
> > >>> The out-of-buffer counter is fetched via
> > >>> mlx5_devx_cmd_queue_counter_query():
> > >>>
> > >>> [pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> > >>> [pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
> > >>> [pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
> > >>> [pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
> > >>> [pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
> > >>> [pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
> > >>> [pid  2942] write(1, "\n", 1)           = 1
> > >>> [pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
> > >>> [pid  2942] write(1, "  Rx-pps: 0 "..., 106) = 106
> > >>> [pid  2942] write(1, "  ##############################"..., 79) = 79
> > >>>
> > >>> It looks like we may miss some mlx5 kernel patches so that we can
> > >>> use
> > >>> mlx5_devx_cmd_queue_counter_query() with RHEL?
> > >>>
> > >>> Erez, Slava, any idea on the patches that could be missing?
> > >>
> > >> Above test was on baremetal as root, I get the same "working"
> > >> behaviour on RHEL as root.
> > >>
> > >> We managed to reproduce Daniel's issue by running the same test
> > >> within a container. With debug logs enabled we get this warning:
> > >>
> > >> mlx5_common: DevX create q counter set failed errno=121 status=0x2 syndrome=0x8975f1
> > >> mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back to use the kernel driver global queue counter.
> > >>
> > >> Running the container as privileged solves the issue, and so does
> > >> adding the SYS_RAWIO capability to the container.
> > >>
> > >> Erez, Slava, is it expected that SYS_RAWIO is required just to get a
> > >> stat counter?
> >
> > Erez & Slava, could it be possible to get the stats counters via devx
> > without requiring SYS_RAWIO?
> >
> > >>
> > >> Daniel, could you try adding SYS_RAWIO to your pod to confirm you
> > >> face the same issue?
> > >
> > > Yes, I can confirm what you are seeing when running in a cluster with
> > > OpenShift 4.12 (RHEL 8.6) and with SYS_RAWIO or running as privileged.
> > > But with a privileged container I also need to run with UID 0 for it
> > > to work; is that what you are doing as well?
> >
> > I don't have an OCP setup at hand right now to test it, but IIRC yes
> > we ran it with UID 0.
> >
> > > In both these cases the counter can be successfully retrieved
> > > through the DevX interface.
> >
> > Ok.
> >
> > > However, when running in a cluster with OpenShift 4.10 (RHEL 8.4) I
> > > cannot get it to work with either of these two approaches.
> >
> > I'm not sure this is kernel related, as I tested on both RHEL-8.4.0
> > and the latest RHEL-8.4, and I can get the q counters via ioctl().
> >
> > Maxime
> >
> > >> Thanks in advance,
> > >> Maxime
> > >>> Regards,
> > >>> Maxime
> > >>>
> > >>>>
> > >>>> I tried setting dv_flow_en=0 (and saw that it was propagated to
> > >>>> config->dv_flow_en) but it didn’t seem to help.
> > >>>>
> > >>>> Erez, I’m not sure what you mean by shared or non-shared mode in
> > >>>> this case; however, it seems it could be related to the fact that
> > >>>> the container is running in a separate network namespace, because
> > >>>> the hw_counters directory is available on the host (cluster node)
> > >>>> but not in the pod container.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Daniel
> > >>>>
> > >>>> *From:*Erez Ferber <erezferber@gmail.com>
> > >>>> *Sent:* Monday, 5 June 2023 12:29
> > >>>> *To:* Slava Ovsiienko <viacheslavo@nvidia.com>
> > >>>> *Cc:* Daniel Östman <daniel.ostman@ericsson.com>; users@dpdk.org;
> > >>>> Matan Azrad <matan@nvidia.com>; maxime.coquelin@redhat.com;
> > >>>> david.marchand@redhat.com
> > >>>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
> > >>>>
> > >>>> Hi Daniel,
> > >>>>
> > >>>> Is the container running in shared or non-shared mode?
> > >>>>
> > >>>> For shared mode, I assume the kernel sysfs counters which DPDK
> > >>>> relies on for imissed/out_of_buffer are not exposed.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Erez
> > >>>>
> > >>>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko
> > >>>> <viacheslavo@nvidia.com> wrote:
> > >>>>
> > >>>>      Hi, Daniel
> > >>>>
> > >>>>      I would recommend taking the following actions:
> > >>>>
> > >>>>      - update the firmware; 16.33.xxxx looks a little outdated.
> > >>>>        Please try 16.35.1012 or later.
> > >>>>        mlx5_glue->devx_obj_create might succeed with the newer FW.
> > >>>>
> > >>>>      - try specifying the dv_flow_en=0 devarg; it forces the mlx5 PMD
> > >>>>        to use the rdma-core library for queue management, so the
> > >>>>        kernel driver will be aware of the Rx queues being created
> > >>>>        and attach them to the kernel counter set
> > >>>>
> > >>>>      With best regards,
> > >>>>      Slava
> > >>>>
> > >>>>      *From:*Daniel Östman <daniel.ostman@ericsson.com>
> > >>>>      *Sent:* Friday, June 2, 2023 3:59 PM
> > >>>>      *To:* users@dpdk.org
> > >>>>      *Cc:* Matan Azrad <matan@nvidia.com>;
> > >>>>      Slava Ovsiienko <viacheslavo@nvidia.com>;
> > >>>>      maxime.coquelin@redhat.com; david.marchand@redhat.com
> > >>>>      *Subject:* mlx5: imissed / out_of_buffer counter always 0
> > >>>>
> > >>>>      Hi,
> > >>>>
> > >>>>      I’m deploying a containerized DPDK application in an OpenShift
> > >>>>      Kubernetes environment using DPDK 21.11.3.
> > >>>>
> > >>>>      The application uses a Mellanox ConnectX-5 100G NIC through VFs.
> > >>>>
> > >>>>      The problem I have is that the ETH stats counter imissed (which
> > >>>>      seems to be mapped to “out_of_buffer” internally in the mlx5 PMD)
> > >>>>      is 0 when I don’t expect it to be, i.e. when the application
> > >>>>      doesn’t read the packets fast enough.
> > >>>>
> > >>>>      Using GDB I can see that it tries to access the counter through
> > >>>>      /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer
> > >>>>      but the hw_counters directory is missing, so it will just return
> > >>>>      a zero value. I don’t know why it is missing.
> > >>>>
> > >>>>      When looking at mlx5_os_read_dev_stat() I can see that there is
> > >>>>      an alternative way of reading the counter, through
> > >>>>      mlx5_devx_cmd_queue_counter_query(), but only under the condition
> > >>>>      that priv->q_counters is set.
> > >>>>
> > >>>>      It doesn’t get set in my case because mlx5_glue->devx_obj_create()
> > >>>>      fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
> > >>>>
> > >>>>      Have I missed something?
> > >>>>
> > >>>>      NIC info:
> > >>>>
> > >>>>      Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
> > >>>>      QSFP28 MCX516A-CCHT
> > >>>>      driver: mlx5_core
> > >>>>      version: 5.0-0
> > >>>>      firmware-version: 16.33.1048 (MT_0000000417)
> > >>>>
> > >>>>      Please let me know if I need to provide more information.
> > >>>>
> > >>>>      Best regards,
> > >>>>
> > >>>>      Daniel
> > >>>>

Thread overview: 11+ messages
2023-06-02 12:59 Daniel Östman
2023-06-02 15:07 ` Slava Ovsiienko
2023-06-05 10:29   ` Erez Ferber
2023-06-05 14:00     ` Daniel Östman
2023-06-21 20:22       ` Maxime Coquelin
2023-06-22 15:47         ` Maxime Coquelin
2023-08-18 12:04           ` Daniel Östman
2023-10-04 13:49             ` Maxime Coquelin
2023-11-08 12:55               ` Daniel Östman
2023-11-09 14:51                 ` Slava Ovsiienko [this message]
2023-12-06 12:40                   ` Daniel Östman
