DPDK usage discussions
From: "Daniel Östman" <daniel.ostman@ericsson.com>
To: Slava Ovsiienko <viacheslavo@nvidia.com>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	Erez Ferber <erezferber@gmail.com>
Cc: "users@dpdk.org" <users@dpdk.org>, Matan Azrad <matan@nvidia.com>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	Maayan Kashani <mkashani@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>
Subject: RE: mlx5: imissed / out_of_buffer counter always 0
Date: Wed, 6 Dec 2023 12:40:44 +0000
Message-ID: <PAVPR07MB9310E3EC6E749D7B145E06A68684A@PAVPR07MB9310.eurprd07.prod.outlook.com>
In-Reply-To: <DM6PR12MB375322D7BDE947333FCF8842DFAFA@DM6PR12MB3753.namprd12.prod.outlook.com>

Hi Slava,

From our previous tests it seems like RAWIO is indeed required for DevX approach.

Is there an MLX kernel driver developer you could ask why that is required, and whether there are
any plans to change that behaviour for retrieving this particular counter?
We can get this counter from Intel NICs without escalated permissions, so it's a bit sad
that the technical solution used here does not allow this.

Best regards,
Daniel
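
A minimal sketch, not taken from the thread, of how one might check from inside the container whether the process actually holds SYS_RAWIO before expecting the DevX counter path to work. It assumes the Linux /proc/<pid>/status format (a hexadecimal CapEff bitmask) and that CAP_SYS_RAWIO is bit 17, as defined in <linux/capability.h>. The function names are illustrative only.

```python
CAP_SYS_RAWIO = 17  # bit index, per <linux/capability.h>


def effective_caps(status_text):
    """Return the CapEff bitmask parsed from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_capability(status_text, cap_bit=CAP_SYS_RAWIO):
    """True if the given capability bit is set in the effective set."""
    return bool((effective_caps(status_text) >> cap_bit) & 1)


def current_process_has_rawio(path="/proc/self/status"):
    """Check the calling process; only meaningful on Linux."""
    with open(path) as f:
        return has_capability(f.read())
```

Calling current_process_has_rawio() in the pod before starting the DPDK application gives a quick indication of whether the capability was actually granted.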

> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Thursday, 9 November 2023 15:51
> To: Daniel Östman <daniel.ostman@ericsson.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Erez Ferber <erezferber@gmail.com>
> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> david.marchand@redhat.com; Maayan Kashani <mkashani@nvidia.com>;
> Bing Zhao <bingz@nvidia.com>
> Subject: RE: mlx5: imissed / out_of_buffer counter always 0
> 
> Hi,
> 
> Sorry for the late response.
> 
> The missed packets are counted by an internal NIC counter attached to the Rx
> queues.
> mlx5 can create an RxQ in two different ways: either via the rdma-core API
> (also known as "Verbs"), or using direct FW calls through kernel thunks
> (also known as "DevX").
> Which way the PMD chooses mostly depends on the dv_flow_en devarg; with
> dv_flow_en=0, the legacy "Verbs" way is engaged.
> 
> When an RxQ is created with Verbs, the kernel can automatically attach
> the "out-of-buf" internal counter to the queue being created. With the DevX
> approach the PMD has to take care of this explicitly: allocate the counter and
> attach the queues to it.
> 
> If you are OK with mlx5_devx_cmd_queue_counter_query() (used for DevX),
> try to force this way with dv_flow_en=1 (IIRC, this is the default since 20.02).
> I'm not sure whether the kernel requires RAWIO permission for this call.
> 
> As for the sysfs entry access (used for Verbs), it is unlikely to be changed.
> 
> With best regards,
> Slava
> 
> > -----Original Message-----
> > From: Daniel Östman <daniel.ostman@ericsson.com>
> > Sent: Wednesday, November 8, 2023 2:56 PM
> > To: Maxime Coquelin <maxime.coquelin@redhat.com>; Erez Ferber
> > <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > david.marchand@redhat.com
> > Subject: RE: mlx5: imissed / out_of_buffer counter always 0
> >
> > Hi,
> >
> > Any input from Nvidia on this? Matan perhaps?
> > The question here is if it's expected to require capability SYS_RAWIO
> > just to get the out of buffer counter?
> > If so, any plans on changing that?
> >
> > Best regards,
> > Daniel
> >
> > > -----Original Message-----
> > > From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > Sent: Wednesday, 4 October 2023 15:49
> > > To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > > <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > > Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > > david.marchand@redhat.com
> > > Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > >
> > > Hi Daniel, Erez & Slava,
> > >
> > > My turn to be sorry, I missed this email when coming back from vacation.
> > >
> > > On 8/18/23 14:04, Daniel Östman wrote:
> > > > Hi Maxime,
> > > >
> > > > Sorry for the late reply, I've been on vacation.
> > > > Please see my answer below.
> > > >
> > > > / Daniel
> > > >
> > > >> -----Original Message-----
> > > >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> > > >> Sent: Thursday, 22 June 2023 17:48
> > > >> To: Daniel Östman <daniel.ostman@ericsson.com>; Erez Ferber
> > > >> <erezferber@gmail.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > > >> Cc: users@dpdk.org; Matan Azrad <matan@nvidia.com>;
> > > >> david.marchand@redhat.com
> > > >> Subject: Re: mlx5: imissed / out_of_buffer counter always 0
> > > >>
> > > >> Hi,
> > > >>
> > > >> On 6/21/23 22:22, Maxime Coquelin wrote:
> > > >>> Hi Daniel, all,
> > > >>>
> > > >>> On 6/5/23 16:00, Daniel Östman wrote:
> > > >>>> Hi Slava and Erez and thanks for your answers,
> > > >>>>
> > > > >>>> Regarding the firmware, I’ve also deployed in a different
> > > > >>>> OpenShift cluster where I see the exact same issue but with a
> > > > >>>> different Mellanox NIC:
> > > > >>>>
> > > > >>>> Mellanox Technologies MT2892 Family - ConnectX-6 DX 2-port 100GbE
> > > > >>>> QSFP56 PCIe Adapter
> > > >>>>
> > > >>>> driver: mlx5_core
> > > >>>>
> > > >>>> version: 5.0-0
> > > >>>> firmware-version: 22.36.1010 (DEL0000000027)
> > > >>>>
> > > >>>>   From what I can see the firmware is relatively new on that one?
> > > >>>
> > > >>> With below configuration:
> > > >>> - ConnectX-6 Dx MT2892
> > > >>> - Kernel: 6.4.0-rc6
> > > >>> - FW version: 22.35.1012 (MT_0000000528)
> > > >>>
> > > >>> The out-of-buffer counter is fetched via
> > > >>> mlx5_devx_cmd_queue_counter_query():
> > > >>>
> > > > >>> [pid  2942] ioctl(17, RDMA_VERBS_IOCTL, 0x7ffcb15bcd10) = 0
> > > > >>> [pid  2942] write(1, "\n  ######################## NIC "..., 80) = 80
> > > > >>> [pid  2942] write(1, "  RX-packets: 630997736  RX-miss"..., 70) = 70
> > > > >>> [pid  2942] write(1, "  RX-errors: 0\n", 15) = 15
> > > > >>> [pid  2942] write(1, "  RX-nombuf:  0         \n", 25) = 25
> > > > >>> [pid  2942] write(1, "  TX-packets: 0          TX-erro"..., 60) = 60
> > > > >>> [pid  2942] write(1, "\n", 1)           = 1
> > > > >>> [pid  2942] write(1, "  Throughput (since last show)\n", 31) = 31
> > > > >>> [pid  2942] write(1, "  Rx-pps: 0 "..., 106) = 106
> > > > >>> [pid  2942] write(1, "  ##############################"..., 79) = 79
> > > >>>
> > > >>> It looks like we may miss some mlx5 kernel patches so that we
> > > >>> can use
> > > >>> mlx5_devx_cmd_queue_counter_query() with RHEL?
> > > >>>
> > > >>> Erez, Slava, any idea on the patches that could be missing?
> > > >>
> > > >> Above test was on baremetal as root, I get the same "working"
> > > >> behaviour on RHEL as root.
> > > >>
> > > > >> We managed to reproduce Daniel's issue by running the same test within
> > > > >> a container. With debug logs enabled, we get this warning:
> > > >>
> > > > >> mlx5_common: DevX create q counter set failed errno=121 status=0x2 syndrome=0x8975f1
> > > > >> mlx5_net: Port 0 queue counter object cannot be created by DevX - fall-back to use the kernel driver global queue counter.
> > > >>
> > > > >> Running the container as privileged solves the issue, and so does
> > > > >> adding the SYS_RAWIO capability to the container.
> > > >>
> > > > >> Erez, Slava, is that expected to require SYS_RAWIO just to get a
> > > > >> stat counter?
> > >
> > > Erez & Slava, could it be possible to get the stats counters via
> > > devx without requiring SYS_RAWIO?
> > >
> > > >>
> > > >> Daniel, could you try adding SYS_RAWIO to your pod to confirm you
> > > >> face the same issue?
> > > >
> > > > > Yes, I can confirm what you are seeing when running in a cluster
> > > > > with Openshift 4.12 (RHEL 8.6) and with SYS_RAWIO or running as privileged.
> > > > > But with a privileged container I also need to run with UID 0 for it
> > > > > to work, is that what you are doing as well?
> > >
> > > I don't have an OCP setup at hand right now to test it, but IIRC yes
> > > we ran it with UID 0.
> > >
> > > > > In both these cases the counter can be successfully retrieved
> > > > > through the DevX interface.
> > >
> > > Ok.
> > >
> > > > > However, when running in a cluster with Openshift 4.10 (RHEL 8.4)
> > > > > I cannot get it to work with either of these two approaches.
> > >
> > > > I'm not sure this is kernel related, as I tested on both RHEL-8.4.0
> > > > and the latest RHEL-8.4 and I can get the q counters via ioctl().
> > >
> > > Maxime
> > >
> > > >> Thanks in advance,
> > > >> Maxime
> > > >>> Regards,
> > > >>> Maxime
> > > >>>
> > > >>>>
> > > >>>> I tried setting dv_flow_en=0 (and saw that it was propagated to
> > > >>>> config->dv_flow_en) but it didn’t seem to help.
> > > >>>>
> > > > >>>> Erez, I’m not sure what you mean by shared or non-shared mode
> > > > >>>> in this case, however it seems it could be related to the fact
> > > > >>>> that the container is running in a separate network namespace,
> > > > >>>> because the hw_counters directory is available on the host
> > > > >>>> (cluster node), but not in the pod container.
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Daniel
> > > >>>>
> > > >>>> *From:*Erez Ferber <erezferber@gmail.com>
> > > >>>> *Sent:* Monday, 5 June 2023 12:29
> > > >>>> *To:* Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > >>>> *Cc:* Daniel Östman <daniel.ostman@ericsson.com>; users@dpdk.org;
> > > > >>>> Matan Azrad <matan@nvidia.com>; maxime.coquelin@redhat.com;
> > > > >>>> david.marchand@redhat.com
> > > >>>> *Subject:* Re: mlx5: imissed / out_of_buffer counter always 0
> > > >>>>
> > > >>>> Hi Daniel,
> > > >>>>
> > > >>>> is the container running in shared or non-shared mode ?
> > > >>>>
> > > >>>> For shared mode, I assume the kernel sysfs counters which DPDK
> > > >>>> relies on for imissed/out_of_buffer are not exposed.
> > > >>>>
> > > >>>> Best regards,
> > > >>>>
> > > >>>> Erez
> > > >>>>
> > > > >>>> On Fri, 2 Jun 2023 at 18:07, Slava Ovsiienko
> > > > >>>> <viacheslavo@nvidia.com> wrote:
> > > >>>>
> > > >>>>      Hi, Daniel
> > > >>>>
> > > > >>>>      I would recommend taking the following actions:
> > > > >>>>
> > > > >>>>      - update the firmware; 16.33.xxxx looks a little bit outdated.
> > > > >>>>        Please try 16.35.1012 or later.
> > > > >>>>        mlx5_glue->devx_obj_create might succeed with the newer FW.
> > > > >>>>
> > > > >>>>      - try to specify the dv_flow_en=0 devarg; it forces the mlx5 PMD
> > > > >>>>        to use the rdma_core library for queue management, so the
> > > > >>>>        kernel driver will be aware of the Rx queues being created
> > > > >>>>        and attach them to the kernel counter set
> > > >>>>
> > > >>>>      With best regards,
> > > >>>>      Slava
> > > >>>>
> > > > >>>>      *From:*Daniel Östman <daniel.ostman@ericsson.com>
> > > > >>>>      *Sent:* Friday, June 2, 2023 3:59 PM
> > > > >>>>      *To:* users@dpdk.org
> > > > >>>>      *Cc:* Matan Azrad <matan@nvidia.com>;
> > > > >>>>      Slava Ovsiienko <viacheslavo@nvidia.com>;
> > > > >>>>      maxime.coquelin@redhat.com; david.marchand@redhat.com
> > > > >>>>      *Subject:* mlx5: imissed / out_of_buffer counter always 0
> > > >>>>
> > > >>>>      Hi,
> > > >>>>
> > > > >>>>      I’m deploying a containerized DPDK application in an OpenShift
> > > > >>>>      Kubernetes environment using DPDK 21.11.3.
> > > > >>>>
> > > > >>>>      The application uses a Mellanox ConnectX-5 100G NIC through VFs.
> > > > >>>>
> > > > >>>>      The problem I have is that the ETH stats counter imissed (which
> > > > >>>>      seems to be mapped to “out_of_buffer” internally in mlx5 PMD driver)
> > > > >>>>      is 0 when I don’t expect it to be, i.e. when the application doesn’t
> > > > >>>>      read the packets fast enough.
> > > > >>>>
> > > > >>>>      Using GDB I can see that it tries to access the counter through
> > > > >>>>      /sys/class/infiniband/mlx5_99/ports/1/hw_counters/out_of_buffer but
> > > > >>>>      the hw_counters directory is missing so it will just return a zero
> > > > >>>>      value. I don’t know why it is missing.
> > > > >>>>
> > > > >>>>      When looking at mlx5_os_read_dev_stat() I can see that there is an
> > > > >>>>      alternative way of reading the counter, through
> > > > >>>>      mlx5_devx_cmd_queue_counter_query() but under the condition that
> > > > >>>>      priv->q_counters are set.
> > > > >>>>
> > > > >>>>      It doesn’t get set in my case because mlx5_glue->devx_obj_create()
> > > > >>>>      fails (errno 22) in mlx5_devx_cmd_queue_counter_alloc().
> > > >>>>      Have I missed something?
> > > >>>>
> > > >>>>      NIC info:
> > > >>>>
> > > > >>>>      Mellanox Technologies MT27800 Family [ConnectX-5] - 100Gb 2-port
> > > > >>>>      QSFP28 MCX516A-CCHT
> > > >>>>      driver: mlx5_core
> > > >>>>      version: 5.0-0
> > > >>>>      firmware-version: 16.33.1048 (MT_0000000417)
> > > >>>>
> > > >>>>      Please let me know if I need to provide more information.
> > > >>>>
> > > >>>>      Best regards,
> > > >>>>
> > > >>>>      Daniel
> > > >>>>

Thread overview: 11+ messages
2023-06-02 12:59 Daniel Östman
2023-06-02 15:07 ` Slava Ovsiienko
2023-06-05 10:29   ` Erez Ferber
2023-06-05 14:00     ` Daniel Östman
2023-06-21 20:22       ` Maxime Coquelin
2023-06-22 15:47         ` Maxime Coquelin
2023-08-18 12:04           ` Daniel Östman
2023-10-04 13:49             ` Maxime Coquelin
2023-11-08 12:55               ` Daniel Östman
2023-11-09 14:51                 ` Slava Ovsiienko
2023-12-06 12:40                   ` Daniel Östman [this message]
