DPDK patches and discussions
 help / color / mirror / Atom feed
From: David Marchand <david.marchand@redhat.com>
To: Gregory Etelson <getelson@nvidia.com>, rasland@nvidia.com
Cc: dev@dpdk.org, Matan Azrad <matan@nvidia.com>,
	 Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Thomas Monjalon <thomas@monjalon.net>
Subject: Re: [PATCH 1/2] common/mlx5: update log format after devx_general_cmd error
Date: Tue, 8 Nov 2022 13:46:59 +0100	[thread overview]
Message-ID: <CAJFAV8w6yDW1wXRsiwbV6c9cnSpT2npYVdPyhUg=oRfsO5o8nQ@mail.gmail.com> (raw)
In-Reply-To: <20220608115826.11783-1-getelson@nvidia.com>

On Wed, Jun 8, 2022 at 1:58 PM Gregory Etelson <getelson@nvidia.com> wrote:
>
> Application can fetch syndrome value after FW operation failure
> starting from Mellanox OFED-5.6.
> The patch updates log data issued after devx_general_cmd error.
>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Acked-by: Matan Azrad <matan@nvidia.com>
> ---
>  drivers/common/mlx5/mlx5_devx_cmds.c | 103 ++++++++++++---------------
>  1 file changed, 44 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
> index c6bdbc12bb..bc06aeccc7 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.c
> @@ -13,39 +13,49 @@
>  #include "mlx5_common_log.h"
>  #include "mlx5_malloc.h"
>
> +/* FW writes status value to the OUT buffer at offset 00H */
> +#define MLX5_FW_STATUS(o) MLX5_GET(general_obj_out_cmd_hdr, (o), status)
> +/* FW writes syndrome value to the OUT buffer at offset 04H */
> +#define MLX5_FW_SYNDROME(o) MLX5_GET(general_obj_out_cmd_hdr, (o), syndrome)
> +
> +#define MLX5_DEVX_ERR_RC(x) ((x) > 0 ? -(x) : ((x) < 0 ? (x) : -1))
> +
> +static void
> +mlx5_devx_err_log(void *out, const char *reason,
> +                 const char *param, uint32_t value)
> +{
> +       rte_errno = errno;
> +       if (!param)
> +               DRV_LOG(ERR, "DevX %s failed errno=%d status=%#x syndrome=%#x",
> +                       reason, errno, MLX5_FW_STATUS(out),
> +                       MLX5_FW_SYNDROME(out));
> +       else
> +               DRV_LOG(ERR, "DevX %s %s=%#X failed errno=%d status=%#x syndrome=%#x",
> +                       reason, param, value, errno, MLX5_FW_STATUS(out),
> +                       MLX5_FW_SYNDROME(out));
> +}
> +
>  static void *
>  mlx5_devx_get_hca_cap(void *ctx, uint32_t *in, uint32_t *out,
>                       int *err, uint32_t flags)
>  {
>         const size_t size_in = MLX5_ST_SZ_DW(query_hca_cap_in) * sizeof(int);
>         const size_t size_out = MLX5_ST_SZ_DW(query_hca_cap_out) * sizeof(int);
> -       int status, syndrome, rc;
> +       int rc;
>
> -       if (err)
> -               *err = 0;
>         memset(in, 0, size_in);
>         memset(out, 0, size_out);
>         MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
>         MLX5_SET(query_hca_cap_in, in, op_mod, flags);
>         rc = mlx5_glue->devx_general_cmd(ctx, in, size_in, out, size_out);
> -       if (rc) {
> -               DRV_LOG(ERR,
> -                       "Failed to query devx HCA capabilities func %#02x",
> -                       flags >> 1);
> +       if (rc || MLX5_FW_STATUS(out)) {
> +               mlx5_devx_err_log(out, "HCA capabilities", "func", flags >> 1);
>                 if (err)
> -                       *err = rc > 0 ? -rc : rc;
> -               return NULL;
> -       }
> -       status = MLX5_GET(query_hca_cap_out, out, status);
> -       syndrome = MLX5_GET(query_hca_cap_out, out, syndrome);
> -       if (status) {
> -               DRV_LOG(ERR,
> -                       "Failed to query devx HCA capabilities func %#02x status %x, syndrome = %x",
> -                       flags >> 1, status, syndrome);
> -               if (err)
> -                       *err = -1;
> +                       *err = MLX5_DEVX_ERR_RC(rc);
>                 return NULL;
>         }
> +       if (err)
> +               *err = 0;
>         return MLX5_ADDR_OF(query_hca_cap_out, out, capability);
>  }
>
> @@ -74,7 +84,7 @@ mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id, uint32_t arg,
>         uint32_t in[MLX5_ST_SZ_DW(access_register_in)]   = {0};
>         uint32_t out[MLX5_ST_SZ_DW(access_register_out) +
>                      MLX5_ACCESS_REGISTER_DATA_DWORD_MAX] = {0};
> -       int status, rc;
> +       int rc;
>
>         MLX5_ASSERT(data && dw_cnt);
>         MLX5_ASSERT(dw_cnt <= MLX5_ACCESS_REGISTER_DATA_DWORD_MAX);
> @@ -91,23 +101,13 @@ mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id, uint32_t arg,
>         rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out,
>                                          MLX5_ST_SZ_BYTES(access_register_out) +
>                                          sizeof(uint32_t) * dw_cnt);
> -       if (rc)
> -               goto error;
> -       status = MLX5_GET(access_register_out, out, status);
> -       if (status) {
> -               int syndrome = MLX5_GET(access_register_out, out, syndrome);
> -
> -               DRV_LOG(DEBUG, "Failed to read access NIC register 0x%X, "
> -                              "status %x, syndrome = %x",
> -                              reg_id, status, syndrome);
> -               return -1;
> +       if (rc || MLX5_FW_STATUS(out)) {
> +               mlx5_devx_err_log(out, "read access", "NIC register", reg_id);
> +               return MLX5_DEVX_ERR_RC(rc);

I went back in the mailing list archive to find this change.
This patch makes the pmd spat some scary log messages...

I get this on a system with a CX5 nic (OVS dpdk-latest + 22.11-rc2) :
2022-11-08T12:26:09.218Z|00027|dpdk|ERR|mlx5_common: DevX read access
NIC register=0X9055 failed errno=22 status=0 syndrome=0

Should I be scared?
The system runs fine and nothing else changed in my setup (afaics), so
I'd expect it continues working..

This specific change of level on the "read access" check breaks OVS
system dpdk unit tests, as those tests fail on ERR level logs.
If this is harmless, please send a patch to revert to debug level (and
for other log messages that were updated in this patch if it was
unneeded).


Thanks.

-- 
David Marchand


  parent reply	other threads:[~2022-11-08 12:47 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-08 11:58 Gregory Etelson
2022-06-08 11:58 ` [PATCH 2/2] common/mlx5: update log format after devx_obj_create error Gregory Etelson
2022-06-14 15:02 ` [PATCH 1/2] common/mlx5: update log format after devx_general_cmd error Raslan Darawsheh
2022-11-08 12:46 ` David Marchand [this message]
2022-11-08 16:05   ` Gregory Etelson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJFAV8w6yDW1wXRsiwbV6c9cnSpT2npYVdPyhUg=oRfsO5o8nQ@mail.gmail.com' \
    --to=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=getelson@nvidia.com \
    --cc=matan@nvidia.com \
    --cc=rasland@nvidia.com \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).