From: Edwin Brossette <edwin.brossette@6wind.com>
Date: Mon, 17 Mar 2025 16:02:20 +0100
Subject: Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
To: Slava Ovsiienko <viacheslavo@nvidia.com>
Cc: Asaf Penso <asafp@nvidia.com>, igootorov@gmail.com,
 Laurent Hardy <laurent.hardy@6wind.com>,
 Olivier Matz <olivier.matz@6wind.com>,
 Didier Pallard <didier.pallard@6wind.com>,
 Jean-Mickael Guerin <jmg@6wind.com>, dev@dpdk.org
List-Id: DPDK patches and discussions

Hello,

Thank you for your answer.
The short patch I attached to my first mail was just a rough example to
report what I tested. I believe you know the driver's code better than I
do, so I wouldn't be opposed to seeing you fix this issue.
Thank you in advance.

Regards,
Edwin Brossette.

On Wed, Mar 5, 2025 at 10:17 AM Slava Ovsiienko <viacheslavo@nvidia.com> wrote:

> Hi, Edwin
>
> Thank you for the patch.
>
> You are quite right, “sh->cdev->config.hca_attr.log_max_wq_sz” is not set
> if DevX is disengaged.
>
> I found some other places where the uninitialized “log_max_wq_sz” might be
> used.
> So I’d rather configure “log_max_wq_sz” for the IBV case as
> well, instead of just fixing mlx5_dev_infos_get().
>
> There is the property “priv->sh->dev_cap.max_qp_wr”; it reflects the
> max number of descriptors if rdma_core is used.
>
> Would you like to update your patch with this? Or would you prefer me to
> do it?
>
> With best regards,
> Slava
>
> *From:* Edwin Brossette <edwin.brossette@6wind.com>
> *Sent:* Wednesday, February 12, 2025 4:34 PM
> *To:* Slava Ovsiienko <viacheslavo@nvidia.com>
> *Cc:* Asaf Penso <asafp@nvidia.com>; igootorov@gmail.com; Laurent Hardy
> <laurent.hardy@6wind.com>; Olivier Matz <olivier.matz@6wind.com>; Didier
> Pallard <didier.pallard@6wind.com>; Jean-Mickael Guerin <jmg@6wind.com>;
> dev@dpdk.org
> *Subject:* Re: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
> Hello,
>
> Sorry for bothering you again.
> May I inquire if this issue is still being worked on?
> If so, when can I expect to see a fix?
>
> Best regards,
> Edwin Brossette
>
> On Mon, Dec 23, 2024 at 2:09 PM Slava Ovsiienko <viacheslavo@nvidia.com>
> wrote:
>
> Confirm, it’s a bug, IIUC was introduced by reporting function update.
> AFAIK, we do not test with non-DevX environment anymore, so missed this.
> Fix should be provided.
>
> With best regards,
> Slava
>
> *From:* Asaf Penso <asafp@nvidia.com>
> *Sent:* Sunday, December 22, 2024 9:39 AM
> *To:* igootorov@gmail.com; Slava Ovsiienko <viacheslavo@nvidia.com>
> *Cc:* Laurent Hardy <laurent.hardy@6wind.com>; Olivier Matz
> <olivier.matz@6wind.com>; Didier Pallard <didier.pallard@6wind.com>;
> Jean-Mickael Guerin <jmg@6wind.com>; Edwin Brossette
> <edwin.brossette@6wind.com>; dev@dpdk.org
> *Subject:* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
> Hello Igor and Slava,
>
> Can you please check out this issue?
>
> Regards,
> Asaf Penso
>
> *From:* Edwin Brossette <edwin.brossette@6wind.com>
> *Sent:* Friday, 20 December 2024 19:06
> *To:* dev@dpdk.org
> *Cc:* Laurent Hardy <laurent.hardy@6wind.com>; Olivier Matz
> <olivier.matz@6wind.com>; Didier Pallard <didier.pallard@6wind.com>;
> Jean-Mickael Guerin <jmg@6wind.com>
> *Subject:* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
>
> Hello,
>
> I have run into a regression following an update to stable dpdk-24.11 with
> a number of my Mellanox cx4/5/6 nics. This regression occurs with all nics
> in my lab which have DevX disabled: using mstconfig utility, I can see the
> flag UCTX_EN is not set.
>
> Mainly, the issue is that the ports cannot be started, with the following
> error logs in the journal:
>
> Set nb_rxd=1 (asked=512) for port=0
> Set nb_txd=1 (asked=512) for port=0
> starting port 0
> Initializing port 0 [7c:fe:90:65:e6:54]
> port 0: ntfp1 (mlx5_pci)
> nb_rxq=2 nb_txq=2
> rxq0=c9 rxq1=c25
> txq0=c9 txq1=c25
> port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
> mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next power of two (64)
> mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
> mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next power of two (64)
> mlx5_net: Port 0 Rx queue 0 CQ creation failure.
> mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
> rte_eth_dev_start(port 0) failed, error=-12
> Failed to start port 0, set link down
> Failed to start port 0
>
> Looking more precisely into the problem, it appears that the number of Rx
> and Tx descriptors configured for my queues is 1. This happens because
> mlx5_dev_infos_get() returns a limit of 1 for both Rx and Tx, which is
> unexpected.
> I identified this patch to be responsible for the regression:
>
> 4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
> https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c
>
> After doing some debugging, I noticed that hca_attr.log_max_wq_sz is never
> configured. This should be done in mlx5_devx_cmd_query_hca_attr() which is
> called in this bit of code:
>
> https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681
>
> /*
>  * When CTX is created by Verbs, query HCA attribute is unsupported.
>  * When CTX is imported, we cannot know if it is created by DevX or
>  * Verbs. So, we use query HCA attribute function to check it.
>  */
> if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) {
> 	/* Query HCA attributes. */
> 	ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr);
> 	if (ret) {
> 		DRV_LOG(ERR, "Unable to read HCA caps in DevX mode.");
> 		rte_errno = ENOTSUP;
> 		goto error;
> 	}
> 	cdev->config.devx = 1;
> }
> DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT ");
>
> I deduced that following the above patch, the correct value for maximum Rx
> and Tx descriptors will only be set if DevX is enabled (see the if
> condition on cdev->config.devx). If it is disabled, then maximum Rx and Tx
> descriptors will be 1, which will make the ports fail to start. Perhaps we
> should keep the previous default value (65535) if config.devx == 0 (DevX
> off)?
> This could be done like this, for example:
>
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index 7708a0b80883..8ba3eb4a32de 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -359,10 +359,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
>  	info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
>  	mlx5_set_default_params(dev, info);
>  	mlx5_set_txlimit_params(dev, info);
> -	info->rx_desc_lim.nb_max =
> -		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
> -	info->tx_desc_lim.nb_max =
> -		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
> +	if (priv->sh->cdev->config.devx) {
> +		info->rx_desc_lim.nb_max =
> +			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
> +		info->tx_desc_lim.nb_max =
> +			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
> +	}
>  	if (priv->sh->cdev->config.hca_attr.mem_rq_rmp &&
>  	    priv->obj_ops.rxq_obj_new == devx_obj_ops.rxq_obj_new)
>  		info->dev_capa |= RTE_ETH_DEV_CAPA_RXQ_SHARE;
>
> Thanks in advance for your help.
>
> Regards,
> Edwin Brossette.
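For illustration only, here is a rough sketch of the alternative raised in the thread: filling log_max_wq_sz for the Verbs (IBV) case from dev_cap.max_qp_wr, so that every user of the field sees a sane value. It is not the driver's code. The two structs, the helper names and the round-down choice are assumptions made for this example; only the field names log_max_wq_sz and max_qp_wr come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two driver fields named in the thread. */
struct hca_attr_sketch {
	uint32_t log_max_wq_sz;	/* stays 0 when the HCA query is skipped */
};

struct dev_cap_sketch {
	uint32_t max_qp_wr;	/* max work requests reported through Verbs */
};

/* floor(log2(v)) for v > 0; returns 0 when v is 0 or 1. */
static inline uint32_t
log2_floor_u32(uint32_t v)
{
	uint32_t l = 0;

	while (v >>= 1)
		l++;
	return l;
}

/*
 * Hypothetical helper: when DevX is off, derive log_max_wq_sz from
 * max_qp_wr. Rounding down (an assumption of this sketch) keeps the
 * resulting limit at or below what Verbs reports.
 */
static inline void
fill_log_max_wq_sz(struct hca_attr_sketch *attr,
		   const struct dev_cap_sketch *cap, int devx)
{
	assert(attr != NULL && cap != NULL);
	if (!devx && cap->max_qp_wr > 0)
		attr->log_max_wq_sz = log2_floor_u32(cap->max_qp_wr);
}

/* Same arithmetic as the reported limit: 1 << log_max_wq_sz. */
static inline uint32_t
desc_limit(const struct hca_attr_sketch *attr)
{
	return UINT32_C(1) << attr->log_max_wq_sz;
}
```

With a zeroed hca_attr_sketch, desc_limit() yields 1, which matches the nb_rxd=1 / nb_txd=1 lines in the log. After fill_log_max_wq_sz() with an example max_qp_wr of 32768 and devx == 0, the limit becomes 32768; a non-power-of-two value such as 65535 rounds down to 32768 as well.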