From: Edwin Brossette
Date: Fri, 20 Dec 2024 18:05:59 +0100
Subject: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
To: dev@dpdk.org
Cc: Laurent Hardy, Olivier Matz, Didier Pallard, Jean-Mickael Guerin

Hello,

I have run into a regression after updating to stable DPDK 24.11 with several of my Mellanox ConnectX-4/5/6 NICs. It occurs with every NIC in my lab that has DevX disabled: using the mstconfig utility, I can see that the UCTX_EN flag is not set.

The main issue is that the ports cannot be started; the journal shows the following errors:

Set nb_rxd=1 (asked=512) for port=0
Set nb_txd=1 (asked=512) for port=0
starting port 0
Initializing port 0 [7c:fe:90:65:e6:54]
port 0: ntfp1 (mlx5_pci)
nb_rxq=2 nb_txq=2
rxq0=c9 rxq1=c25
txq0=c9 txq1=c25
port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next power of two (64)
mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next power of two (64)
mlx5_net: Port 0 Rx queue 0 CQ creation failure.
mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
rte_eth_dev_start(port 0) failed, error=-12
Failed to start port 0, set link down
Failed to start port 0

Looking more closely, it appears that the number of Rx and Tx descriptors configured for my queues is 1. This happens because mlx5_dev_infos_get() returns a limit of 1 for both Rx and Tx, which is unexpected. I identified the following patch as the cause of the regression:

4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c

After some debugging, I noticed that hca_attr.log_max_wq_sz is never initialized. It should be set by mlx5_devx_cmd_query_hca_attr(), which is called in this piece of code:

https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681

/*
 * When CTX is created by Verbs, query HCA attribute is unsupported.
 * When CTX is imported, we cannot know if it is created by DevX or
 * Verbs. So, we use query HCA attribute function to check it.
 */
if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) {
	/* Query HCA attributes. */
	ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr);
	if (ret) {
		DRV_LOG(ERR, "Unable to read HCA caps in DevX mode.");
		rte_errno = ENOTSUP;
		goto error;
	}
	cdev->config.devx = 1;
}
DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT ");
From this I deduced that, since the above patch, the correct maximum number of Rx and Tx descriptors is only set when DevX is enabled (see the if condition on cdev->config.devx). When DevX is disabled, log_max_wq_sz is left at 0, so the reported maximum for both Rx and Tx is 1 << 0 = 1, which prevents the ports from starting. Perhaps we should keep the previous default value (65535) when config.devx == 0 (DevX off)? This could be done as follows, for example:

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7708a0b80883..8ba3eb4a32de 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -359,10 +359,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
 	mlx5_set_default_params(dev, info);
 	mlx5_set_txlimit_params(dev, info);
-	info->rx_desc_lim.nb_max =
-		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
-	info->tx_desc_lim.nb_max =
-		1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+	if (priv->sh->cdev->config.devx) {
+		info->rx_desc_lim.nb_max =
+			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+		info->tx_desc_lim.nb_max =
+			1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+	}
 	if (priv->sh->cdev->config.hca_attr.mem_rq_rmp &&
 	    priv->obj_ops.rxq_obj_new == devx_obj_ops.rxq_obj_new)
 		info->dev_capa |= RTE_ETH_DEV_CAPA_RXQ_SHARE;

Thanks in advance for your help.

Regards,
Edwin Brossette.

