DPDK patches and discussions
* net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
@ 2024-12-20 17:05 Edwin Brossette
  2024-12-22  7:39 ` Asaf Penso
  0 siblings, 1 reply; 3+ messages in thread
From: Edwin Brossette @ 2024-12-20 17:05 UTC (permalink / raw)
  To: dev; +Cc: Laurent Hardy, Olivier Matz, Didier Pallard, Jean-Mickael Guerin

Hello,

I have run into a regression after updating to stable dpdk-24.11 on several
of my Mellanox CX4/5/6 NICs. The regression affects every NIC in my lab that
has DevX disabled: the mstconfig utility shows that the UCTX_EN flag is not
set.

Mainly, the issue is that the ports cannot be started, with the following
error logs in the journal:

Set nb_rxd=1 (asked=512) for port=0
Set nb_txd=1 (asked=512) for port=0
starting port 0
Initializing port 0 [7c:fe:90:65:e6:54]
port 0: ntfp1 (mlx5_pci)
nb_rxq=2 nb_txq=2
rxq0=c9 rxq1=c25
txq0=c9 txq1=c25
port 0: rx_scatter=0 tx_scatter=0 max_rx_frame=1526
mlx5_net: port 0 number of descriptors requested for Tx queue 0 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 0 to the next power of two (64)
mlx5_net: port 0 number of descriptors requested for Tx queue 1 must be higher than MLX5_TX_COMP_THRESH, using 33 instead of 1
mlx5_net: port 0 increased number of descriptors in Tx queue 1 to the next power of two (64)
mlx5_net: Port 0 Rx queue 0 CQ creation failure.
mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory
rte_eth_dev_start(port 0) failed, error=-12
Failed to start port 0, set link down
Failed to start port 0

Looking more closely into the problem, it appears that the number of Rx and
Tx descriptors configured for my queues is 1. This happens because
mlx5_dev_infos_get() returns a limit of 1 for both Rx and Tx, which is
unexpected. I identified this patch as responsible for the regression:

4c3d7961d9002: net/mlx5: fix reported Rx/Tx descriptor limits
https://git.dpdk.org/dpdk/commit/?id=4c3d7961d9002bb715a8ee76bcf464d633316d4c
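
As a minimal illustration (not part of the original report), the "Set nb_rxd=1 (asked=512)" lines are consistent with an application clamping its request to the limits reported by the driver. The sketch below assumes such clamping; struct desc_lim and clamp_nb_desc() are hypothetical names, not DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the descriptor limits reported by a driver. */
struct desc_lim {
	uint16_t nb_min;
	uint16_t nb_max;
};

/* Clamp a requested ring size into the range [nb_min, nb_max]. */
static uint16_t
clamp_nb_desc(uint16_t asked, const struct desc_lim *lim)
{
	assert(lim->nb_min <= lim->nb_max);
	if (asked > lim->nb_max)
		return lim->nb_max;
	if (asked < lim->nb_min)
		return lim->nb_min;
	return asked;
}
```

With nb_max reported as 1, a request for 512 descriptors comes back as 1, whereas a limit of 65535 leaves the request untouched.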

After some debugging, I noticed that hca_attr.log_max_wq_sz is never set.
This should happen in mlx5_devx_cmd_query_hca_attr(), which is called in
this bit of code:

https://git.dpdk.org/dpdk/tree/drivers/common/mlx5/mlx5_common.c#n681

/*
 * When CTX is created by Verbs, query HCA attribute is unsupported.
 * When CTX is imported, we cannot know if it is created by DevX or
 * Verbs. So, we use query HCA attribute function to check it.
 */
if (cdev->config.devx || cdev->config.device_fd != MLX5_ARG_UNSET) {
        /* Query HCA attributes. */
        ret = mlx5_devx_cmd_query_hca_attr(cdev->ctx, &cdev->config.hca_attr);
        if (ret) {
                DRV_LOG(ERR, "Unable to read HCA caps in DevX mode.");
                rte_errno = ENOTSUP;
                goto error;
        }
        cdev->config.devx = 1;
}
DRV_LOG(DEBUG, "DevX is %ssupported.", cdev->config.devx ? "" : "NOT ");
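
The effect of skipping the query can be reproduced in isolation. The sketch below is a reduced model, assuming the attribute structure starts out zeroed; the struct names, probe() and the value 15 are made up for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct hca_attr {
	uint32_t log_max_wq_sz;
};

struct dev_config {
	int devx;
	struct hca_attr hca_attr;
};

/* Model of the probe path: attributes are only filled in when DevX is on. */
static void
probe(struct dev_config *cfg, int devx_enabled)
{
	assert(cfg != NULL);
	memset(cfg, 0, sizeof(*cfg));
	cfg->devx = devx_enabled;
	if (cfg->devx)
		cfg->hca_attr.log_max_wq_sz = 15; /* made-up device value */
}

/* Model of the reporting path: unconditional shift, as after the patch. */
static uint32_t
reported_max(const struct dev_config *cfg)
{
	return UINT32_C(1) << cfg->hca_attr.log_max_wq_sz;
}
```

With DevX off the exponent stays at 0, so the reported maximum is 1 << 0, which is 1.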

I deduced that, since the above patch, the maximum number of Rx and Tx
descriptors is only set correctly when DevX is enabled (see the if condition
on cdev->config.devx). When it is disabled, hca_attr.log_max_wq_sz stays at
0, so the reported maximum is 1 << 0 = 1, which makes the ports fail to
start. Perhaps we should keep the previous default value (65535) if
config.devx == 0 (DevX off)? This could be done like this, for example:

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 7708a0b80883..8ba3eb4a32de 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -359,10 +359,12 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
        info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK;
        mlx5_set_default_params(dev, info);
        mlx5_set_txlimit_params(dev, info);
-       info->rx_desc_lim.nb_max =
-               1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
-       info->tx_desc_lim.nb_max =
-               1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+       if (priv->sh->cdev->config.devx) {
+               info->rx_desc_lim.nb_max =
+                       1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+               info->tx_desc_lim.nb_max =
+                       1 << priv->sh->cdev->config.hca_attr.log_max_wq_sz;
+       }
        if (priv->sh->cdev->config.hca_attr.mem_rq_rmp &&
            priv->obj_ops.rxq_obj_new == devx_obj_ops.rxq_obj_new)
                info->dev_capa |= RTE_ETH_DEV_CAPA_RXQ_SHARE;

Thanks in advance for your help.

Regards,
Edwin Brossette.


* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
  2024-12-20 17:05 net/mlx5: wrong Rx/Tx descriptor limits when DevX is off Edwin Brossette
@ 2024-12-22  7:39 ` Asaf Penso
  2024-12-23 13:08   ` Slava Ovsiienko
  0 siblings, 1 reply; 3+ messages in thread
From: Asaf Penso @ 2024-12-22  7:39 UTC (permalink / raw)
  To: igootorov, Slava Ovsiienko
  Cc: Laurent Hardy, Olivier Matz, Didier Pallard, Jean-Mickael Guerin,
	Edwin Brossette, dev

Hello Igor and Slava,
Can you please check out this issue?

Regards,
Asaf Penso


* RE: net/mlx5: wrong Rx/Tx descriptor limits when DevX is off
  2024-12-22  7:39 ` Asaf Penso
@ 2024-12-23 13:08   ` Slava Ovsiienko
  0 siblings, 0 replies; 3+ messages in thread
From: Slava Ovsiienko @ 2024-12-23 13:08 UTC (permalink / raw)
  To: Asaf Penso, igootorov
  Cc: Laurent Hardy, Olivier Matz, Didier Pallard, Jean-Mickael Guerin,
	Edwin Brossette, dev

Confirmed, it is a bug. IIUC, it was introduced by the update to the
reporting function. AFAIK, we no longer test in non-DevX environments, so
we missed it. A fix should be provided.

With best regards,
Slava
