From: Shani Peretz <shperetz@nvidia.com>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>,
"stable@dpdk.org" <stable@dpdk.org>,
Kevin Traynor <ktraynor@redhat.com>
Cc: Slava Ovsiienko <viacheslavo@nvidia.com>,
Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
Suanming Mou <suanmingm@nvidia.com>,
Matan Azrad <matan@nvidia.com>,
"NBU-Contact-Adrien Mazarguil (EXTERNAL)"
<adrien.mazarguil@6wind.com>,
Didier Pallard <didier.pallard@6wind.com>,
"NBU-Contact-Nélio Laranjeiro (EXTERNAL)"
<nelio.laranjeiro@6wind.com>,
Francesco Santoro <francesco.santoro@6wind.com>
Subject: RE: [PATCH 24.11] net/mlx5: fix min and max MTU reporting
Date: Tue, 30 Dec 2025 10:09:03 +0000
Message-ID: <MW4PR12MB7484B34CF4B6714844CF2942BFBCA@MW4PR12MB7484.namprd12.prod.outlook.com>
In-Reply-To: <20251104172715.1328088-1-dsosnowski@nvidia.com>
> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Tuesday, 4 November 2025 19:27
> To: stable@dpdk.org; Kevin Traynor <ktraynor@redhat.com>
> Cc: Slava Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>;
> Ori Kam <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>;
> Matan Azrad <matan@nvidia.com>; NBU-Contact-Adrien Mazarguil (EXTERNAL)
> <adrien.mazarguil@6wind.com>; Didier Pallard <didier.pallard@6wind.com>;
> NBU-Contact-Nélio Laranjeiro (EXTERNAL) <nelio.laranjeiro@6wind.com>;
> Francesco Santoro <francesco.santoro@6wind.com>
> Subject: [PATCH 24.11] net/mlx5: fix min and max MTU reporting
>
> [ upstream commit 44d657109216a32e8718446f20f91272e10575dd ]
>
> mlx5 PMD used hardcoded and incorrect values when reporting maximum MTU
> and maximum Rx packet length through rte_eth_dev_info_get().
>
> This patch adds support for querying OS for minimum and maximum allowed
> MTU values. Maximum Rx packet length is then calculated based on these
> values.
>
> On Linux, these values are queried through netlink, using IFLA_MIN_MTU and
> IFLA_MAX_MTU attributes added in Linux 4.18.
>
> Windows API unfortunately does not expose minimum and maximum allowed
> MTU values. In this case, fallback hardcoded values (working on currently
> supported HW) will be used.
>
> Bugzilla ID: 1719
> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop")
> Cc: stable@dpdk.org
>
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
> drivers/common/mlx5/linux/mlx5_nl.c | 108 ++++++++++++++++++++++
> drivers/common/mlx5/linux/mlx5_nl.h | 3 +
> drivers/common/mlx5/version.map | 1 +
> drivers/net/mlx5/linux/mlx5_ethdev_os.c | 30 ++++++
> drivers/net/mlx5/linux/mlx5_os.c | 2 +
> drivers/net/mlx5/mlx5.h | 13 +++
> drivers/net/mlx5/mlx5_ethdev.c | 42 ++++++++-
> drivers/net/mlx5/windows/mlx5_ethdev_os.c | 28 ++++++
> drivers/net/mlx5/windows/mlx5_os.c | 2 +
> 9 files changed, 228 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/common/mlx5/linux/mlx5_nl.c b/drivers/common/mlx5/linux/mlx5_nl.c
> index a5ac4dc543..6824ea322d 100644
> --- a/drivers/common/mlx5/linux/mlx5_nl.c
> +++ b/drivers/common/mlx5/linux/mlx5_nl.c
> @@ -2033,3 +2033,111 @@ mlx5_nl_devlink_esw_multiport_get(int nlsk_fd, int family_id, const char *pci_ad
> *enable ? "en" : "dis", pci_addr);
> return ret;
> }
> +
> +struct mlx5_mtu {
> + uint32_t min_mtu;
> + bool min_mtu_set;
> + uint32_t max_mtu;
> + bool max_mtu_set;
> +};
> +
> +static int
> +mlx5_nl_get_mtu_bounds_cb(struct nlmsghdr *nh, void *arg)
> +{
> + size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> + struct mlx5_mtu *out = arg;
> +
> + while (off < nh->nlmsg_len) {
> + struct rtattr *ra = RTE_PTR_ADD(nh, off);
> + uint32_t *payload;
> +
> + switch (ra->rta_type) {
> + case IFLA_MIN_MTU:
> + payload = RTA_DATA(ra);
> + out->min_mtu = *payload;
> + out->min_mtu_set = true;
> + break;
> + case IFLA_MAX_MTU:
> + payload = RTA_DATA(ra);
> + out->max_mtu = *payload;
> + out->max_mtu_set = true;
> + break;
> + default:
> + /* Nothing to do for other attributes. */
> + break;
> + }
> + off += RTA_ALIGN(ra->rta_len);
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * Query minimum and maximum allowed MTU values for given Linux network interface.
> + *
> + * This function queries the following interface attributes exposed in netlink since Linux 4.18:
> + *
> + * - IFLA_MIN_MTU - minimum allowed MTU
> + * - IFLA_MAX_MTU - maximum allowed MTU
> + *
> + * @param[in] nl
> + * Netlink socket of the ROUTE kind (NETLINK_ROUTE).
> + * @param[in] ifindex
> + * Linux network device index.
> + * @param[out] min_mtu
> + * Pointer to minimum allowed MTU. Populated only if both minimum and maximum MTU were queried.
> + * @param[out] max_mtu
> + * Pointer to maximum allowed MTU. Populated only if both minimum and maximum MTU were queried.
> + *
> + * @return
> + * 0 on success, negative on error and rte_errno is set.
> + *
> + * Known errors:
> + *
> + * - (-EINVAL) - either @p min_mtu or @p max_mtu is NULL.
> + * - (-ENOENT) - either minimum or maximum allowed MTU was not found in interface attributes.
> + */
> +int
> +mlx5_nl_get_mtu_bounds(int nl, unsigned int ifindex, uint16_t *min_mtu,
> + uint16_t *max_mtu)
> +{
> + struct mlx5_mtu out = { 0 };
> + struct {
> + struct nlmsghdr nh;
> + struct ifinfomsg info;
> + } req = {
> + .nh = {
> + .nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
> + .nlmsg_type = RTM_GETLINK,
> + .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
> + },
> + .info = {
> + .ifi_family = AF_UNSPEC,
> + .ifi_index = ifindex,
> + },
> + };
> + uint32_t sn = MLX5_NL_SN_GENERATE;
> + int ret;
> +
> + if (min_mtu == NULL || max_mtu == NULL) {
> + rte_errno = EINVAL;
> + return -rte_errno;
> + }
> +
> + ret = mlx5_nl_send(nl, &req.nh, sn);
> + if (ret < 0)
> + return ret;
> +
> + ret = mlx5_nl_recv(nl, sn, mlx5_nl_get_mtu_bounds_cb, &out);
> + if (ret < 0)
> + return ret;
> +
> + if (!out.min_mtu_set || !out.max_mtu_set) {
> + rte_errno = ENOENT;
> + return -rte_errno;
> + }
> +
> + *min_mtu = out.min_mtu;
> + *max_mtu = out.max_mtu;
> +
> + return ret;
> +}
> diff --git a/drivers/common/mlx5/linux/mlx5_nl.h b/drivers/common/mlx5/linux/mlx5_nl.h
> index 580de3b769..34306258ec 100644
> --- a/drivers/common/mlx5/linux/mlx5_nl.h
> +++ b/drivers/common/mlx5/linux/mlx5_nl.h
> @@ -87,4 +87,7 @@ __rte_internal
> int mlx5_nl_devlink_esw_multiport_get(int nlsk_fd, int family_id,
> const char *pci_addr, int *enable);
>
> +__rte_internal
> +int mlx5_nl_get_mtu_bounds(int nl, unsigned int ifindex, uint16_t *min_mtu,
> + uint16_t *max_mtu);
> +
> #endif /* RTE_PMD_MLX5_NL_H_ */
> diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
> index 6311b27c8a..ccd04d65e0 100644
> --- a/drivers/common/mlx5/version.map
> +++ b/drivers/common/mlx5/version.map
> @@ -146,6 +146,7 @@ INTERNAL {
> mlx5_nl_vf_mac_addr_modify; # WINDOWS_NO_EXPORT
> mlx5_nl_vlan_vmwa_create; # WINDOWS_NO_EXPORT
> mlx5_nl_vlan_vmwa_delete; # WINDOWS_NO_EXPORT
> + mlx5_nl_get_mtu_bounds; # WINDOWS_NO_EXPORT
>
> mlx5_os_get_physical_device_ctx;
> mlx5_os_umem_dereg;
> diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> index f78d602098..a5dda1e5f1 100644
> --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> @@ -157,6 +157,36 @@ mlx5_ifreq(const struct rte_eth_dev *dev, int req, struct ifreq *ifr)
> return mlx5_ifreq_by_ifname(ifname, req, ifr);
> }
>
> +/**
> + * Get device minimum and maximum allowed MTU values.
> + *
> + * @param dev
> + * Pointer to Ethernet device.
> + * @param[out] min_mtu
> + * Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + * Maximum MTU value output buffer.
> + *
> + * @return
> + * 0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> + uint16_t *max_mtu)
> +{
> + struct mlx5_priv *priv = dev->data->dev_private;
> + int nl_route;
> + int ret;
> +
> + nl_route = mlx5_nl_init(NETLINK_ROUTE, 0);
> + if (nl_route < 0)
> + return nl_route;
> +
> + ret = mlx5_nl_get_mtu_bounds(nl_route, priv->if_index, min_mtu,
> + max_mtu);
> +
> + close(nl_route);
> + return ret;
> +}
> +
> /**
> * Get device MTU.
> *
> diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
> index 4bd5c8da7d..b23ee265b6 100644
> --- a/drivers/net/mlx5/linux/mlx5_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> @@ -1570,6 +1570,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
> eth_dev->data->mac_addrs = priv->mac;
> eth_dev->device = dpdk_dev;
> eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> + /* Fetch minimum and maximum allowed MTU from the device. */
> + mlx5_get_mtu_bounds(eth_dev, &priv->min_mtu, &priv->max_mtu);
> /* Configure the first MAC address by default. */
> if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
> DRV_LOG(ERR,
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 296276c7ca..7966b65bae 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -74,6 +74,15 @@
> /* Maximal number of field/field parts to map into sample registers .*/
> #define MLX5_FLEX_ITEM_MAPPING_NUM 32
>
> +/* Number of bytes not included in MTU. */
> +#define MLX5_ETH_OVERHEAD (RTE_ETHER_HDR_LEN + RTE_VLAN_HLEN + RTE_ETHER_CRC_LEN)
> +
> +/* Minimum allowed MTU to be reported whenever PMD cannot query it from OS. */
> +#define MLX5_ETH_MIN_MTU (RTE_ETHER_MIN_MTU)
> +
> +/* Maximum allowed MTU to be reported whenever PMD cannot query it from OS. */
> +#define MLX5_ETH_MAX_MTU (9978)
> +
> enum mlx5_ipool_index {
> #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
> MLX5_IPOOL_DECAP_ENCAP = 0, /* Pool for encap/decap resource. */
> @@ -1957,6 +1966,8 @@ struct mlx5_priv {
> unsigned int vlan_filter_n; /* Number of configured VLAN filters. */
> /* Device properties. */
> uint16_t mtu; /* Configured MTU. */
> + uint16_t min_mtu; /* Minimum MTU allowed on the NIC. */
> + uint16_t max_mtu; /* Maximum MTU allowed on the NIC. */
> unsigned int isolated:1; /* Whether isolated mode is enabled. */
> unsigned int representor:1; /* Device is a port representor. */
> unsigned int master:1; /* Device is a E-Switch master. */
> @@ -2286,6 +2297,7 @@ struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
> int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
> uint64_t mlx5_get_restore_flags(struct rte_eth_dev *dev,
> enum rte_eth_dev_operation op);
> +void mlx5_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> + uint16_t *max_mtu);
>
> /* mlx5_ethdev_os.c */
>
> @@ -2323,6 +2335,7 @@ int mlx5_os_get_stats_n(struct rte_eth_dev *dev, bool bond_master,
> uint16_t *n_stats, uint16_t *n_stats_sec);
> void mlx5_os_stats_init(struct rte_eth_dev *dev);
> int mlx5_get_flag_dropless_rq(struct rte_eth_dev *dev);
> +int mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> + uint16_t *max_mtu);
>
> /* mlx5_mac.c */
>
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index 68d1c1bfa7..7747b0c869 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -360,9 +360,11 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
> unsigned int max;
> uint16_t max_wqe;
>
> + info->min_mtu = priv->min_mtu;
> + info->max_mtu = priv->max_mtu;
> + info->max_rx_pktlen = info->max_mtu + MLX5_ETH_OVERHEAD;
> /* FIXME: we should ask the device for these values. */
> info->min_rx_bufsize = 32;
> - info->max_rx_pktlen = 65536;
> info->max_lro_pkt_size = MLX5_MAX_LRO_SIZE;
> /*
> * Since we need one CQ per QP, the limit is the minimum number
> @@ -863,3 +865,41 @@ mlx5_get_restore_flags(__rte_unused struct rte_eth_dev *dev,
> /* mlx5 PMD does not require any configuration restore. */
> return 0;
> }
> +
> +/**
> + * Query minimum and maximum allowed MTU value on the device.
> + *
> + * This function will always return valid MTU bounds.
> + * In case platform-specific implementation fails or current platform does not support it,
> + * the fallback default values will be used.
> + *
> + * @param[in] dev
> + * Pointer to Ethernet device
> + * @param[out] min_mtu
> + * Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + * Maximum MTU value output buffer.
> + */
> +void
> +mlx5_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> + uint16_t *max_mtu)
> +{
> + int ret;
> +
> + MLX5_ASSERT(min_mtu != NULL);
> + MLX5_ASSERT(max_mtu != NULL);
> +
> + ret = mlx5_os_get_mtu_bounds(dev, min_mtu, max_mtu);
> + if (ret < 0) {
> + if (ret != -ENOTSUP)
> + DRV_LOG(INFO, "port %u failed to query MTU bounds, using fallback values",
> + dev->data->port_id);
> + *min_mtu = MLX5_ETH_MIN_MTU;
> + *max_mtu = MLX5_ETH_MAX_MTU;
> +
> + /* This function does not fail. Clear rte_errno. */
> + rte_errno = 0;
> + }
> +
> + DRV_LOG(INFO, "port %u minimum MTU is %u", dev->data->port_id, *min_mtu);
> + DRV_LOG(INFO, "port %u maximum MTU is %u", dev->data->port_id, *max_mtu);
> +}
> diff --git a/drivers/net/mlx5/windows/mlx5_ethdev_os.c b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> index ec08bfef6d..e24ff367af 100644
> --- a/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> @@ -71,6 +71,34 @@ mlx5_get_ifname(const struct rte_eth_dev *dev, char ifname[MLX5_NAMESIZE])
> return 0;
> }
>
> +/**
> + * Get device minimum and maximum allowed MTU.
> + *
> + * Windows API does not expose minimum and maximum allowed MTU.
> + * In this case, this just returns (-ENOTSUP) to allow platform-independent code
> + * to fall back to default values.
> + *
> + * @param dev
> + * Pointer to Ethernet device.
> + * @param[out] min_mtu
> + * Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + * Maximum MTU value output buffer.
> + *
> + * @return
> + * (-ENOTSUP) - not supported on Windows
> + */
> +int
> +mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> + uint16_t *max_mtu)
> +{
> + RTE_SET_USED(dev);
> + RTE_SET_USED(min_mtu);
> + RTE_SET_USED(max_mtu);
> +
> + rte_errno = ENOTSUP;
> + return -rte_errno;
> +}
> +
> /**
> * Get device MTU.
> *
> diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
> index 7bd001950e..5c7e76383f 100644
> --- a/drivers/net/mlx5/windows/mlx5_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_os.c
> @@ -478,6 +478,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
> eth_dev->data->mac_addrs = priv->mac;
> eth_dev->device = dpdk_dev;
> eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> + /* Fetch minimum and maximum allowed MTU from the device. */
> + mlx5_get_mtu_bounds(eth_dev, &priv->min_mtu, &priv->max_mtu);
> /* Configure the first MAC address by default. */
> if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
> DRV_LOG(ERR,
> --
> 2.39.5
This patch has also been applied to 23.11.
Thanks,
Shani
Thread overview: 3+ messages
2025-11-04 17:27 Dariusz Sosnowski
2025-11-05 11:07 ` Kevin Traynor
2025-12-30 10:09 ` Shani Peretz [this message]