patches for DPDK stable branches
From: Shani Peretz <shperetz@nvidia.com>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>,
	"stable@dpdk.org" <stable@dpdk.org>,
	Kevin Traynor <ktraynor@redhat.com>
Cc: Slava Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>,
	"NBU-Contact-Adrien Mazarguil (EXTERNAL)"
	<adrien.mazarguil@6wind.com>,
	Didier Pallard <didier.pallard@6wind.com>,
	"NBU-Contact-Nélio Laranjeiro (EXTERNAL)"
	<nelio.laranjeiro@6wind.com>,
	Francesco Santoro <francesco.santoro@6wind.com>
Subject: RE: [PATCH 24.11] net/mlx5: fix min and max MTU reporting
Date: Tue, 30 Dec 2025 10:09:03 +0000
Message-ID: <MW4PR12MB7484B34CF4B6714844CF2942BFBCA@MW4PR12MB7484.namprd12.prod.outlook.com>
In-Reply-To: <20251104172715.1328088-1-dsosnowski@nvidia.com>



> -----Original Message-----
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Sent: Tuesday, 4 November 2025 19:27
> To: stable@dpdk.org; Kevin Traynor <ktraynor@redhat.com>
> Cc: Slava Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>;
> Ori Kam <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>;
> Matan Azrad <matan@nvidia.com>; NBU-Contact-Adrien Mazarguil (EXTERNAL)
> <adrien.mazarguil@6wind.com>; Didier Pallard <didier.pallard@6wind.com>;
> NBU-Contact-Nélio Laranjeiro (EXTERNAL) <nelio.laranjeiro@6wind.com>;
> Francesco Santoro <francesco.santoro@6wind.com>
> Subject: [PATCH 24.11] net/mlx5: fix min and max MTU reporting
> 
> 
> [ upstream commit 44d657109216a32e8718446f20f91272e10575dd ]
> 
> mlx5 PMD used hardcoded and incorrect values when reporting maximum MTU
> and maximum Rx packet length through rte_eth_dev_info_get().
> 
> This patch adds support for querying OS for minimum and maximum allowed
> MTU values. Maximum Rx packet length is then calculated based on these
> values.
> 
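The maximum Rx packet length reported after this change is the maximum MTU plus the L2 overhead that the patch names `MLX5_ETH_OVERHEAD`. That overhead is one Ethernet header, one VLAN tag and the CRC. The sketch below restates the arithmetic in plain C so it builds without DPDK.

It makes two assumptions:
- The constants 14, 4 and 4 are assumed to match DPDK's `RTE_ETHER_HDR_LEN`, `RTE_VLAN_HLEN` and `RTE_ETHER_CRC_LEN`.
- The helper `max_rx_pktlen_from_mtu` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror DPDK's RTE_ETHER_HDR_LEN, RTE_VLAN_HLEN and
 * RTE_ETHER_CRC_LEN; redefined here so the sketch builds without DPDK. */
#define ETHER_HDR_LEN 14u
#define VLAN_HLEN 4u
#define ETHER_CRC_LEN 4u

/* Same composition as MLX5_ETH_OVERHEAD in the patch. */
#define ETH_OVERHEAD (ETHER_HDR_LEN + VLAN_HLEN + ETHER_CRC_LEN)

/*
 * Hypothetical helper: largest frame accepted for a given maximum MTU.
 * The result is 32-bit so that MTUs close to UINT16_MAX cannot overflow.
 */
static uint32_t
max_rx_pktlen_from_mtu(uint16_t max_mtu)
{
	return (uint32_t)max_mtu + ETH_OVERHEAD;
}
```

Two worked values follow from this. The 9978-byte fallback maximum MTU gives a maximum Rx packet length of 10000 bytes. A standard 1500-byte MTU gives 1522 bytes.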
> On Linux, these values are queried through netlink, using IFLA_MIN_MTU and
> IFLA_MAX_MTU attributes added in Linux 4.18.
> 
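On Linux, the kernel answers an RTM_GETLINK request with an RTM_NEWLINK message. Both bounds arrive in it as 32-bit attributes that follow the `struct ifinfomsg` header. The self-contained sketch below walks those attributes with the standard `RTA_OK`/`RTA_NEXT` macros from `<linux/rtnetlink.h>`, rather than the manual offset loop used in the patch.

The function name `parse_mtu_bounds` is hypothetical. It only parses a message that has already been received; socket setup, sending the request and ACK handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/*
 * Hypothetical parser: extract IFLA_MIN_MTU and IFLA_MAX_MTU from a single
 * RTM_NEWLINK message.
 *
 * Return values:
 *   0        both attributes were found.
 *   -ENOENT  either attribute is missing (same contract as the patch).
 *   -EINVAL  not a link message, or the message is too short.
 */
static int
parse_mtu_bounds(const struct nlmsghdr *nh, uint32_t *min_mtu,
		 uint32_t *max_mtu)
{
	bool have_min = false, have_max = false;
	const struct ifinfomsg *ifi;
	const struct rtattr *ra;
	int len;

	if (nh->nlmsg_type != RTM_NEWLINK ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ifi)))
		return -EINVAL;
	ifi = NLMSG_DATA(nh);
	/* Attributes start right after the aligned ifinfomsg header. */
	ra = (const struct rtattr *)((const char *)ifi +
				     NLMSG_ALIGN(sizeof(*ifi)));
	len = (int)(nh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));
	for (; RTA_OK(ra, len); ra = RTA_NEXT(ra, len)) {
		if (RTA_PAYLOAD(ra) < sizeof(uint32_t))
			continue; /* Too short to hold a u32, skip it. */
		if (ra->rta_type == IFLA_MIN_MTU) {
			memcpy(min_mtu, RTA_DATA(ra), sizeof(*min_mtu));
			have_min = true;
		} else if (ra->rta_type == IFLA_MAX_MTU) {
			memcpy(max_mtu, RTA_DATA(ra), sizeof(*max_mtu));
			have_max = true;
		}
	}
	return (have_min && have_max) ? 0 : -ENOENT;
}
```

The `RTA_OK` check bounds every attribute by the remaining message length, so a truncated reply cannot cause reads past the buffer. `memcpy` avoids relying on the alignment of the attribute payload.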
> Windows API unfortunately does not expose minimum and maximum allowed
> MTU values. In this case, fallback hardcoded values (working on currently
> supported HW) will be used.
> 
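The fallback contract described above can be sketched without any DPDK or mlx5 dependencies: query the OS, and use fixed defaults if the query fails or is unsupported. The names and values used here are:
- `mtu_query_fn` is a hypothetical function-pointer hook, so both outcomes can be exercised.
- `query_unsupported` and `query_ok` are hypothetical stand-ins for the Windows and Linux cases.
- 68 is the value of DPDK's `RTE_ETHER_MIN_MTU`.
- 9978 is the `MLX5_ETH_MAX_MTU` fallback defined in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FALLBACK_MIN_MTU 68u   /* RTE_ETHER_MIN_MTU in DPDK */
#define FALLBACK_MAX_MTU 9978u /* MLX5_ETH_MAX_MTU in the patch */

/* Hypothetical platform hook: 0 on success, negative errno on failure. */
typedef int (*mtu_query_fn)(uint16_t *min_mtu, uint16_t *max_mtu);

/* Never fails: on any query error both fallback bounds are used. */
static void
get_mtu_bounds(mtu_query_fn query, uint16_t *min_mtu, uint16_t *max_mtu)
{
	if (query == NULL || query(min_mtu, max_mtu) < 0) {
		*min_mtu = FALLBACK_MIN_MTU;
		*max_mtu = FALLBACK_MAX_MTU;
	}
}

/* Stand-in for a platform that cannot report MTU bounds (Windows case). */
static int
query_unsupported(uint16_t *min_mtu, uint16_t *max_mtu)
{
	(void)min_mtu;
	(void)max_mtu;
	return -ENOTSUP;
}

/* Stand-in for a successful OS query; 9000 is an arbitrary example. */
static int
query_ok(uint16_t *min_mtu, uint16_t *max_mtu)
{
	*min_mtu = 68;
	*max_mtu = 9000;
	return 0;
}
```

The patch's `mlx5_get_mtu_bounds()` also suppresses its log message when the error is `-ENOTSUP`, because that is the expected result on Windows, and it clears `rte_errno`. This sketch omits both.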
> Bugzilla ID: 1719
> Fixes: e60fbd5b24fc ("mlx5: add device configure/start/stop")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  drivers/common/mlx5/linux/mlx5_nl.c       | 108 ++++++++++++++++++++++
>  drivers/common/mlx5/linux/mlx5_nl.h       |   3 +
>  drivers/common/mlx5/version.map           |   1 +
>  drivers/net/mlx5/linux/mlx5_ethdev_os.c   |  30 ++++++
>  drivers/net/mlx5/linux/mlx5_os.c          |   2 +
>  drivers/net/mlx5/mlx5.h                   |  13 +++
>  drivers/net/mlx5/mlx5_ethdev.c            |  42 ++++++++-
>  drivers/net/mlx5/windows/mlx5_ethdev_os.c |  28 ++++++
>  drivers/net/mlx5/windows/mlx5_os.c        |   2 +
>  9 files changed, 228 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/common/mlx5/linux/mlx5_nl.c b/drivers/common/mlx5/linux/mlx5_nl.c
> index a5ac4dc543..6824ea322d 100644
> --- a/drivers/common/mlx5/linux/mlx5_nl.c
> +++ b/drivers/common/mlx5/linux/mlx5_nl.c
> @@ -2033,3 +2033,111 @@ mlx5_nl_devlink_esw_multiport_get(int nlsk_fd, int family_id, const char *pci_ad
>                 *enable ? "en" : "dis", pci_addr);
>         return ret;
>  }
> +
> +struct mlx5_mtu {
> +       uint32_t min_mtu;
> +       bool min_mtu_set;
> +       uint32_t max_mtu;
> +       bool max_mtu_set;
> +};
> +
> +static int
> +mlx5_nl_get_mtu_bounds_cb(struct nlmsghdr *nh, void *arg)
> +{
> +       size_t off = NLMSG_LENGTH(sizeof(struct ifinfomsg));
> +       struct mlx5_mtu *out = arg;
> +
> +       while (off < nh->nlmsg_len) {
> +               struct rtattr *ra = RTE_PTR_ADD(nh, off);
> +               uint32_t *payload;
> +
> +               switch (ra->rta_type) {
> +               case IFLA_MIN_MTU:
> +                       payload = RTA_DATA(ra);
> +                       out->min_mtu = *payload;
> +                       out->min_mtu_set = true;
> +                       break;
> +               case IFLA_MAX_MTU:
> +                       payload = RTA_DATA(ra);
> +                       out->max_mtu = *payload;
> +                       out->max_mtu_set = true;
> +                       break;
> +               default:
> +                       /* Nothing to do for other attributes. */
> +                       break;
> +               }
> +               off += RTA_ALIGN(ra->rta_len);
> +       }
> +
> +       return 0;
> +}
> +
> +/**
> + * Query minimum and maximum allowed MTU values for given Linux network interface.
> + *
> + * This function queries the following interface attributes exposed in netlink since Linux 4.18:
> + *
> + * - IFLA_MIN_MTU - minimum allowed MTU
> + * - IFLA_MAX_MTU - maximum allowed MTU
> + *
> + * @param[in] nl
> + *   Netlink socket of the ROUTE kind (NETLINK_ROUTE).
> + * @param[in] ifindex
> + *   Linux network device index.
> + * @param[out] min_mtu
> + *   Pointer to minimum allowed MTU. Populated only if both minimum and maximum MTU was queried.
> + * @param[out] max_mtu
> + *   Pointer to maximum allowed MTU. Populated only if both minimum and maximum MTU was queried.
> + *
> + * @return
> + *   0 on success, negative on error and rte_errno is set.
> + *
> + *   Known errors:
> + *
> + *   - (-EINVAL) - either @p min_mtu or @p max_mtu is NULL.
> + *   - (-ENOENT) - either minimum or maximum allowed MTU was not found in interface attributes.
> + */
> +int
> +mlx5_nl_get_mtu_bounds(int nl, unsigned int ifindex, uint16_t *min_mtu,
> +                       uint16_t *max_mtu)
> +{
> +       struct mlx5_mtu out = { 0 };
> +       struct {
> +               struct nlmsghdr nh;
> +               struct ifinfomsg info;
> +       } req = {
> +               .nh = {
> +                       .nlmsg_len = NLMSG_LENGTH(sizeof(req.info)),
> +                       .nlmsg_type = RTM_GETLINK,
> +                       .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
> +               },
> +               .info = {
> +                       .ifi_family = AF_UNSPEC,
> +                       .ifi_index = ifindex,
> +               },
> +       };
> +       uint32_t sn = MLX5_NL_SN_GENERATE;
> +       int ret;
> +
> +       if (min_mtu == NULL || max_mtu == NULL) {
> +               rte_errno = EINVAL;
> +               return -rte_errno;
> +       }
> +
> +       ret = mlx5_nl_send(nl, &req.nh, sn);
> +       if (ret < 0)
> +               return ret;
> +
> +       ret = mlx5_nl_recv(nl, sn, mlx5_nl_get_mtu_bounds_cb, &out);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (!out.min_mtu_set || !out.max_mtu_set) {
> +               rte_errno = ENOENT;
> +               return -rte_errno;
> +       }
> +
> +       *min_mtu = out.min_mtu;
> +       *max_mtu = out.max_mtu;
> +
> +       return ret;
> +}
> diff --git a/drivers/common/mlx5/linux/mlx5_nl.h b/drivers/common/mlx5/linux/mlx5_nl.h
> index 580de3b769..34306258ec 100644
> --- a/drivers/common/mlx5/linux/mlx5_nl.h
> +++ b/drivers/common/mlx5/linux/mlx5_nl.h
> @@ -87,4 +87,7 @@ __rte_internal
>  int mlx5_nl_devlink_esw_multiport_get(int nlsk_fd, int family_id,
>                                       const char *pci_addr, int *enable);
> 
> +__rte_internal
> +int mlx5_nl_get_mtu_bounds(int nl, unsigned int ifindex, uint16_t *min_mtu,
> +                           uint16_t *max_mtu);
> +
>  #endif /* RTE_PMD_MLX5_NL_H_ */
> diff --git a/drivers/common/mlx5/version.map b/drivers/common/mlx5/version.map
> index 6311b27c8a..ccd04d65e0 100644
> --- a/drivers/common/mlx5/version.map
> +++ b/drivers/common/mlx5/version.map
> @@ -146,6 +146,7 @@ INTERNAL {
>         mlx5_nl_vf_mac_addr_modify; # WINDOWS_NO_EXPORT
>         mlx5_nl_vlan_vmwa_create; # WINDOWS_NO_EXPORT
>         mlx5_nl_vlan_vmwa_delete; # WINDOWS_NO_EXPORT
> +       mlx5_nl_get_mtu_bounds; # WINDOWS_NO_EXPORT
> 
>         mlx5_os_get_physical_device_ctx;
>         mlx5_os_umem_dereg;
> diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> index f78d602098..a5dda1e5f1 100644
> --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
> @@ -157,6 +157,36 @@ mlx5_ifreq(const struct rte_eth_dev *dev, int req, struct ifreq *ifr)
>         return mlx5_ifreq_by_ifname(ifname, req, ifr);
>  }
> 
> +/**
> + * Get device minimum and maximum allowed MTU values.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param[out] min_mtu
> + *   Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + *   Maximum MTU value output buffer.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> +                       uint16_t *max_mtu)
> +{
> +       struct mlx5_priv *priv = dev->data->dev_private;
> +       int nl_route;
> +       int ret;
> +
> +       nl_route = mlx5_nl_init(NETLINK_ROUTE, 0);
> +       if  (nl_route < 0)
> +               return nl_route;
> +
> +       ret = mlx5_nl_get_mtu_bounds(nl_route, priv->if_index, min_mtu,
> +                                    max_mtu);
> +
> +       close(nl_route);
> +       return ret;
> +}
> +
>  /**
>   * Get device MTU.
>   *
> diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
> index 4bd5c8da7d..b23ee265b6 100644
> --- a/drivers/net/mlx5/linux/mlx5_os.c
> +++ b/drivers/net/mlx5/linux/mlx5_os.c
> @@ -1570,6 +1570,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>         eth_dev->data->mac_addrs = priv->mac;
>         eth_dev->device = dpdk_dev;
>         eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> +       /* Fetch minimum and maximum allowed MTU from the device. */
> +       mlx5_get_mtu_bounds(eth_dev, &priv->min_mtu, &priv->max_mtu);
>         /* Configure the first MAC address by default. */
>         if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
>                 DRV_LOG(ERR,
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 296276c7ca..7966b65bae 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -74,6 +74,15 @@
>  /* Maximal number of field/field parts to map into sample registers .*/
>  #define MLX5_FLEX_ITEM_MAPPING_NUM             32
> 
> +/* Number of bytes not included in MTU. */
> +#define MLX5_ETH_OVERHEAD (RTE_ETHER_HDR_LEN + RTE_VLAN_HLEN + RTE_ETHER_CRC_LEN)
> +
> +/* Minimum allowed MTU to be reported whenever PMD cannot query it from OS. */
> +#define MLX5_ETH_MIN_MTU (RTE_ETHER_MIN_MTU)
> +
> +/* Maximum allowed MTU to be reported whenever PMD cannot query it from OS. */
> +#define MLX5_ETH_MAX_MTU (9978)
> +
>  enum mlx5_ipool_index {
>  #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
>         MLX5_IPOOL_DECAP_ENCAP = 0, /* Pool for encap/decap resource. */
> @@ -1957,6 +1966,8 @@ struct mlx5_priv {
>         unsigned int vlan_filter_n; /* Number of configured VLAN filters. */
>         /* Device properties. */
>         uint16_t mtu; /* Configured MTU. */
> +       uint16_t min_mtu; /* Minimum MTU allowed on the NIC. */
> +       uint16_t max_mtu; /* Maximum MTU allowed on the NIC. */
>         unsigned int isolated:1; /* Whether isolated mode is enabled. */
>         unsigned int representor:1; /* Device is a port representor. */
>         unsigned int master:1; /* Device is a E-Switch master. */
> @@ -2286,6 +2297,7 @@ struct mlx5_priv *mlx5_dev_to_eswitch_info(struct rte_eth_dev *dev);
>  int mlx5_dev_configure_rss_reta(struct rte_eth_dev *dev);
>  uint64_t mlx5_get_restore_flags(struct rte_eth_dev *dev,
>                                 enum rte_eth_dev_operation op);
> +void mlx5_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> +                         uint16_t *max_mtu);
> 
>  /* mlx5_ethdev_os.c */
> 
> @@ -2323,6 +2335,7 @@ int mlx5_os_get_stats_n(struct rte_eth_dev *dev, bool bond_master,
>                         uint16_t *n_stats, uint16_t *n_stats_sec);
>  void mlx5_os_stats_init(struct rte_eth_dev *dev);
>  int mlx5_get_flag_dropless_rq(struct rte_eth_dev *dev);
> +int mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> +                           uint16_t *max_mtu);
> 
>  /* mlx5_mac.c */
> 
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index 68d1c1bfa7..7747b0c869 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -360,9 +360,11 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
>         unsigned int max;
>         uint16_t max_wqe;
> 
> +       info->min_mtu = priv->min_mtu;
> +       info->max_mtu = priv->max_mtu;
> +       info->max_rx_pktlen = info->max_mtu + MLX5_ETH_OVERHEAD;
>         /* FIXME: we should ask the device for these values. */
>         info->min_rx_bufsize = 32;
> -       info->max_rx_pktlen = 65536;
>         info->max_lro_pkt_size = MLX5_MAX_LRO_SIZE;
>         /*
>          * Since we need one CQ per QP, the limit is the minimum number
> @@ -863,3 +865,41 @@ mlx5_get_restore_flags(__rte_unused struct rte_eth_dev *dev,
>         /* mlx5 PMD does not require any configuration restore. */
>         return 0;
>  }
> +
> +/**
> + * Query minimum and maximum allowed MTU value on the device.
> + *
> + * This functions will always return valid MTU bounds.
> + * In case platform-specific implementation fails or current platform does not support it,
> + * the fallback default values will be used.
> + *
> + * @param[in] dev
> + *   Pointer to Ethernet device
> + * @param[out] min_mtu
> + *   Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + *   Maximum MTU value output buffer.
> + */
> +void
> +mlx5_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> +                    uint16_t *max_mtu)
> +{
> +       int ret;
> +
> +       MLX5_ASSERT(min_mtu != NULL);
> +       MLX5_ASSERT(max_mtu != NULL);
> +
> +       ret = mlx5_os_get_mtu_bounds(dev, min_mtu, max_mtu);
> +       if (ret < 0) {
> +               if (ret != -ENOTSUP)
> +                       DRV_LOG(INFO, "port %u failed to query MTU bounds, using fallback values",
> +                               dev->data->port_id);
> +               *min_mtu = MLX5_ETH_MIN_MTU;
> +               *max_mtu = MLX5_ETH_MAX_MTU;
> +
> +               /* This function does not fail. Clear rte_errno. */
> +               rte_errno = 0;
> +       }
> +
> +       DRV_LOG(INFO, "port %u minimum MTU is %u", dev->data->port_id, *min_mtu);
> +       DRV_LOG(INFO, "port %u maximum MTU is %u", dev->data->port_id, *max_mtu);
> +}
> diff --git a/drivers/net/mlx5/windows/mlx5_ethdev_os.c b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> index ec08bfef6d..e24ff367af 100644
> --- a/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_ethdev_os.c
> @@ -71,6 +71,34 @@ mlx5_get_ifname(const struct rte_eth_dev *dev, char ifname[MLX5_NAMESIZE])
>         return 0;
>  }
> 
> +/**
> + * Get device minimum and maximum allowed MTU.
> + *
> + * Windows API does not expose minimum and maximum allowed MTU.
> + * In this case, this just returns (-ENOTSUP) to allow platform-independent code
> + * to fallback to default values.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param[out] min_mtu
> + *   Minimum MTU value output buffer.
> + * @param[out] max_mtu
> + *   Maximum MTU value output buffer.
> + *
> + * @return
> + *   (-ENOTSUP) - not supported on Windows
> + */
> +int
> +mlx5_os_get_mtu_bounds(struct rte_eth_dev *dev, uint16_t *min_mtu,
> +                       uint16_t *max_mtu)
> +{
> +       RTE_SET_USED(dev);
> +       RTE_SET_USED(min_mtu);
> +       RTE_SET_USED(max_mtu);
> +
> +       rte_errno = ENOTSUP;
> +       return -rte_errno;
> +}
> +
>  /**
>   * Get device MTU.
>   *
> diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
> index 7bd001950e..5c7e76383f 100644
> --- a/drivers/net/mlx5/windows/mlx5_os.c
> +++ b/drivers/net/mlx5/windows/mlx5_os.c
> @@ -478,6 +478,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>         eth_dev->data->mac_addrs = priv->mac;
>         eth_dev->device = dpdk_dev;
>         eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> +       /* Fetch minimum and maximum allowed MTU from the device. */
> +       mlx5_get_mtu_bounds(eth_dev, &priv->min_mtu, &priv->max_mtu);
>         /* Configure the first MAC address by default. */
>         if (mlx5_get_mac(eth_dev, &mac.addr_bytes)) {
>                 DRV_LOG(ERR,
> --
> 2.39.5


This patch has also been applied to 23.11.

Thanks, 
Shani

Thread overview: 3+ messages
2025-11-04 17:27 Dariusz Sosnowski
2025-11-05 11:07 ` Kevin Traynor
2025-12-30 10:09 ` Shani Peretz [this message]
