DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: dev@dpdk.org
Cc: Bruce Richardson <bruce.richardson@intel.com>
Subject: [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers
Date: Wed, 23 Oct 2024 17:55:40 +0100	[thread overview]
Message-ID: <20241023165540.893269-6-bruce.richardson@intel.com> (raw)
In-Reply-To: <20241023165540.893269-1-bruce.richardson@intel.com>

To help with backward compatibility for applications which may expect the
ice driver Tx scheduler (accessed via the rte_tm APIs) to have only 3
layers, add a devarg allowing the user to explicitly limit the number of
scheduler layers visible to the application.
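For example (the PCI address below is a placeholder, matching the style of the
examples in the ice documentation), the previous 3-level behaviour can be
requested at startup with:

```shell
# Limit the ice Tx scheduler to 3 levels visible via rte_tm,
# for compatibility with applications written against older DPDK releases.
# Replace 80:00.0 with the PCI address of your own device.
dpdk-testpmd -a 80:00.0,tm_sched_levels=3 -- -i
```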

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 58 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 28 +++++++++--------
 4 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index df489be08d..471343a0ac 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
     -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all available hardware scheduler layers are exposed to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the number of scheduler levels to the given value.
+  The value provided must be between 3 and 8.
+  If the value provided is greater than the number of levels supported by the hardware,
+  the hardware maximum is used instead.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This reduces, in software, the number of levels available for configuration,
+  but does not affect the fan-out at each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7aed26118f..0f1f34e739 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG         "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,28 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* Set the number of hidden Tx scheduler layers. If no devargs parameter is
+	 * given to set the number of exposed levels, the default is to expose all
+	 * levels except the TC layer.
+	 *
+	 * If the number of exposed levels is set, check that it is not greater
+	 * than the HW can provide (in which case do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else {
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+		}
+	}
+
 	pf->main_vsi = vsi;
 	rte_spinlock_init(&pf->link_lock);
 
@@ -2066,6 +2091,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* The value needs two-stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in
+	 * the correct range, between 3 and 8. Later validation will check that the
+	 * number of layers available on a particular port is not less than this value.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_SCHED_9_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+			    key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_SCHED_9_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2363,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7239,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "=<N>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 71fd7bca64..431561e48f 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -484,6 +484,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers;    /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -557,6 +558,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -660,7 +662,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 09e947a3b1..9e943da7a1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -200,9 +200,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -210,7 +211,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -230,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -434,7 +434,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-			params, level_id == ice_get_leaf_level(hw), error);
+			params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -762,8 +762,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		 struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -795,14 +795,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -815,15 +815,19 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
 		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0



Thread overview: 61+ messages
2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-07  9:33 ` [PATCH 01/15] net/ice: add traffic management node query function Bruce Richardson
2024-08-07  9:33 ` [PATCH 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-07  9:33 ` [PATCH 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-07  9:33 ` [PATCH 04/15] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-07  9:33 ` [PATCH 05/15] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-07  9:33 ` [PATCH 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-07  9:33 ` [PATCH 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-07  9:34 ` [PATCH 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-07  9:34 ` [PATCH 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-07  9:34 ` [PATCH 10/15] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-07  9:34 ` [PATCH 11/15] net/ice/base: make functions non-static Bruce Richardson
2024-08-07  9:34 ` [PATCH 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-07  9:34 ` [PATCH 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-07  9:34 ` [PATCH 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-07  9:34 ` [PATCH 15/15] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 01/15] net/ice: add traffic management node query function Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 04/15] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 05/15] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 10/15] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 11/15] net/ice/base: make functions non-static Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 15/15] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 01/16] net/ice: add traffic management node query function Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 02/16] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 03/16] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 04/16] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 05/16] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 06/16] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 07/16] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 08/16] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 09/16] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 10/16] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 11/16] net/ice/base: make functions non-static Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 12/16] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 13/16] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 14/16] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 15/16] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 16/16] net/ice: do early check on node level when adding Bruce Richardson
2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 1/5] net/ice: add option to download scheduler topology Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
2024-10-23 16:55   ` Bruce Richardson [this message]
