* [PATCH 00/15] Improve rte_tm support in ICE driver
@ 2024-08-07  9:33 Bruce Richardson
  2024-08-07  9:33 ` [PATCH 01/15] net/ice: add traffic management node query function Bruce Richardson
                   ` (19 more replies)
  0 siblings, 20 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
This patchset expands the traffic management (rte_tm) support in the
ICE driver. It allows the driver to support scheduler topologies of
different sizes, more than 256 queues, and more than 3 hierarchy
layers.
Bruce Richardson (15):
  net/ice: add traffic management node query function
  net/ice: detect stopping a flow-director queue twice
  net/ice: improve Tx scheduler graph output
  net/ice: add option to choose DDP package file
  net/ice: add option to download scheduler topology
  net/ice/base: allow init without TC class sched nodes
  net/ice/base: set VSI index on newly created nodes
  net/ice/base: read VSI layer info from VSI
  net/ice/base: remove 255 limit on sched child nodes
  net/ice/base: optimize subtree searches
  net/ice/base: make functions non-static
  net/ice/base: remove flag checks before topology upload
  net/ice: limit the number of queues to sched capabilities
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: add minimal capability reporting API
 doc/guides/nics/ice.rst          |   9 +
 drivers/net/ice/base/ice_ddp.c   |  51 +--
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |  56 ++--
 drivers/net/ice/base/ice_sched.h |   8 +
 drivers/net/ice/base/ice_type.h  |   3 +-
 drivers/net/ice/ice_diagnose.c   | 196 ++++-------
 drivers/net/ice/ice_ethdev.c     |  92 +++--
 drivers/net/ice/ice_ethdev.h     |  18 +-
 drivers/net/ice/ice_rxtx.c       |  15 +
 drivers/net/ice/ice_tm.c         | 558 +++++++++++++++----------------
 11 files changed, 487 insertions(+), 523 deletions(-)
--
2.43.0
* [PATCH 01/15] net/ice: add traffic management node query function
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Implement the new node querying function for the "ice" net driver.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_tm.c | 48 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
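For illustration, a minimal standalone sketch of the query pattern used
here, where every output pointer is optional and only non-NULL outputs
are filled in. The demo_* names and DEMO_NODE_ID_NULL are hypothetical
stand-ins, not the driver's types. The sketch also assumes that the
root node has a NULL parent and reports a null ID for it; whether the
driver needs an equivalent guard depends on how the root node is set up.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NODE_ID_NULL UINT32_MAX /* hypothetical "no node" marker */

/* Hypothetical, simplified stand-in for the driver's TM node. */
struct demo_tm_node {
	uint32_t id;
	uint32_t priority;
	uint32_t weight;
	uint32_t level;
	struct demo_tm_node *parent; /* NULL for the root (assumption) */
};

/* Fill in only the outputs the caller asked for (non-NULL pointers). */
int
demo_node_query(const struct demo_tm_node *node, uint32_t *parent_node_id,
		uint32_t *priority, uint32_t *weight, uint32_t *level_id)
{
	if (node == NULL)
		return -EINVAL;

	if (parent_node_id != NULL)
		*parent_node_id = node->parent != NULL ?
				node->parent->id : DEMO_NODE_ID_NULL;
	if (priority != NULL)
		*priority = node->priority;
	if (weight != NULL)
		*weight = node->weight;
	if (level_id != NULL)
		*level_id = node->level;
	return 0;
}
```

A caller interested only in the weight would pass NULL for the other
outputs.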
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 8a29a9e744..459446a6b0 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -17,6 +17,11 @@ static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t weight, uint32_t level_id,
 	      const struct rte_tm_node_params *params,
 	      struct rte_tm_error *error);
+static int ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error);
 static int ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 			    struct rte_tm_error *error);
 static int ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
@@ -35,6 +40,7 @@ const struct rte_tm_ops ice_tm_ops = {
 	.node_add = ice_tm_node_add,
 	.node_delete = ice_tm_node_delete,
 	.node_type_get = ice_node_type_get,
+	.node_query = ice_node_query,
 	.hierarchy_commit = ice_hierarchy_commit,
 };
 
@@ -219,6 +225,48 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
+static int
+ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error)
+{
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_tm_node *tm_node;
+
+	if (node_id == RTE_TM_NODE_ID_NULL) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "invalid node id";
+		return -EINVAL;
+	}
+
+	/* check if the node id exists */
+	tm_node = find_node(pf->tm_conf.root, node_id);
+	if (!tm_node) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "no such node";
+		return -EEXIST;
+	}
+
+	if (parent_node_id != NULL)
+		*parent_node_id = tm_node->parent->id;
+
+	if (priority != NULL)
+		*priority = tm_node->priority;
+
+	if (weight != NULL)
+		*weight = tm_node->weight;
+
+	if (level_id != NULL)
+		*level_id = tm_node->level;
+
+	if (params != NULL)
+		*params = tm_node->params;
+
+	return 0;
+}
+
 static inline struct ice_tm_shaper_profile *
 ice_shaper_profile_search(struct rte_eth_dev *dev,
 			   uint32_t shaper_profile_id)
-- 
2.43.0
* [PATCH 02/15] net/ice: detect stopping a flow-director queue twice
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If the flow-director queue is stopped at some point during the running
of an application, the shutdown procedure for the port issues an error
as it tries to stop the queue a second time, and fails to do so. We can
eliminate this error by setting the tail-register pointer to NULL on
stop, and checking for that condition in subsequent stop calls. Since
the register pointer is set on start, any restarting of the queue will
allow a stop call to progress as normal.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_rxtx.c | 5 +++++
 1 file changed, 5 insertions(+)
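For illustration, a minimal standalone sketch of the idea described
above: the tail-register pointer doubles as a "started" flag, so a
repeated stop becomes a no-op and a restart re-arms it. The demo_txq
type and the hw_stops counter are hypothetical and stand in for the
real queue structure and the hardware disable step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the driver's Tx queue. */
struct demo_txq {
	volatile uint32_t *tail; /* NULL means "not started" */
	unsigned int hw_stops;   /* counts real stop operations */
};

/* Start sets the tail pointer, so a later stop proceeds normally. */
void
demo_txq_start(struct demo_txq *q, volatile uint32_t *tail_reg)
{
	q->tail = tail_reg;
}

/* Stop is idempotent: a second call sees tail == NULL and returns early. */
int
demo_txq_stop(struct demo_txq *q)
{
	if (q->tail == NULL)
		return 0; /* already stopped, nothing to do */

	q->hw_stops++;    /* placeholder for the real hardware disable */
	q->tail = NULL;
	return 0;
}
```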
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index f270498ed1..a150d28e73 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1139,6 +1139,10 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 			    tx_queue_id);
 		return -EINVAL;
 	}
+	if (txq->qtx_tail == NULL) {
+		PMD_DRV_LOG(INFO, "TX queue %u not started\n", tx_queue_id);
+		return 0;
+	}
 	vsi = txq->vsi;
 
 	q_ids[0] = txq->reg_idx;
@@ -1153,6 +1157,7 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq->tx_rel_mbufs(txq);
+	txq->qtx_tail = NULL;
 
 	return 0;
 }
-- 
2.43.0
* [PATCH 03/15] net/ice: improve Tx scheduler graph output
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The function to dump the Tx scheduler topology only adds a node to the
chart if it is connected to a Tx queue or belongs to the flow director
VSI. Change the function to work recursively from the root node and
thereby include all scheduler nodes, whether in use or not, in the dump.
Also, improve the output of the Tx scheduler graphing function:
* Add VSI details to each node in the graph
* When the number of children is >16, skip the middle nodes to reduce
  the size of the graph; otherwise the dot output is unviewable for
  large hierarchies
* For VSIs other than zero, use dot's clustering method to put those
  VSIs into subgraphs with borders
* For leaf nodes, display the queue number for any node assigned to an
  ethdev NIC Tx queue
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_diagnose.c | 196 ++++++++++++---------------------
 1 file changed, 69 insertions(+), 127 deletions(-)
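For illustration, a small standalone sketch of the child-skipping rule
used by the recursive dump. With more than 16 children, the loop in
the patch visits the first three children, jumps over num_children - 5
of them, and then visits the last two. The function name and the
output array are hypothetical; only the index arithmetic mirrors the
patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SHOWN_ALL 16 /* same threshold as the patch */

/*
 * Record which child indexes the dump loop visits for a node with
 * num_children children. Returns the number of indexes written to out.
 */
size_t
demo_visited_children(uint16_t num_children, uint16_t *out, size_t out_len)
{
	size_t n = 0;

	for (uint16_t i = 0; i < num_children; i++) {
		if (n < out_len)
			out[n++] = i;
		/* after the third child, jump over the middle of the list */
		if (num_children > DEMO_MAX_SHOWN_ALL && i == 2)
			i += num_children - 5;
	}
	return n;
}
```

For a node with 20 children this visits indexes 0, 1, 2, 18 and 19,
which matches the "... +15 child nodes ..." placeholder that the dump
would print for the skipped range.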
diff --git a/drivers/net/ice/ice_diagnose.c b/drivers/net/ice/ice_diagnose.c
index c357554707..623d84e37d 100644
--- a/drivers/net/ice/ice_diagnose.c
+++ b/drivers/net/ice/ice_diagnose.c
@@ -545,29 +545,15 @@ static void print_rl_profile(struct ice_aqc_rl_profile_elem *prof,
 	fprintf(stream, "\t\t\t\t\t</td>\n");
 }
 
-static
-void print_elem_type(FILE *stream, u8 type)
+static const char *
+get_elem_type(u8 type)
 {
-	switch (type) {
-	case 1:
-		fprintf(stream, "root");
-		break;
-	case 2:
-		fprintf(stream, "tc");
-		break;
-	case 3:
-		fprintf(stream, "se_generic");
-		break;
-	case 4:
-		fprintf(stream, "entry_point");
-		break;
-	case 5:
-		fprintf(stream, "leaf");
-		break;
-	default:
-		fprintf(stream, "%d", type);
-		break;
-	}
+	static const char * const ice_sched_node_types[] = {
+			"Undefined", "Root", "TC", "SE Generic", "SW Entry", "Leaf"
+	};
+	if (type < RTE_DIM(ice_sched_node_types))
+		return ice_sched_node_types[type];
+	return "*UNKNOWN*";
 }
 
 static
@@ -602,7 +588,9 @@ void print_priority_mode(FILE *stream, bool flag)
 }
 
 static
-void print_node(struct ice_aqc_txsched_elem_data *data,
+void print_node(struct ice_sched_node *node,
+		struct rte_eth_dev_data *ethdata,
+		struct ice_aqc_txsched_elem_data *data,
 		struct ice_aqc_rl_profile_elem *cir_prof,
 		struct ice_aqc_rl_profile_elem *eir_prof,
 		struct ice_aqc_rl_profile_elem *shared_prof,
@@ -613,17 +601,19 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 
 	fprintf(stream, "\t\t\t<table>\n");
 
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> teid </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td> %d </td>\n", data->node_teid);
-	fprintf(stream, "\t\t\t\t</tr>\n");
-
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> type </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td>");
-	print_elem_type(stream, data->data.elem_type);
-	fprintf(stream, "</td>\n");
-	fprintf(stream, "\t\t\t\t</tr>\n");
+	fprintf(stream, "\t\t\t\t<tr><td>teid</td><td>%d</td></tr>\n", data->node_teid);
+	fprintf(stream, "\t\t\t\t<tr><td>type</td><td>%s</td></tr>\n",
+			get_elem_type(data->data.elem_type));
+	fprintf(stream, "\t\t\t\t<tr><td>VSI</td><td>%u</td></tr>\n", node->vsi_handle);
+	if (data->data.elem_type == ICE_AQC_ELEM_TYPE_LEAF) {
+		for (uint16_t i = 0; i < ethdata->nb_tx_queues; i++) {
+			struct ice_tx_queue *q = ethdata->tx_queues[i];
+			if (q->q_teid == data->node_teid) {
+				fprintf(stream, "\t\t\t\t<tr><td>TXQ</td><td>%u</td></tr>\n", i);
+				break;
+			}
+		}
+	}
 
 	if (!detail)
 		goto brief;
@@ -705,8 +695,6 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 	fprintf(stream, "\t\tshape=plain\n");
 	fprintf(stream, "\t]\n");
 
-	if (data->parent_teid != 0xFFFFFFFF)
-		fprintf(stream, "\tNODE_%d -> NODE_%d\n", data->parent_teid, data->node_teid);
 }
 
 static
@@ -731,112 +719,92 @@ int query_rl_profile(struct ice_hw *hw,
 	return 0;
 }
 
-static
-int query_node(struct ice_hw *hw, uint32_t child, uint32_t *parent,
-	       uint8_t level, bool detail, FILE *stream)
+static int
+query_node(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	struct ice_aqc_txsched_elem_data data;
+	struct ice_aqc_txsched_elem_data *data = &node->info;
 	struct ice_aqc_rl_profile_elem cir_prof;
 	struct ice_aqc_rl_profile_elem eir_prof;
 	struct ice_aqc_rl_profile_elem shared_prof;
 	struct ice_aqc_rl_profile_elem *cp = NULL;
 	struct ice_aqc_rl_profile_elem *ep = NULL;
 	struct ice_aqc_rl_profile_elem *sp = NULL;
-	int status, ret;
-
-	status = ice_sched_query_elem(hw, child, &data);
-	if (status != ICE_SUCCESS) {
-		if (level == hw->num_tx_sched_layers) {
-			/* ignore the error when a queue has been stopped. */
-			PMD_DRV_LOG(WARNING, "Failed to query queue node %d.", child);
-			*parent = 0xffffffff;
-			return 0;
-		}
-		PMD_DRV_LOG(ERR, "Failed to query scheduling node %d.", child);
-		return -EINVAL;
-	}
-
-	*parent = data.parent_teid;
+	u8 level = node->tx_sched_layer;
+	int ret;
 
-	if (data.data.cir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 0, data.data.cir_bw.bw_profile_idx, &cir_prof);
+	if (data->data.cir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 0, data->data.cir_bw.bw_profile_idx, &cir_prof);
 
 		if (ret)
 			return ret;
 		cp = &cir_prof;
 	}
 
-	if (data.data.eir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 1, data.data.eir_bw.bw_profile_idx, &eir_prof);
+	if (data->data.eir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 1, data->data.eir_bw.bw_profile_idx, &eir_prof);
 
 		if (ret)
 			return ret;
 		ep = &eir_prof;
 	}
 
-	if (data.data.srl_id != 0) {
-		ret = query_rl_profile(hw, level, 2, data.data.srl_id, &shared_prof);
+	if (data->data.srl_id != 0) {
+		ret = query_rl_profile(hw, level, 2, data->data.srl_id, &shared_prof);
 
 		if (ret)
 			return ret;
 		sp = &shared_prof;
 	}
 
-	print_node(&data, cp, ep, sp, detail, stream);
+	print_node(node, ethdata, data, cp, ep, sp, detail, stream);
 
 	return 0;
 }
 
-static
-int query_nodes(struct ice_hw *hw,
-		uint32_t *children, int child_num,
-		uint32_t *parents, int *parent_num,
-		uint8_t level, bool detail,
-		FILE *stream)
+static int
+query_node_recursive(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	uint32_t parent;
-	int i;
-	int j;
-
-	*parent_num = 0;
-	for (i = 0; i < child_num; i++) {
-		bool exist = false;
-		int ret;
+	bool close = false;
+	if (node->parent != NULL && node->vsi_handle != node->parent->vsi_handle) {
+		fprintf(stream, "subgraph cluster_%u {\n", node->vsi_handle);
+		fprintf(stream, "\tlabel = \"VSI %u\";\n", node->vsi_handle);
+		close = true;
+	}
 
-		ret = query_node(hw, children[i], &parent, level, detail, stream);
-		if (ret)
-			return -EINVAL;
+	int ret = query_node(hw, ethdata, node, detail, stream);
+	if (ret != 0)
+		return ret;
 
-		for (j = 0; j < *parent_num; j++) {
-			if (parents[j] == parent) {
-				exist = true;
-				break;
-			}
+	for (uint16_t i = 0; i < node->num_children; i++) {
+		ret = query_node_recursive(hw, ethdata, node->children[i], detail, stream);
+		if (ret != 0)
+			return ret;
+		/* if we have a lot of nodes, skip a bunch in the middle */
+		if (node->num_children > 16 && i == 2) {
+			uint16_t inc = node->num_children - 5;
+			fprintf(stream, "\tn%d_children [label=\"... +%d child nodes ...\"];\n",
+					node->info.node_teid, inc);
+			fprintf(stream, "\tNODE_%d -> n%d_children;\n",
+					node->info.node_teid, node->info.node_teid);
+			i += inc;
 		}
-
-		if (!exist && parent != 0xFFFFFFFF)
-			parents[(*parent_num)++] = parent;
 	}
+	if (close)
+		fprintf(stream, "}\n");
+	if (node->info.parent_teid != 0xFFFFFFFF)
+		fprintf(stream, "\tNODE_%d -> NODE_%d\n",
+				node->info.parent_teid, node->info.node_teid);
 
 	return 0;
 }
 
-int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
+int
+rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 {
 	struct rte_eth_dev *dev;
 	struct ice_hw *hw;
-	struct ice_pf *pf;
-	struct ice_q_ctx *q_ctx;
-	uint16_t q_num;
-	uint16_t i;
-	struct ice_tx_queue *txq;
-	uint32_t buf1[256];
-	uint32_t buf2[256];
-	uint32_t *children = buf1;
-	uint32_t *parents = buf2;
-	int child_num = 0;
-	int parent_num = 0;
-	uint8_t level;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
 
@@ -846,35 +814,9 @@ int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 
 	dev = &rte_eth_devices[port];
 	hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	level = hw->num_tx_sched_layers;
-
-	q_num = dev->data->nb_tx_queues;
-
-	/* main vsi */
-	for (i = 0; i < q_num; i++) {
-		txq = dev->data->tx_queues[i];
-		q_ctx = ice_get_lan_q_ctx(hw, txq->vsi->idx, 0, i);
-		children[child_num++] = q_ctx->q_teid;
-	}
-
-	/* fdir vsi */
-	q_ctx = ice_get_lan_q_ctx(hw, pf->fdir.fdir_vsi->idx, 0, 0);
-	children[child_num++] = q_ctx->q_teid;
 
 	fprintf(stream, "digraph tx_sched {\n");
-	while (child_num > 0) {
-		int ret;
-		ret = query_nodes(hw, children, child_num,
-				  parents, &parent_num,
-				  level, detail, stream);
-		if (ret)
-			return ret;
-
-		children = parents;
-		child_num = parent_num;
-		level--;
-	}
+	query_node_recursive(hw, dev->data, hw->port_info->root, detail, stream);
 	fprintf(stream, "}\n");
 
 	return 0;
-- 
2.43.0
* [PATCH 04/15] net/ice: add option to choose DDP package file
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The "Dynamic Device Personalization" package is loaded at initialization
time by the driver, but the specific package file loaded depends upon
what package file is found first by searching through a hard-coded list
of firmware paths. To enable greater control over the package loading,
we can add a device option to choose a specific DDP package file to
load.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst      |  9 +++++++++
 drivers/net/ice/ice_ethdev.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  1 +
 3 files changed, 44 insertions(+)
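For illustration, a minimal standalone sketch of a devargs handler of
this kind: reject values that would not fit the package filename
buffer, otherwise keep a heap copy for later use. The demo_* names and
the 256-byte limit are assumptions made for the example, not values
taken from the driver. Unlike the handler in the patch, the sketch
also checks the allocation result.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_PKG_FILENAME_SIZE 256 /* assumed limit, for illustration */

/*
 * Copy a user-supplied path into newly allocated storage.
 * Returns 0 on success, -1 if the value is too long or allocation fails.
 * The caller owns *filename and releases it with free().
 */
int
demo_handle_filename_arg(const char *value, char **filename)
{
	size_t len = strlen(value);
	char *copy;

	if (len >= DEMO_MAX_PKG_FILENAME_SIZE)
		return -1;

	copy = malloc(len + 1);
	if (copy == NULL)
		return -1;
	memcpy(copy, value, len + 1); /* includes the terminating NUL */
	*filename = copy;
	return 0;
}
```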
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index ae975d19ad..58ccfbd1a5 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -108,6 +108,15 @@ Runtime Configuration
 
     -a 80:00.0,default-mac-disable=1
 
+- ``DDP Package File``
+
+  Rather than have the driver search for the DDP package to load,
+  or to override what package is used,
+  the ``ddp_pkg_file`` option can be used to provide the path to a specific package file.
+  For example::
+
+    -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 304f959b7e..3e7ceda9ce 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -36,6 +36,7 @@
 #define ICE_ONE_PPS_OUT_ARG       "pps_out"
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
+#define ICE_DDP_FILENAME          "ddp_pkg_file"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -52,6 +53,7 @@ static const char * const ice_valid_args[] = {
 	ICE_RX_LOW_LATENCY_ARG,
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
+	ICE_DDP_FILENAME,
 	NULL
 };
 
@@ -692,6 +694,18 @@ handle_field_name_arg(__rte_unused const char *key, const char *value,
 	return 0;
 }
 
+static int
+handle_ddp_filename_arg(__rte_unused const char *key, const char *value, void *name_args)
+{
+	const char **filename = name_args;
+	if (strlen(value) >= ICE_MAX_PKG_FILENAME_SIZE) {
+		PMD_DRV_LOG(ERR, "The DDP package filename is too long : '%s'", value);
+		return -1;
+	}
+	*filename = strdup(value);
+	return 0;
+}
+
 static void
 ice_check_proto_xtr_support(struct ice_hw *hw)
 {
@@ -1882,6 +1896,16 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 	size_t bufsz;
 	int err;
 
+	if (adapter->devargs.ddp_filename != NULL) {
+		strlcpy(pkg_file, adapter->devargs.ddp_filename, sizeof(pkg_file));
+		if (rte_firmware_read(pkg_file, &buf, &bufsz) == 0) {
+			goto load_fw;
+		} else {
+			PMD_INIT_LOG(ERR, "Cannot load DDP file: %s\n", pkg_file);
+			return -1;
+		}
+	}
+
 	if (!use_dsn)
 		goto no_dsn;
 
@@ -2216,6 +2240,13 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 
 	ret = rte_kvargs_process(kvlist, ICE_RX_LOW_LATENCY_ARG,
 				 &parse_bool, &ad->devargs.rx_low_latency);
+	if (ret)
+		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_DDP_FILENAME,
+				 &handle_ddp_filename_arg, &ad->devargs.ddp_filename);
+	if (ret)
+		goto bail;
 
 bail:
 	rte_kvargs_free(kvlist);
@@ -2689,6 +2720,8 @@ ice_dev_close(struct rte_eth_dev *dev)
 	ice_free_hw_tbls(hw);
 	rte_free(hw->port_info);
 	hw->port_info = NULL;
+	free((void *)(uintptr_t)ad->devargs.ddp_filename);
+	ad->devargs.ddp_filename = NULL;
 	ice_shutdown_all_ctrlq(hw, true);
 	rte_free(pf->proto_xtr);
 	pf->proto_xtr = NULL;
@@ -6981,6 +7014,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 3ea9f37dc8..c211b5b9cc 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -568,6 +568,7 @@ struct ice_devargs {
 	/* Name of the field. */
 	char xtr_field_name[RTE_MBUF_DYN_NAMESIZE];
 	uint64_t mbuf_check;
+	const char *ddp_filename;
 };
 
 /**
-- 
2.43.0
* [PATCH 05/15] net/ice: add option to download scheduler topology
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The DDP package file loaded at init time may contain an alternative
Tx scheduler topology. Add a driver option to load this topology at
init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_ddp.c | 18 +++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
 drivers/net/ice/ice_ethdev.h   |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)
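For illustration, a standalone sketch of the stricter 0/1 parsing that
this patch introduces for boolean devargs. The previous strtoul-based
check never looked at the end pointer, so a value such as "1x" would
have been accepted as 1. The function name and signature are
hypothetical; the checks follow the patch.

```c
#include <stddef.h>

/*
 * Accept exactly the one-character strings "0" and "1".
 * Returns 0 and stores the value on success, -1 otherwise.
 */
int
demo_parse_bool(const char *value, int *out)
{
	if (value == NULL || value[0] == '\0')
		return -1; /* key given without a value */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1; /* anything other than "0" or "1" */

	*out = value[0] - '0';
	return 0;
}
```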
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index 24506dfaea..e6c42c5274 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1326,7 +1326,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1344,6 +1344,18 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT, "failed to apply sched topology  (err: %d)\n",
+					res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT, "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1416,7 +1428,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1426,7 +1438,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5761920207..2feba2e91d 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -451,9 +451,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 3e7ceda9ce..0d2445a317 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -37,6 +37,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME          "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED        "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -54,6 +55,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME,
+	ICE_DDP_LOAD_SCHED,
 	NULL
 };
 
@@ -1938,7 +1940,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d\n", err);
 		free(buf);
@@ -1971,19 +1973,18 @@ static int
 parse_bool(const char *key, const char *value, void *args)
 {
 	int *i = (int *)args;
-	char *end;
-	int num;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = value[0] - '0';
 	return 0;
 }
 
@@ -2248,6 +2249,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7014,6 +7019,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_LOAD_SCHED "=<0|1>"
 			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index c211b5b9cc..f31addb122 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -563,6 +563,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	int ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
* [PATCH 06/15] net/ice/base: allow init without TC class sched nodes
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If DCB support is disabled via the DDP image, there will not be any
traffic class (TC) nodes in the scheduler tree immediately above the
root level. To allow the driver to work in this scenario, allow use of
the root node as a dummy TC0 node in the case where there are no TC
nodes in the tree. For any TC other than 0 (the one used by default in
the driver), the existing behaviour of returning a NULL pointer is
maintained.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 6 ++++++
 drivers/net/ice/base/ice_type.h  | 1 +
 2 files changed, 7 insertions(+)
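For illustration, a standalone sketch of the lookup behaviour described
above: when no TC nodes exist, the root is returned for TC 0 and NULL
for every other TC. The demo_* structures are hypothetical, heavily
simplified stand-ins for ice_sched_node and ice_port_info.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_CHILDREN 8

/* Hypothetical, simplified scheduler node and port info. */
struct demo_sched_node {
	uint8_t tc_num;
	uint8_t num_children;
	struct demo_sched_node *children[DEMO_MAX_CHILDREN];
};

struct demo_port_info {
	struct demo_sched_node *root;
	uint8_t has_tc; /* set when a TC-type node was seen at init */
};

struct demo_sched_node *
demo_get_tc_node(struct demo_port_info *pi, uint8_t tc)
{
	if (pi == NULL || pi->root == NULL)
		return NULL;
	/* no TC layer: the root doubles as TC 0, other TCs do not exist */
	if (!pi->has_tc)
		return tc == 0 ? pi->root : NULL;

	for (uint8_t i = 0; i < pi->root->num_children; i++)
		if (pi->root->children[i]->tc_num == tc)
			return pi->root->children[i];
	return NULL;
}
```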
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 373c32a518..f75e5ae599 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -292,6 +292,10 @@ struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 
 	if (!pi || !pi->root)
 		return NULL;
+	/* if no TC nodes, use root as TC node 0 */
+	if (pi->has_tc == 0)
+		return tc == 0 ? pi->root : NULL;
+
 	for (i = 0; i < pi->root->num_children; i++)
 		if (pi->root->children[i]->tc_num == tc)
 			return pi->root->children[i];
@@ -1306,6 +1310,8 @@ int ice_sched_init_port(struct ice_port_info *pi)
 			    ICE_AQC_ELEM_TYPE_ENTRY_POINT)
 				hw->sw_entry_point_layer = j;
 
+			if (buf[0].generic[j].data.elem_type == ICE_AQC_ELEM_TYPE_TC)
+				pi->has_tc = 1;
 			status = ice_sched_add_node(pi, j, &buf[i].generic[j], NULL);
 			if (status)
 				goto err_init_port;
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index 598a80155b..a70e4a8afa 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1260,6 +1260,7 @@ struct ice_port_info {
 	struct ice_qos_cfg qos_cfg;
 	u8 is_vf:1;
 	u8 is_custom_tx_enabled:1;
+	u8 has_tc:1;
 };
 
 struct ice_switch_info {
-- 
2.43.0
* [PATCH 07/15] net/ice/base: set VSI index on newly created nodes
From: Bruce Richardson @ 2024-08-07  9:33 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The ice_sched_node type has a field for the VSI to which the node
belongs. This field was not being set in "ice_sched_add_node", so add
a line that sets this field for each node from its parent node.
Similarly, when searching for a qgroup node, check that each candidate
node's VSI handle matches the requested VSI.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
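Illustration only, not part of the patch: a minimal standalone sketch of
the idea, assuming a hypothetical "struct node" in place of the driver's
struct ice_sched_node, and hypothetical helpers node_attach() and
qgrp_usable(). It shows a child inheriting the VSI handle from its parent
at attach time, and a queue-group candidate being rejected when its
handle does not match.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_KIDS 4	/* hypothetical small fan-out, for illustration only */

/* Hypothetical stand-in for struct ice_sched_node. */
struct node {
	struct node *parent;
	struct node *children[MAX_KIDS];
	uint16_t num_children;
	uint16_t vsi_handle;
};

/* Attach child under parent; the child inherits the parent's VSI handle. */
static int
node_attach(struct node *parent, struct node *child)
{
	if (parent == NULL || child == NULL ||
	    parent->num_children >= MAX_KIDS)
		return -1;
	child->parent = parent;
	child->vsi_handle = parent->vsi_handle;
	parent->children[parent->num_children++] = child;
	return 0;
}

/* A queue-group candidate is usable only if it has room and the right VSI. */
static int
qgrp_usable(const struct node *qgrp, uint16_t vsi_handle,
	    uint16_t max_children)
{
	return qgrp->num_children < max_children &&
	       qgrp->vsi_handle == vsi_handle;
}
```

Because the handle is copied at attach time, every node below a VSI node
ends up with the same handle, which is what makes the extra comparison in
the qgroup search meaningful.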
 drivers/net/ice/base/ice_sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f75e5ae599..f6dc5ae173 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -200,6 +200,7 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
 	node->in_use = true;
 	node->parent = parent;
 	node->tx_sched_layer = layer;
+	node->vsi_handle = parent->vsi_handle;
 	parent->children[parent->num_children++] = node;
 	node->info = elem;
 	return 0;
@@ -1581,7 +1582,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		/* make sure the qgroup node is part of the VSI subtree */
 		if (ice_sched_find_node_in_subtree(pi->hw, vsi_node, qgrp_node))
 			if (qgrp_node->num_children < max_children &&
-			    qgrp_node->owner == owner)
+			    qgrp_node->owner == owner && qgrp_node->vsi_handle == vsi_handle)
 				break;
 		qgrp_node = qgrp_node->sibling;
 	}
-- 
2.43.0
* [PATCH 08/15] net/ice/base: read VSI layer info from VSI
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (6 preceding siblings ...)
  2024-08-07  9:33 ` [PATCH 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than computing the layer of the VSI from the number of HW layers,
we can instead read that info from the VSI node itself. This allows
the layer to be changed at runtime.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f6dc5ae173..e398984bf2 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1559,7 +1559,6 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	u16 max_children;
 
 	qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
-	vsi_layer = ice_sched_get_vsi_layer(pi->hw);
 	max_children = pi->hw->max_children[qgrp_layer];
 
 	vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1569,6 +1568,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	/* validate invalid VSI ID */
 	if (!vsi_node)
 		return NULL;
+	vsi_layer = vsi_node->tx_sched_layer;
 
 	/* If the queue group and vsi layer are same then queues
 	 * are all attached directly to VSI
-- 
2.43.0
* [PATCH 09/15] net/ice/base: remove 255 limit on sched child nodes
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (7 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 10/15] net/ice/base: optimize subtree searches Bruce Richardson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The Tx scheduler in the ice driver can be configured to have large
numbers of child nodes at a given layer, but the driver code implicitly
limited the number of nodes to 255 by using a u8 datatype for the number
of children. Increase this to a 16-bit value throughout the code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
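Illustration only, not part of the patch: a small sketch of why the
8-bit type is a hard limit. The helper names store_count_u8() and
store_count_u16() are hypothetical. Storing a child count above 255 in
an 8-bit field silently wraps modulo 256, and for the same reason an
8-bit loop index compared against a larger 16-bit count can never reach
it, which is why the loop counters are widened along with the field.

```c
#include <stdint.h>

/* Value that survives when a child count is stored in an 8-bit field. */
static uint8_t
store_count_u8(unsigned int n)
{
	return (uint8_t)n;	/* wraps modulo 256 */
}

/* Value that survives when a child count is stored in a 16-bit field. */
static uint16_t
store_count_u16(unsigned int n)
{
	return (uint16_t)n;	/* wraps modulo 65536 */
}
```

With these helpers, a count of 300 becomes 44 in the 8-bit case and a
count of 2048 becomes 0, while the 16-bit case preserves both values.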
 drivers/net/ice/base/ice_sched.c | 25 ++++++++++++++-----------
 drivers/net/ice/base/ice_type.h  |  2 +-
 2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index e398984bf2..be13833e1e 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -289,7 +289,7 @@ ice_sched_get_first_node(struct ice_port_info *pi,
  */
 struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 {
-	u8 i;
+	u16 i;
 
 	if (!pi || !pi->root)
 		return NULL;
@@ -316,7 +316,7 @@ void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node)
 {
 	struct ice_sched_node *parent;
 	struct ice_hw *hw = pi->hw;
-	u8 i, j;
+	u16 i, j;
 
 	/* Free the children before freeing up the parent node
 	 * The parent array is updated below and that shifts the nodes
@@ -1473,7 +1473,7 @@ bool
 ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 			       struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < base->num_children; i++) {
 		struct ice_sched_node *child = base->children[i];
@@ -1510,7 +1510,7 @@ ice_sched_get_free_qgrp(struct ice_port_info *pi,
 			struct ice_sched_node *qgrp_node, u8 owner)
 {
 	struct ice_sched_node *min_qgrp;
-	u8 min_children;
+	u16 min_children;
 
 	if (!qgrp_node)
 		return qgrp_node;
@@ -2070,7 +2070,7 @@ static void ice_sched_rm_agg_vsi_info(struct ice_port_info *pi, u16 vsi_handle)
  */
 static bool ice_sched_is_leaf_node_present(struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < node->num_children; i++)
 		if (ice_sched_is_leaf_node_present(node->children[i]))
@@ -2105,7 +2105,7 @@ ice_sched_rm_vsi_cfg(struct ice_port_info *pi, u16 vsi_handle, u8 owner)
 
 	ice_for_each_traffic_class(i) {
 		struct ice_sched_node *vsi_node, *tc_node;
-		u8 j = 0;
+		u16 j = 0;
 
 		tc_node = ice_sched_get_tc_node(pi, i);
 		if (!tc_node)
@@ -2173,7 +2173,7 @@ int ice_rm_vsi_lan_cfg(struct ice_port_info *pi, u16 vsi_handle)
  */
 bool ice_sched_is_tree_balanced(struct ice_hw *hw, struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	/* start from the leaf node */
 	for (i = 0; i < node->num_children; i++)
@@ -2247,7 +2247,8 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct ice_sched_node *node,
 			      u16 *num_nodes)
 {
 	u8 l = node->tx_sched_layer;
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(hw);
 
@@ -2289,7 +2290,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent,
 			struct ice_sched_node *node)
 {
 	struct ice_sched_node *old_parent;
-	u8 i, j;
+	u16 i, j;
 
 	old_parent = node->parent;
 
@@ -2389,7 +2390,8 @@ ice_sched_move_vsi_to_agg(struct ice_port_info *pi, u16 vsi_handle, u32 agg_id,
 	u16 num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
 	u32 first_node_teid, vsi_teid;
 	u16 num_nodes_added;
-	u8 aggl, vsil, i;
+	u8 aggl, vsil;
+	u16 i;
 	int status;
 
 	tc_node = ice_sched_get_tc_node(pi, tc);
@@ -2505,7 +2507,8 @@ ice_move_all_vsi_to_dflt_agg(struct ice_port_info *pi,
 static bool
 ice_sched_is_agg_inuse(struct ice_port_info *pi, struct ice_sched_node *node)
 {
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(pi->hw);
 	if (node->tx_sched_layer < vsil - 1) {
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index a70e4a8afa..35f832eb9f 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1030,9 +1030,9 @@ struct ice_sched_node {
 	struct ice_aqc_txsched_elem_data info;
 	u32 agg_id;			/* aggregator group ID */
 	u16 vsi_handle;
+	u16 num_children;
 	u8 in_use;			/* suspended or in use */
 	u8 tx_sched_layer;		/* Logical Layer (1-9) */
-	u8 num_children;
 	u8 tc_num;
 	u8 owner;
 #define ICE_SCHED_NODE_OWNER_LAN	0
-- 
2.43.0
* [PATCH 10/15] net/ice/base: optimize subtree searches
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (8 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 11/15] net/ice/base: make functions non-static Bruce Richardson
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
In a number of places throughout the driver code, we want to confirm
that a scheduler node is indeed a child of another node. Currently, this
is confirmed by searching down the tree from the base until the desired
node is hit, a search which may hit many irrelevant tree nodes when
recursing down wrong branches. By switching the direction of search, to
check upwards from the node to the parent, we can avoid any incorrect
paths, and so speed up processing.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
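Illustration only, not part of the patch: a standalone sketch of the
upward search, assuming a hypothetical "struct node" that carries only a
parent pointer and a hypothetical helper node_in_subtree(). Walking
parent links visits at most one node per layer, whereas the downward
search may visit every node in the subtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ice_sched_node: only the parent link. */
struct node {
	struct node *parent;
};

/* Return true if node is base itself or lies anywhere below base. */
static bool
node_in_subtree(const struct node *base, const struct node *node)
{
	assert(base != NULL && node != NULL);

	/* Follow parent links until base is found or the root is passed. */
	for (; node != NULL; node = node->parent)
		if (node == base)
			return true;
	return false;
}
```

The sketch terminates on a NULL parent only; the patch additionally
stops at layer 0.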
 drivers/net/ice/base/ice_sched.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index be13833e1e..f7d5f8f415 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1475,20 +1475,12 @@ ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 {
 	u16 i;
 
-	for (i = 0; i < base->num_children; i++) {
-		struct ice_sched_node *child = base->children[i];
-
-		if (node == child)
-			return true;
-
-		if (child->tx_sched_layer > node->tx_sched_layer)
-			return false;
-
-		/* this recursion is intentional, and wouldn't
-		 * go more than 8 calls
-		 */
-		if (ice_sched_find_node_in_subtree(hw, child, node))
+	if (base == node)
+		return true;
+	while (node->tx_sched_layer != 0 && node->parent != NULL) {
+		if (node->parent == base)
 			return true;
+		node = node->parent;
 	}
 	return false;
 }
-- 
2.43.0
* [PATCH 11/15] net/ice/base: make functions non-static
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (9 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 10/15] net/ice/base: optimize subtree searches Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
We will need to allocate more lanq contexts after a scheduler rework, so
make that function non-static so that it is accessible outside the file.
For similar reasons, make the function to add a Tx scheduler node non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f7d5f8f415..d88b836c38 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..c7eb794963 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,12 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
+		    struct ice_sched_node *parent, u8 layer, u16 num_nodes,
+		    u16 *num_nodes_added, u32 *first_node_teid,
+		    struct ice_sched_node **prealloc_nodes);
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
* [PATCH 12/15] net/ice/base: remove flag checks before topology upload
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (10 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 11/15] net/ice/base: make functions non-static Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
DPDK should support more than just 9-level or 5-level topologies, so
remove the checks for those particular settings.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_ddp.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index e6c42c5274..744f015fe5 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -2373,38 +2373,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 		return status;
 	}
 
-	/* Is default topology already applied ? */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 9) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded default topology\n");
-		/* Already default topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is new topology already applied ? */
-	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded new topology\n");
-		/* Already new topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is set topology issued already ? */
-	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
-		ice_debug(hw, ICE_DBG_INIT, "Update tx topology was done by another PF\n");
-		/* add a small delay before exiting */
-		for (i = 0; i < 20; i++)
-			ice_msec_delay(100, true);
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Change the topology from new to default (5 to 9) */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
-		goto update_topo;
-	}
-
 	pkg_hdr = (struct ice_pkg_hdr *)buf;
 	state = ice_verify_pkg(pkg_hdr, len);
 	if (state) {
@@ -2451,7 +2419,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 	/* Get the new topology buffer */
 	new_topo = ((u8 *)section) + offset;
 
-update_topo:
 	/* acquire global lock to make sure that set topology issued
 	 * by one PF
 	 */
-- 
2.43.0
* [PATCH 13/15] net/ice: limit the number of queues to sched capabilities
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (11 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than assuming that each VSI can hold up to 256 queue pairs,
or the reported device limit, query the available nodes in the scheduler
tree to check that we are not overflowing the limit for number of
child scheduling nodes at each level. Do this by multiplying
max_children for each level beyond the VSI and using that as an
additional cap on the number of queues.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
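Illustration only, not part of the patch: a standalone sketch of the cap
calculation, with a hypothetical helper sched_queue_cap() and a plain
array in place of hw->max_children. It multiplies the per-layer child
limits over a range of layers and stops as soon as the product reaches
the existing cap. Since the running product is below a 16-bit cap before
each multiplication by a 16-bit limit, it cannot overflow 32 bits.

```c
#include <stdint.h>

/*
 * Multiply the per-layer child limits from first_layer up to, but not
 * including, last_layer. Stop early once the product reaches cap, and
 * return the smaller of the product and cap.
 */
static uint32_t
sched_queue_cap(const uint16_t *max_children, uint8_t first_layer,
		uint8_t last_layer, uint16_t cap)
{
	uint32_t product = 1;
	uint8_t i;

	for (i = first_layer; i < last_layer; i++) {
		product *= max_children[i];
		if (product >= cap)
			break;
	}
	return product < cap ? product : cap;
}
```

For example, with 8 children allowed at each of two layers the result is
64, so a configured limit of 256 queue pairs would be reduced to 64.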
 drivers/net/ice/ice_ethdev.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 0d2445a317..ab3f88fd7d 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -913,7 +913,7 @@ ice_vsi_config_default_rss(struct ice_aqc_vsi_props *info)
 }
 
 static int
-ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
+ice_vsi_config_tc_queue_mapping(struct ice_hw *hw, struct ice_vsi *vsi,
 				struct ice_aqc_vsi_props *info,
 				uint8_t enabled_tcmap)
 {
@@ -929,13 +929,28 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
 	}
 
 	/* vector 0 is reserved and 1 vector for ctrl vsi */
-	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
+	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2) {
 		vsi->nb_qps = 0;
-	else
+	} else {
 		vsi->nb_qps = RTE_MIN
 			((uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
 			RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
 
+		/* cap max QPs to what the HW reports as num-children for each layer.
+		 * Multiply num_children for each layer from the entry_point layer to
+		 * the qgroup, or second-last layer.
+		 * Avoid any potential overflow by using uint32_t type and breaking loop
+		 * once we have a number greater than the already configured max.
+		 */
+		uint32_t max_sched_vsi_nodes = 1;
+		for (uint8_t i = hw->sw_entry_point_layer; i < hw->num_tx_sched_layers - 1; i++) {
+			max_sched_vsi_nodes *= hw->max_children[i];
+			if (max_sched_vsi_nodes >= vsi->nb_qps)
+				break;
+		}
+		vsi->nb_qps = RTE_MIN(vsi->nb_qps, max_sched_vsi_nodes);
+	}
+
 	/* nb_qps(hex)  -> fls */
 	/* 0000		-> 0 */
 	/* 0001		-> 0 */
@@ -1707,7 +1722,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 			rte_cpu_to_le_16(hw->func_caps.fd_fltr_best_effort);
 
 		/* Enable VLAN/UP trip */
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
@@ -1731,7 +1746,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 		vsi_ctx.info.fd_options = rte_cpu_to_le_16(cfg);
 		vsi_ctx.info.sw_id = hw->port_info->sw_id;
 		vsi_ctx.info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
-- 
2.43.0
* [PATCH 14/15] net/ice: enhance Tx scheduler hierarchy support
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (12 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:34 ` [PATCH 15/15] net/ice: add minimal capability reporting API Bruce Richardson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2048 child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always having just 3 layers. One
restriction on this change is that the topology needs to be configured
and enabled before port queue setup in many cases, and before port
start in all cases.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
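Illustration only, not part of the patch: a standalone sketch of how
node levels are resolved once the fixed three-level enum is gone. The
names LEVEL_ANY, leaf_level() and resolve_level() are hypothetical;
LEVEL_ANY stands in for the "any level" value a caller may pass. The
sketch mirrors the checks in ice_tm_node_add(): a parent may sit no
deeper than level n-2 of an n-layer hierarchy, and a child must be
exactly one level below its parent.

```c
#include <stdint.h>

#define LEVEL_ANY UINT32_MAX	/* hypothetical "let the driver pick" value */

/* Level index used for leaf (queue) nodes. */
static uint32_t
leaf_level(uint32_t num_layers, uint32_t has_tc)
{
	return num_layers - 1 - has_tc;
}

/*
 * Resolve the level of a new non-root node. Returns the level, or -1 if
 * the parent is too deep to have children or the requested level is not
 * parent_level + 1.
 */
static int
resolve_level(uint32_t requested, uint32_t parent_level, uint32_t num_layers)
{
	if (parent_level + 2 > num_layers)	/* parent_level > n - 2 */
		return -1;
	if (requested == LEVEL_ANY)
		requested = parent_level + 1;
	if (requested != parent_level + 1)
		return -1;
	return (int)requested;
}
```

A node is then treated as a leaf when its resolved level equals
leaf_level() for the loaded topology.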
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  15 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 495 ++++++++++++++---------------------
 4 files changed, 212 insertions(+), 317 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index ab3f88fd7d..5a5967ff71 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3832,7 +3832,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3864,14 +3863,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index f31addb122..cb1a7e8e0d 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -479,14 +479,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +682,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +739,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index a150d28e73..7a421bb364 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 459446a6b0..a86943a5b2 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
+#define MAX_CHILDREN_PER_TM_NODE	2048
 
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +86,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +124,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +148,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +199,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -389,16 +402,28 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
+	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
+		parent_node = find_node(pf->tm_conf.root, parent_node_id);
+		if (!parent_node) {
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+			error->message = "parent not exist";
+			return -EINVAL;
+		}
+	}
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
+		level_id = parent_node->level + 1;
+
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
 		return ret;
 
@@ -424,9 +449,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -445,7 +470,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
@@ -458,48 +483,21 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* check the parent node */
-	parent_node = find_node(pf->tm_conf.root, parent_node_id);
-	if (!parent_node) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent not exist";
-		return -EINVAL;
-	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
 	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
+	if (level_id != parent_node->level + 1) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
 		error->message = "Wrong level";
 		return -EINVAL;
 	}
 
 	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
-	}
+	/* TODO, check max children allowed and max nodes at this level */
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
@@ -518,13 +516,12 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
 	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->priority != 0)
+		/* TODO fixme, some levels may support this perhaps? */
 		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
 			    level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->weight != 1 && level_id == 0)
 		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
 			    level_id);
 
@@ -569,7 +566,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -589,53 +586,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -723,240 +673,191 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
-
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
-{
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
-
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
+
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* From a given node, recursively delete all the nodes that belong to the given VSI.
+ * Any node which can't be deleted because it still has children belonging to a
+ * different VSI is instead reassigned to that other VSI.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
-		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
-		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u\n",
+					parent->info.node_teid);
+			return -1;
 		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
-				 int clear_on_fail,
-				 struct rte_tm_error *error)
+static int
+apply_topology_updates(struct rte_eth_dev *dev __rte_unused)
 {
+	return 0;
+}
+
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	/* check if we have a previously applied topology */
+	if (sw_root->sched_node != NULL)
+		return apply_topology_updates(dev);
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u\n",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
+				 int clear_on_fail,
+				 struct rte_tm_error *error)
+{
+	RTE_SET_USED(error);
+	/* TODO - commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
+
+	uint64_t start = rte_rdtsc();
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	uint64_t time = rte_rdtsc() - start;
+	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
+	return ret;
 }
-- 
2.43.0
* [PATCH 15/15] net/ice: add minimal capability reporting API
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (13 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-08-07  9:34 ` Bruce Richardson
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:34 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
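Add a minimal capability-reporting callback for the rte_tm API. It is still
incomplete: for now it only reports the number of available hierarchy layers.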
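As a rough illustration of the value reported by this patch, the sketch below mirrors the subtraction done in the callback: the number of usable hierarchy levels is the hardware layer count, less one when a TC layer is present. The struct and function names (demo_sched_info, demo_n_levels_max) are hypothetical stand-ins, not driver symbols, and the layer counts in the usage note are example inputs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two driver fields the callback reads. */
struct demo_sched_info {
	uint8_t num_tx_sched_layers; /* total Tx scheduler layers in hardware */
	bool has_tc;                 /* true when a TC layer sits below the root */
};

/* Levels left for the application: the TC layer, if any, is not exposed. */
static uint32_t
demo_n_levels_max(const struct demo_sched_info *info)
{
	return (uint32_t)info->num_tx_sched_layers - (info->has_tc ? 1u : 0u);
}
```

With these assumptions, an input of 9 layers plus a TC layer gives 8 levels, and the same 9 layers without a TC layer gives 9.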
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_ethdev.h |  1 +
 drivers/net/ice/ice_tm.c     | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index cb1a7e8e0d..6bebc511e4 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -682,6 +682,7 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
+
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index a86943a5b2..d7def61756 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -33,8 +33,12 @@ static int ice_shaper_profile_add(struct rte_eth_dev *dev,
 static int ice_shaper_profile_del(struct rte_eth_dev *dev,
 				   uint32_t shaper_profile_id,
 				   struct rte_tm_error *error);
+static int ice_tm_capabilities_get(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
 
 const struct rte_tm_ops ice_tm_ops = {
+	.capabilities_get = ice_tm_capabilities_get,
 	.shaper_profile_add = ice_shaper_profile_add,
 	.shaper_profile_delete = ice_shaper_profile_del,
 	.node_add = ice_tm_node_add,
@@ -861,3 +865,16 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
 	return ret;
 }
+
+static int
+ice_tm_capabilities_get(struct rte_eth_dev *dev, struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	*cap = (struct rte_tm_capabilities){
+		.n_levels_max = hw->num_tx_sched_layers - hw->port_info->has_tc,
+	};
+	if (error)
+		error->type = RTE_TM_ERROR_TYPE_NONE;
+	return 0;
+}
-- 
2.43.0
* [PATCH v2 00/15] Improve rte_tm support in ICE driver
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (14 preceding siblings ...)
  2024-08-07  9:34 ` [PATCH 15/15] net/ice: add minimal capability reporting API Bruce Richardson
@ 2024-08-07  9:46 ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 01/15] net/ice: add traffic management node query function Bruce Richardson
                     ` (14 more replies)
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (3 subsequent siblings)
  19 siblings, 15 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
This patchset expands the capabilities of the traffic management
support in the ICE driver. It allows the driver to support different
sizes of topologies, and support >256 queues and more than 3 hierarchy
layers.
---
Depends-on: series-32719 ("improve rte_tm APIs")
v2:
* Correct typo in commit log of one patch
* Add missing depends-on tag to the cover letter
Bruce Richardson (15):
  net/ice: add traffic management node query function
  net/ice: detect stopping a flow-director queue twice
  net/ice: improve Tx scheduler graph output
  net/ice: add option to choose DDP package file
  net/ice: add option to download scheduler topology
  net/ice/base: allow init without TC class sched nodes
  net/ice/base: set VSI index on newly created nodes
  net/ice/base: read VSI layer info from VSI
  net/ice/base: remove 255 limit on sched child nodes
  net/ice/base: optimize subtree searches
  net/ice/base: make functions non-static
  net/ice/base: remove flag checks before topology upload
  net/ice: limit the number of queues to sched capabilities
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: add minimal capability reporting API
 doc/guides/nics/ice.rst          |   9 +
 drivers/net/ice/base/ice_ddp.c   |  51 +--
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |  56 ++--
 drivers/net/ice/base/ice_sched.h |   8 +
 drivers/net/ice/base/ice_type.h  |   3 +-
 drivers/net/ice/ice_diagnose.c   | 196 ++++-------
 drivers/net/ice/ice_ethdev.c     |  92 +++--
 drivers/net/ice/ice_ethdev.h     |  18 +-
 drivers/net/ice/ice_rxtx.c       |  15 +
 drivers/net/ice/ice_tm.c         | 558 +++++++++++++++----------------
 11 files changed, 487 insertions(+), 523 deletions(-)
--
2.43.0
* [PATCH v2 01/15] net/ice: add traffic management node query function
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Implement the new node querying function for the "ice" net driver.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_tm.c | 48 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 8a29a9e744..459446a6b0 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -17,6 +17,11 @@ static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t weight, uint32_t level_id,
 	      const struct rte_tm_node_params *params,
 	      struct rte_tm_error *error);
+static int ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error);
 static int ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 			    struct rte_tm_error *error);
 static int ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
@@ -35,6 +40,7 @@ const struct rte_tm_ops ice_tm_ops = {
 	.node_add = ice_tm_node_add,
 	.node_delete = ice_tm_node_delete,
 	.node_type_get = ice_node_type_get,
+	.node_query = ice_node_query,
 	.hierarchy_commit = ice_hierarchy_commit,
 };
 
@@ -219,6 +225,48 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
+static int
+ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error)
+{
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_tm_node *tm_node;
+
+	if (node_id == RTE_TM_NODE_ID_NULL) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "invalid node id";
+		return -EINVAL;
+	}
+
+	/* check if the node id exists */
+	tm_node = find_node(pf->tm_conf.root, node_id);
+	if (!tm_node) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "no such node";
+		return -EEXIST;
+	}
+
+	if (parent_node_id != NULL)
+		*parent_node_id = tm_node->parent->id;
+
+	if (priority != NULL)
+		*priority = tm_node->priority;
+
+	if (weight != NULL)
+		*weight = tm_node->weight;
+
+	if (level_id != NULL)
+		*level_id = tm_node->level;
+
+	if (params != NULL)
+		*params = tm_node->params;
+
+	return 0;
+}
+
 static inline struct ice_tm_shaper_profile *
 ice_shaper_profile_search(struct rte_eth_dev *dev,
 			   uint32_t shaper_profile_id)
-- 
2.43.0
* [PATCH v2 02/15] net/ice: detect stopping a flow-director queue twice
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 01/15] net/ice: add traffic management node query function Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If the flow-director queue is stopped at some point during the running
of an application, the shutdown procedure for the port issues an error
as it tries to stop the queue a second time, and fails to do so. We can
eliminate this error by setting the tail-register pointer to NULL on
stop, and checking for that condition in subsequent stop calls. Since
the register pointer is set on start, any restarting of the queue will
allow a stop call to progress as normal.
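The approach described above can be sketched in isolation as follows. This is a minimal model, not the driver code: demo_txq, demo_txq_start and demo_txq_stop are hypothetical names, and the hw_stop_calls counter merely stands in for the real hardware disable request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Tx queue; only the fields needed here. */
struct demo_txq {
	volatile uint32_t *tail_reg; /* NULL whenever the queue is stopped */
	unsigned int hw_stop_calls;  /* counts simulated hardware stop requests */
};

/* Starting the queue sets the tail-register pointer. */
static void
demo_txq_start(struct demo_txq *q, volatile uint32_t *reg)
{
	q->tail_reg = reg;
}

/* Stopping is idempotent: a second stop sees NULL and returns early. */
static int
demo_txq_stop(struct demo_txq *q)
{
	if (q->tail_reg == NULL)
		return 0; /* not started, or already stopped */

	q->hw_stop_calls++;  /* stands in for the real disable request */
	q->tail_reg = NULL;  /* mark as stopped for later calls */
	return 0;
}
```

Because start always restores the pointer, a stop that follows a restart reaches the hardware again, while back-to-back stops do not.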
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_rxtx.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index f270498ed1..a150d28e73 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1139,6 +1139,10 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 			    tx_queue_id);
 		return -EINVAL;
 	}
+	if (txq->qtx_tail == NULL) {
+		PMD_DRV_LOG(INFO, "TX queue %u not started\n", tx_queue_id);
+		return 0;
+	}
 	vsi = txq->vsi;
 
 	q_ids[0] = txq->reg_idx;
@@ -1153,6 +1157,7 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq->tx_rel_mbufs(txq);
+	txq->qtx_tail = NULL;
 
 	return 0;
 }
-- 
2.43.0
* [PATCH v2 03/15] net/ice: improve Tx scheduler graph output
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 01/15] net/ice: add traffic management node query function Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 04/15] net/ice: add option to choose DDP package file Bruce Richardson
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The function to dump the Tx scheduler topology only adds a node to the chart
if it is connected to a Tx queue or is in the flow director VSI. Change the
function to work recursively from the root node and thereby include all
scheduler nodes, whether in use or not, in the dump.
Also, improve the output of the Tx scheduler graphing function:
* Add VSI details to each node in graph
* When number of children is >16, skip middle nodes to reduce size of
  the graph, otherwise dot output is unviewable for large hierarchies
* For VSIs other than zero, use dot's clustering method to put those
  VSIs into subgraphs with borders
* For leaf nodes, display queue numbers for any nodes assigned to
  ethdev NIC Tx queues
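The child-skipping rule in the list above can be modelled on its own. The sketch below follows the loop shape used in the diff (skip ahead once the third child has been drawn, when there are more than 16 children), but demo_visible_children is a hypothetical helper that only records which child indices would be drawn; it does no graph output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Record the child indices that would be drawn for a node with
 * 'num_children' children. Returns how many children are drawn;
 * at most 'out_len' indices are written to 'out'.
 */
static uint16_t
demo_visible_children(uint16_t num_children, uint16_t *out, uint16_t out_len)
{
	uint16_t n = 0;

	for (uint16_t i = 0; i < num_children; i++) {
		if (n < out_len)
			out[n] = i;
		n++;
		/* with many children, jump over all but the first 3 and last 2 */
		if (num_children > 16 && i == 2)
			i += num_children - 5;
	}
	return n;
}
```

Under this model, a node with 16 children has all 16 drawn, while a node with 20 children has only indices 0, 1, 2, 18 and 19 drawn, with the remaining 15 represented by a single placeholder node in the real output.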
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_diagnose.c | 196 ++++++++++++---------------------
 1 file changed, 69 insertions(+), 127 deletions(-)
diff --git a/drivers/net/ice/ice_diagnose.c b/drivers/net/ice/ice_diagnose.c
index c357554707..623d84e37d 100644
--- a/drivers/net/ice/ice_diagnose.c
+++ b/drivers/net/ice/ice_diagnose.c
@@ -545,29 +545,15 @@ static void print_rl_profile(struct ice_aqc_rl_profile_elem *prof,
 	fprintf(stream, "\t\t\t\t\t</td>\n");
 }
 
-static
-void print_elem_type(FILE *stream, u8 type)
+static const char *
+get_elem_type(u8 type)
 {
-	switch (type) {
-	case 1:
-		fprintf(stream, "root");
-		break;
-	case 2:
-		fprintf(stream, "tc");
-		break;
-	case 3:
-		fprintf(stream, "se_generic");
-		break;
-	case 4:
-		fprintf(stream, "entry_point");
-		break;
-	case 5:
-		fprintf(stream, "leaf");
-		break;
-	default:
-		fprintf(stream, "%d", type);
-		break;
-	}
+	static const char * const ice_sched_node_types[] = {
+			"Undefined", "Root", "TC", "SE Generic", "SW Entry", "Leaf"
+	};
+	if (type < RTE_DIM(ice_sched_node_types))
+		return ice_sched_node_types[type];
+	return "*UNKNOWN*";
 }
 
 static
@@ -602,7 +588,9 @@ void print_priority_mode(FILE *stream, bool flag)
 }
 
 static
-void print_node(struct ice_aqc_txsched_elem_data *data,
+void print_node(struct ice_sched_node *node,
+		struct rte_eth_dev_data *ethdata,
+		struct ice_aqc_txsched_elem_data *data,
 		struct ice_aqc_rl_profile_elem *cir_prof,
 		struct ice_aqc_rl_profile_elem *eir_prof,
 		struct ice_aqc_rl_profile_elem *shared_prof,
@@ -613,17 +601,19 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 
 	fprintf(stream, "\t\t\t<table>\n");
 
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> teid </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td> %d </td>\n", data->node_teid);
-	fprintf(stream, "\t\t\t\t</tr>\n");
-
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> type </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td>");
-	print_elem_type(stream, data->data.elem_type);
-	fprintf(stream, "</td>\n");
-	fprintf(stream, "\t\t\t\t</tr>\n");
+	fprintf(stream, "\t\t\t\t<tr><td>teid</td><td>%d</td></tr>\n", data->node_teid);
+	fprintf(stream, "\t\t\t\t<tr><td>type</td><td>%s</td></tr>\n",
+			get_elem_type(data->data.elem_type));
+	fprintf(stream, "\t\t\t\t<tr><td>VSI</td><td>%u</td></tr>\n", node->vsi_handle);
+	if (data->data.elem_type == ICE_AQC_ELEM_TYPE_LEAF) {
+		for (uint16_t i = 0; i < ethdata->nb_tx_queues; i++) {
+			struct ice_tx_queue *q = ethdata->tx_queues[i];
+			if (q->q_teid == data->node_teid) {
+				fprintf(stream, "\t\t\t\t<tr><td>TXQ</td><td>%u</td></tr>\n", i);
+				break;
+			}
+		}
+	}
 
 	if (!detail)
 		goto brief;
@@ -705,8 +695,6 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 	fprintf(stream, "\t\tshape=plain\n");
 	fprintf(stream, "\t]\n");
 
-	if (data->parent_teid != 0xFFFFFFFF)
-		fprintf(stream, "\tNODE_%d -> NODE_%d\n", data->parent_teid, data->node_teid);
 }
 
 static
@@ -731,112 +719,92 @@ int query_rl_profile(struct ice_hw *hw,
 	return 0;
 }
 
-static
-int query_node(struct ice_hw *hw, uint32_t child, uint32_t *parent,
-	       uint8_t level, bool detail, FILE *stream)
+static int
+query_node(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	struct ice_aqc_txsched_elem_data data;
+	struct ice_aqc_txsched_elem_data *data = &node->info;
 	struct ice_aqc_rl_profile_elem cir_prof;
 	struct ice_aqc_rl_profile_elem eir_prof;
 	struct ice_aqc_rl_profile_elem shared_prof;
 	struct ice_aqc_rl_profile_elem *cp = NULL;
 	struct ice_aqc_rl_profile_elem *ep = NULL;
 	struct ice_aqc_rl_profile_elem *sp = NULL;
-	int status, ret;
-
-	status = ice_sched_query_elem(hw, child, &data);
-	if (status != ICE_SUCCESS) {
-		if (level == hw->num_tx_sched_layers) {
-			/* ignore the error when a queue has been stopped. */
-			PMD_DRV_LOG(WARNING, "Failed to query queue node %d.", child);
-			*parent = 0xffffffff;
-			return 0;
-		}
-		PMD_DRV_LOG(ERR, "Failed to query scheduling node %d.", child);
-		return -EINVAL;
-	}
-
-	*parent = data.parent_teid;
+	u8 level = node->tx_sched_layer;
+	int ret;
 
-	if (data.data.cir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 0, data.data.cir_bw.bw_profile_idx, &cir_prof);
+	if (data->data.cir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 0, data->data.cir_bw.bw_profile_idx, &cir_prof);
 
 		if (ret)
 			return ret;
 		cp = &cir_prof;
 	}
 
-	if (data.data.eir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 1, data.data.eir_bw.bw_profile_idx, &eir_prof);
+	if (data->data.eir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 1, data->data.eir_bw.bw_profile_idx, &eir_prof);
 
 		if (ret)
 			return ret;
 		ep = &eir_prof;
 	}
 
-	if (data.data.srl_id != 0) {
-		ret = query_rl_profile(hw, level, 2, data.data.srl_id, &shared_prof);
+	if (data->data.srl_id != 0) {
+		ret = query_rl_profile(hw, level, 2, data->data.srl_id, &shared_prof);
 
 		if (ret)
 			return ret;
 		sp = &shared_prof;
 	}
 
-	print_node(&data, cp, ep, sp, detail, stream);
+	print_node(node, ethdata, data, cp, ep, sp, detail, stream);
 
 	return 0;
 }
 
-static
-int query_nodes(struct ice_hw *hw,
-		uint32_t *children, int child_num,
-		uint32_t *parents, int *parent_num,
-		uint8_t level, bool detail,
-		FILE *stream)
+static int
+query_node_recursive(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	uint32_t parent;
-	int i;
-	int j;
-
-	*parent_num = 0;
-	for (i = 0; i < child_num; i++) {
-		bool exist = false;
-		int ret;
+	bool close = false;
+	if (node->parent != NULL && node->vsi_handle != node->parent->vsi_handle) {
+		fprintf(stream, "subgraph cluster_%u {\n", node->vsi_handle);
+		fprintf(stream, "\tlabel = \"VSI %u\";\n", node->vsi_handle);
+		close = true;
+	}
 
-		ret = query_node(hw, children[i], &parent, level, detail, stream);
-		if (ret)
-			return -EINVAL;
+	int ret = query_node(hw, ethdata, node, detail, stream);
+	if (ret != 0)
+		return ret;
 
-		for (j = 0; j < *parent_num; j++) {
-			if (parents[j] == parent) {
-				exist = true;
-				break;
-			}
+	for (uint16_t i = 0; i < node->num_children; i++) {
+		ret = query_node_recursive(hw, ethdata, node->children[i], detail, stream);
+		if (ret != 0)
+			return ret;
+		/* if we have a lot of nodes, skip a bunch in the middle */
+		if (node->num_children > 16 && i == 2) {
+			uint16_t inc = node->num_children - 5;
+			fprintf(stream, "\tn%d_children [label=\"... +%d child nodes ...\"];\n",
+					node->info.node_teid, inc);
+			fprintf(stream, "\tNODE_%d -> n%d_children;\n",
+					node->info.node_teid, node->info.node_teid);
+			i += inc;
 		}
-
-		if (!exist && parent != 0xFFFFFFFF)
-			parents[(*parent_num)++] = parent;
 	}
+	if (close)
+		fprintf(stream, "}\n");
+	if (node->info.parent_teid != 0xFFFFFFFF)
+		fprintf(stream, "\tNODE_%d -> NODE_%d\n",
+				node->info.parent_teid, node->info.node_teid);
 
 	return 0;
 }
 
-int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
+int
+rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 {
 	struct rte_eth_dev *dev;
 	struct ice_hw *hw;
-	struct ice_pf *pf;
-	struct ice_q_ctx *q_ctx;
-	uint16_t q_num;
-	uint16_t i;
-	struct ice_tx_queue *txq;
-	uint32_t buf1[256];
-	uint32_t buf2[256];
-	uint32_t *children = buf1;
-	uint32_t *parents = buf2;
-	int child_num = 0;
-	int parent_num = 0;
-	uint8_t level;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
 
@@ -846,35 +814,9 @@ int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 
 	dev = &rte_eth_devices[port];
 	hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	level = hw->num_tx_sched_layers;
-
-	q_num = dev->data->nb_tx_queues;
-
-	/* main vsi */
-	for (i = 0; i < q_num; i++) {
-		txq = dev->data->tx_queues[i];
-		q_ctx = ice_get_lan_q_ctx(hw, txq->vsi->idx, 0, i);
-		children[child_num++] = q_ctx->q_teid;
-	}
-
-	/* fdir vsi */
-	q_ctx = ice_get_lan_q_ctx(hw, pf->fdir.fdir_vsi->idx, 0, 0);
-	children[child_num++] = q_ctx->q_teid;
 
 	fprintf(stream, "digraph tx_sched {\n");
-	while (child_num > 0) {
-		int ret;
-		ret = query_nodes(hw, children, child_num,
-				  parents, &parent_num,
-				  level, detail, stream);
-		if (ret)
-			return ret;
-
-		children = parents;
-		child_num = parent_num;
-		level--;
-	}
+	query_node_recursive(hw, dev->data, hw->port_info->root, detail, stream);
 	fprintf(stream, "}\n");
 
 	return 0;
-- 
2.43.0
* [PATCH v2 04/15] net/ice: add option to choose DDP package file
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (2 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 05/15] net/ice: add option to download scheduler topology Bruce Richardson
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The "Dynamic Device Personalization" package is loaded at initialization
time by the driver, but the specific package file loaded depends upon
what package file is found first by searching through a hard-coded list
of firmware paths. To enable greater control over the package loading,
we can add a device option to choose a specific DDP package file to
load.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
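Note for readers: the devarg handling in this patch amounts to a length
check followed by a heap copy of the path. A minimal standalone sketch of
that step is given here, assuming a hypothetical MAX_PKG_FILENAME_SIZE in
place of ICE_MAX_PKG_FILENAME_SIZE (whose value is not shown in this patch)
and a hypothetical helper name dup_pkg_filename(). Unlike the handler in
the patch, the sketch also checks the allocation result; that is a
suggestion only, not something the patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ICE_MAX_PKG_FILENAME_SIZE (assumed value). */
#define MAX_PKG_FILENAME_SIZE 256

/*
 * Validate a package path taken from a devarg value and duplicate it.
 * Returns 0 on success and -1 if the value is missing, too long, or the
 * allocation fails. On success *out owns a heap copy to be free()d.
 */
static int
dup_pkg_filename(const char *value, char **out)
{
	size_t len;
	char *copy;

	assert(out != NULL);
	*out = NULL;

	if (value == NULL || value[0] == '\0')
		return -1;

	len = strlen(value);
	if (len >= MAX_PKG_FILENAME_SIZE)
		return -1;

	copy = malloc(len + 1);
	if (copy == NULL)
		return -1;
	memcpy(copy, value, len + 1); /* includes the terminating NUL */

	*out = copy;
	return 0;
}
```

The owner of the copy frees it on device close, in the same way the patch
frees devargs.ddp_filename in ice_dev_close().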
 doc/guides/nics/ice.rst      |  9 +++++++++
 drivers/net/ice/ice_ethdev.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  1 +
 3 files changed, 44 insertions(+)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index ae975d19ad..58ccfbd1a5 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -108,6 +108,15 @@ Runtime Configuration
 
     -a 80:00.0,default-mac-disable=1
 
+- ``DDP Package File``
+
+  Rather than have the driver search for the DDP package to load,
+  or to override what package is used,
+  the ``ddp_pkg_file`` option can be used to provide the path to a specific package file.
+  For example::
+
+    -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 304f959b7e..3e7ceda9ce 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -36,6 +36,7 @@
 #define ICE_ONE_PPS_OUT_ARG       "pps_out"
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
+#define ICE_DDP_FILENAME          "ddp_pkg_file"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -52,6 +53,7 @@ static const char * const ice_valid_args[] = {
 	ICE_RX_LOW_LATENCY_ARG,
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
+	ICE_DDP_FILENAME,
 	NULL
 };
 
@@ -692,6 +694,18 @@ handle_field_name_arg(__rte_unused const char *key, const char *value,
 	return 0;
 }
 
+static int
+handle_ddp_filename_arg(__rte_unused const char *key, const char *value, void *name_args)
+{
+	const char **filename = name_args;
+	if (strlen(value) >= ICE_MAX_PKG_FILENAME_SIZE) {
+		PMD_DRV_LOG(ERR, "The DDP package filename is too long : '%s'", value);
+		return -1;
+	}
+	*filename = strdup(value);
+	return 0;
+}
+
 static void
 ice_check_proto_xtr_support(struct ice_hw *hw)
 {
@@ -1882,6 +1896,16 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 	size_t bufsz;
 	int err;
 
+	if (adapter->devargs.ddp_filename != NULL) {
+		strlcpy(pkg_file, adapter->devargs.ddp_filename, sizeof(pkg_file));
+		if (rte_firmware_read(pkg_file, &buf, &bufsz) == 0) {
+			goto load_fw;
+		} else {
+			PMD_INIT_LOG(ERR, "Cannot load DDP file: %s\n", pkg_file);
+			return -1;
+		}
+	}
+
 	if (!use_dsn)
 		goto no_dsn;
 
@@ -2216,6 +2240,13 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 
 	ret = rte_kvargs_process(kvlist, ICE_RX_LOW_LATENCY_ARG,
 				 &parse_bool, &ad->devargs.rx_low_latency);
+	if (ret)
+		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_DDP_FILENAME,
+				 &handle_ddp_filename_arg, &ad->devargs.ddp_filename);
+	if (ret)
+		goto bail;
 
 bail:
 	rte_kvargs_free(kvlist);
@@ -2689,6 +2720,8 @@ ice_dev_close(struct rte_eth_dev *dev)
 	ice_free_hw_tbls(hw);
 	rte_free(hw->port_info);
 	hw->port_info = NULL;
+	free((void *)(uintptr_t)ad->devargs.ddp_filename);
+	ad->devargs.ddp_filename = NULL;
 	ice_shutdown_all_ctrlq(hw, true);
 	rte_free(pf->proto_xtr);
 	pf->proto_xtr = NULL;
@@ -6981,6 +7014,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 3ea9f37dc8..c211b5b9cc 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -568,6 +568,7 @@ struct ice_devargs {
 	/* Name of the field. */
 	char xtr_field_name[RTE_MBUF_DYN_NAMESIZE];
 	uint64_t mbuf_check;
+	const char *ddp_filename;
 };
 
 /**
-- 
2.43.0
* [PATCH v2 05/15] net/ice: add option to download scheduler topology
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (3 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 04/15] net/ice: add option to choose DDP package file Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The DDP package file being loaded at init time may contain an
alternative Tx Scheduler topology in it. Add a driver option to load this
topology at init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
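Note for readers: besides adding the new option, this patch tightens
parse_bool() so that only the exact strings "0" and "1" are accepted. The
previous strtoul()-based check never examined the end pointer, so a value
such as "1abc" would have passed. A standalone sketch of the stricter check
is below; the name parse_bool_strict() and the reduced signature (no key
argument, no logging) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accept only the one-character strings "0" and "1".
 * Returns 0 and stores the parsed value in *out on success,
 * or -1 (leaving *out untouched) for anything else.
 */
static int
parse_bool_strict(const char *value, int *out)
{
	assert(out != NULL);

	/* a missing or empty value is rejected outright */
	if (value == NULL || value[0] == '\0')
		return -1;

	/* exactly one character, and it must be '0' or '1' */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1;

	*out = value[0] - '0';
	return 0;
}
```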
 drivers/net/ice/base/ice_ddp.c | 18 +++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
 drivers/net/ice/ice_ethdev.h   |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index 24506dfaea..e6c42c5274 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1326,7 +1326,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1344,6 +1344,18 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT, "failed to apply sched topology (err: %d)\n",
+					res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT, "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1416,7 +1428,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1426,7 +1438,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5761920207..2feba2e91d 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -451,9 +451,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 3e7ceda9ce..0d2445a317 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -37,6 +37,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME          "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED        "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -54,6 +55,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME,
+	ICE_DDP_LOAD_SCHED,
 	NULL
 };
 
@@ -1938,7 +1940,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d\n", err);
 		free(buf);
@@ -1971,19 +1973,18 @@ static int
 parse_bool(const char *key, const char *value, void *args)
 {
 	int *i = (int *)args;
-	char *end;
-	int num;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = value[0] - '0';
 	return 0;
 }
 
@@ -2248,6 +2249,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7014,6 +7019,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_LOAD_SCHED "=<0|1>"
 			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index c211b5b9cc..f31addb122 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -563,6 +563,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	int ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
* [PATCH v2 06/15] net/ice/base: allow init without TC class sched nodes
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (4 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 05/15] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If DCB support is disabled via the DDP image, there will not be any traffic
class (TC) nodes in the scheduler tree immediately above the root level.
To allow the driver to work in this scenario, we allow use of the root
node as a dummy TC0 node in the case where there are no TC nodes in the
tree. For any TC other than 0 (used by default in the driver), the
existing behaviour of returning a NULL pointer is maintained.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
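Note for readers: the lookup rule described in the commit message can be
sketched in isolation as follows. The struct layouts here are cut-down,
hypothetical stand-ins for ice_sched_node and ice_port_info that keep only
the fields the lookup touches; get_tc_node() is likewise an illustrative
name, not the driver function.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical versions of the driver structures. */
struct sched_node {
	uint8_t tc_num;
	uint16_t num_children;
	struct sched_node **children;
};

struct port_info {
	struct sched_node *root;
	unsigned int has_tc; /* non-zero once a TC element has been seen */
};

/*
 * Return the node for traffic class 'tc'. When the tree holds no TC
 * nodes, the root doubles as TC0 and every other TC yields NULL.
 */
static struct sched_node *
get_tc_node(struct port_info *pi, uint8_t tc)
{
	if (pi == NULL || pi->root == NULL)
		return NULL;

	if (!pi->has_tc)
		return tc == 0 ? pi->root : NULL;

	for (uint16_t i = 0; i < pi->root->num_children; i++)
		if (pi->root->children[i]->tc_num == tc)
			return pi->root->children[i];

	return NULL;
}
```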
 drivers/net/ice/base/ice_sched.c | 6 ++++++
 drivers/net/ice/base/ice_type.h  | 1 +
 2 files changed, 7 insertions(+)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 373c32a518..f75e5ae599 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -292,6 +292,10 @@ struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 
 	if (!pi || !pi->root)
 		return NULL;
+	/* if no TC nodes, use root as TC node 0 */
+	if (pi->has_tc == 0)
+		return tc == 0 ? pi->root : NULL;
+
 	for (i = 0; i < pi->root->num_children; i++)
 		if (pi->root->children[i]->tc_num == tc)
 			return pi->root->children[i];
@@ -1306,6 +1310,8 @@ int ice_sched_init_port(struct ice_port_info *pi)
 			    ICE_AQC_ELEM_TYPE_ENTRY_POINT)
 				hw->sw_entry_point_layer = j;
 
+			if (buf[0].generic[j].data.elem_type == ICE_AQC_ELEM_TYPE_TC)
+				pi->has_tc = 1;
 			status = ice_sched_add_node(pi, j, &buf[i].generic[j], NULL);
 			if (status)
 				goto err_init_port;
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index 598a80155b..a70e4a8afa 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1260,6 +1260,7 @@ struct ice_port_info {
 	struct ice_qos_cfg qos_cfg;
 	u8 is_vf:1;
 	u8 is_custom_tx_enabled:1;
+	u8 has_tc:1;
 };
 
 struct ice_switch_info {
-- 
2.43.0
* [PATCH v2 07/15] net/ice/base: set VSI index on newly created nodes
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (5 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:46   ` [PATCH v2 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The ice_sched_node type has a field for the VSI to which the node
belongs. This field was not being set in "ice_sched_add_node", so add
a line configuring this field for each node from its parent node.
Similarly, when searching for a qgroup node, we can check for each node
that the VSI information is correct.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f75e5ae599..f6dc5ae173 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -200,6 +200,7 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
 	node->in_use = true;
 	node->parent = parent;
 	node->tx_sched_layer = layer;
+	node->vsi_handle = parent->vsi_handle;
 	parent->children[parent->num_children++] = node;
 	node->info = elem;
 	return 0;
@@ -1581,7 +1582,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		/* make sure the qgroup node is part of the VSI subtree */
 		if (ice_sched_find_node_in_subtree(pi->hw, vsi_node, qgrp_node))
 			if (qgrp_node->num_children < max_children &&
-			    qgrp_node->owner == owner)
+			    qgrp_node->owner == owner && qgrp_node->vsi_handle == vsi_handle)
 				break;
 		qgrp_node = qgrp_node->sibling;
 	}
-- 
2.43.0
* [PATCH v2 08/15] net/ice/base: read VSI layer info from VSI
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (6 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
@ 2024-08-07  9:46   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:46 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than computing the layer of the VSI from the number of HW layers,
we can instead just read that info from the VSI node itself. This allows
the layer to be changed at runtime.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f6dc5ae173..e398984bf2 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1559,7 +1559,6 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	u16 max_children;
 
 	qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
-	vsi_layer = ice_sched_get_vsi_layer(pi->hw);
 	max_children = pi->hw->max_children[qgrp_layer];
 
 	vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1569,6 +1568,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	/* validate invalid VSI ID */
 	if (!vsi_node)
 		return NULL;
+	vsi_layer = vsi_node->tx_sched_layer;
 
 	/* If the queue group and vsi layer are same then queues
 	 * are all attached directly to VSI
-- 
2.43.0
* [PATCH v2 09/15] net/ice/base: remove 255 limit on sched child nodes
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (7 preceding siblings ...)
  2024-08-07  9:46   ` [PATCH v2 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 10/15] net/ice/base: optimize subtree searches Bruce Richardson
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The Tx scheduler in the ice driver can be configured to have large
numbers of child nodes at a given layer, but the driver code implicitly
limited the number of nodes to 255 by using a u8 datatype for the number
of children. Increase this to a 16-bit value throughout the code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
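Note for readers: a small illustration of the implicit limit described in
the commit message. Storing a child count in an 8-bit field silently wraps
modulo 256, whereas a 16-bit field holds the value intact. The helper names
are hypothetical and exist only for this demonstration. By the same
reasoning, an 8-bit loop index compared against a count above 255 would
wrap before reaching the bound, which is presumably why the patch widens
the index variables as well.

```c
#include <stdint.h>

/* Value that survives when a child count is stored in an 8-bit field. */
static uint8_t
count_in_u8(unsigned int num_children)
{
	return (uint8_t)num_children; /* wraps modulo 256 */
}

/* The same count stored in the 16-bit field the patch switches to. */
static uint16_t
count_in_u16(unsigned int num_children)
{
	return (uint16_t)num_children;
}
```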
 drivers/net/ice/base/ice_sched.c | 25 ++++++++++++++-----------
 drivers/net/ice/base/ice_type.h  |  2 +-
 2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index e398984bf2..be13833e1e 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -289,7 +289,7 @@ ice_sched_get_first_node(struct ice_port_info *pi,
  */
 struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 {
-	u8 i;
+	u16 i;
 
 	if (!pi || !pi->root)
 		return NULL;
@@ -316,7 +316,7 @@ void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node)
 {
 	struct ice_sched_node *parent;
 	struct ice_hw *hw = pi->hw;
-	u8 i, j;
+	u16 i, j;
 
 	/* Free the children before freeing up the parent node
 	 * The parent array is updated below and that shifts the nodes
@@ -1473,7 +1473,7 @@ bool
 ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 			       struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < base->num_children; i++) {
 		struct ice_sched_node *child = base->children[i];
@@ -1510,7 +1510,7 @@ ice_sched_get_free_qgrp(struct ice_port_info *pi,
 			struct ice_sched_node *qgrp_node, u8 owner)
 {
 	struct ice_sched_node *min_qgrp;
-	u8 min_children;
+	u16 min_children;
 
 	if (!qgrp_node)
 		return qgrp_node;
@@ -2070,7 +2070,7 @@ static void ice_sched_rm_agg_vsi_info(struct ice_port_info *pi, u16 vsi_handle)
  */
 static bool ice_sched_is_leaf_node_present(struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < node->num_children; i++)
 		if (ice_sched_is_leaf_node_present(node->children[i]))
@@ -2105,7 +2105,7 @@ ice_sched_rm_vsi_cfg(struct ice_port_info *pi, u16 vsi_handle, u8 owner)
 
 	ice_for_each_traffic_class(i) {
 		struct ice_sched_node *vsi_node, *tc_node;
-		u8 j = 0;
+		u16 j = 0;
 
 		tc_node = ice_sched_get_tc_node(pi, i);
 		if (!tc_node)
@@ -2173,7 +2173,7 @@ int ice_rm_vsi_lan_cfg(struct ice_port_info *pi, u16 vsi_handle)
  */
 bool ice_sched_is_tree_balanced(struct ice_hw *hw, struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	/* start from the leaf node */
 	for (i = 0; i < node->num_children; i++)
@@ -2247,7 +2247,8 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct ice_sched_node *node,
 			      u16 *num_nodes)
 {
 	u8 l = node->tx_sched_layer;
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(hw);
 
@@ -2289,7 +2290,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent,
 			struct ice_sched_node *node)
 {
 	struct ice_sched_node *old_parent;
-	u8 i, j;
+	u16 i, j;
 
 	old_parent = node->parent;
 
@@ -2389,7 +2390,8 @@ ice_sched_move_vsi_to_agg(struct ice_port_info *pi, u16 vsi_handle, u32 agg_id,
 	u16 num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
 	u32 first_node_teid, vsi_teid;
 	u16 num_nodes_added;
-	u8 aggl, vsil, i;
+	u8 aggl, vsil;
+	u16 i;
 	int status;
 
 	tc_node = ice_sched_get_tc_node(pi, tc);
@@ -2505,7 +2507,8 @@ ice_move_all_vsi_to_dflt_agg(struct ice_port_info *pi,
 static bool
 ice_sched_is_agg_inuse(struct ice_port_info *pi, struct ice_sched_node *node)
 {
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(pi->hw);
 	if (node->tx_sched_layer < vsil - 1) {
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index a70e4a8afa..35f832eb9f 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1030,9 +1030,9 @@ struct ice_sched_node {
 	struct ice_aqc_txsched_elem_data info;
 	u32 agg_id;			/* aggregator group ID */
 	u16 vsi_handle;
+	u16 num_children;
 	u8 in_use;			/* suspended or in use */
 	u8 tx_sched_layer;		/* Logical Layer (1-9) */
-	u8 num_children;
 	u8 tc_num;
 	u8 owner;
 #define ICE_SCHED_NODE_OWNER_LAN	0
-- 
2.43.0
* [PATCH v2 10/15] net/ice/base: optimize subtree searches
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (8 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 11/15] net/ice/base: make functions non-static Bruce Richardson
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
In a number of places throughout the driver code, we want to confirm
that a scheduler node is indeed a child of another node. Currently, this
is confirmed by searching down the tree from the base until the desired
node is hit, a search which may hit many irrelevant tree nodes when
recursing down wrong branches. By switching the direction of search, to
check upwards from the node to the parent, we can avoid any incorrect
paths, and so speed up processing.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
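Note for readers: the upward walk described in the commit message can be
sketched on a reduced node type as below. The struct is a hypothetical
stand-in for ice_sched_node with only the parent link and layer kept, and
node_in_subtree() is an illustrative name. The cost is bounded by the depth
of the tree rather than by the size of the subtree under 'base'. As far as
the diff shows, the rewritten driver function also returns true when both
pointers refer to the same node, which the sketch mirrors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical version of the scheduler node. */
struct sched_node {
	struct sched_node *parent;
	uint8_t tx_sched_layer; /* 0 for the root */
};

/*
 * Return true if 'node' is 'base' itself or lies somewhere below it.
 * Walks parent links from 'node' towards the root instead of recursing
 * through every child of 'base'.
 */
static bool
node_in_subtree(const struct sched_node *base, const struct sched_node *node)
{
	if (base == node)
		return true;

	while (node->tx_sched_layer != 0 && node->parent != NULL) {
		if (node->parent == base)
			return true;
		node = node->parent;
	}
	return false;
}
```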
 drivers/net/ice/base/ice_sched.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index be13833e1e..f7d5f8f415 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1475,20 +1475,12 @@ ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 {
 	u16 i;
 
-	for (i = 0; i < base->num_children; i++) {
-		struct ice_sched_node *child = base->children[i];
-
-		if (node == child)
-			return true;
-
-		if (child->tx_sched_layer > node->tx_sched_layer)
-			return false;
-
-		/* this recursion is intentional, and wouldn't
-		 * go more than 8 calls
-		 */
-		if (ice_sched_find_node_in_subtree(hw, child, node))
+	if (base == node)
+		return true;
+	while (node->tx_sched_layer != 0 && node->parent != NULL) {
+		if (node->parent == base)
 			return true;
+		node = node->parent;
 	}
 	return false;
 }
-- 
2.43.0
* [PATCH v2 11/15] net/ice/base: make functions non-static
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (9 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 10/15] net/ice/base: optimize subtree searches Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
We will need to allocate more lanq contexts after a scheduler rework, so
make that function non-static so that it is accessible outside the file.
For similar reasons, make the function to add a Tx scheduler node
non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f7d5f8f415..d88b836c38 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..c7eb794963 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,12 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
+		    struct ice_sched_node *parent, u8 layer, u16 num_nodes,
+		    u16 *num_nodes_added, u32 *first_node_teid,
+		    struct ice_sched_node **prealloc_nodes);
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
* [PATCH v2 12/15] net/ice/base: remove flag checks before topology upload
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (10 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 11/15] net/ice/base: make functions non-static Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
DPDK should support more than just 9-level or 5-level topologies, so
remove the checks for those particular settings.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_ddp.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index e6c42c5274..744f015fe5 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -2373,38 +2373,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 		return status;
 	}
 
-	/* Is default topology already applied ? */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 9) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded default topology\n");
-		/* Already default topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is new topology already applied ? */
-	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded new topology\n");
-		/* Already new topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is set topology issued already ? */
-	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
-		ice_debug(hw, ICE_DBG_INIT, "Update tx topology was done by another PF\n");
-		/* add a small delay before exiting */
-		for (i = 0; i < 20; i++)
-			ice_msec_delay(100, true);
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Change the topology from new to default (5 to 9) */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
-		goto update_topo;
-	}
-
 	pkg_hdr = (struct ice_pkg_hdr *)buf;
 	state = ice_verify_pkg(pkg_hdr, len);
 	if (state) {
@@ -2451,7 +2419,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 	/* Get the new topology buffer */
 	new_topo = ((u8 *)section) + offset;
 
-update_topo:
 	/* acquire global lock to make sure that set topology issued
 	 * by one PF
 	 */
-- 
2.43.0
* [PATCH v2 13/15] net/ice: limit the number of queues to sched capabilities
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (11 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 15/15] net/ice: add minimal capability reporting API Bruce Richardson
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than assuming that each VSI can hold up to 256 queue pairs,
or the reported device limit, query the available nodes in the scheduler
tree to check that we are not overflowing the limit for number of
child scheduling nodes at each level. Do this by multiplying
max_children for each level beyond the VSI and using that as an
additional cap on the number of queues.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_ethdev.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 0d2445a317..ab3f88fd7d 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -913,7 +913,7 @@ ice_vsi_config_default_rss(struct ice_aqc_vsi_props *info)
 }
 
 static int
-ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
+ice_vsi_config_tc_queue_mapping(struct ice_hw *hw, struct ice_vsi *vsi,
 				struct ice_aqc_vsi_props *info,
 				uint8_t enabled_tcmap)
 {
@@ -929,13 +929,28 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
 	}
 
 	/* vector 0 is reserved and 1 vector for ctrl vsi */
-	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
+	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2) {
 		vsi->nb_qps = 0;
-	else
+	} else {
 		vsi->nb_qps = RTE_MIN
 			((uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
 			RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
 
+		/* cap max QPs to what the HW reports as num-children for each layer.
+		 * Multiply num_children for each layer from the entry_point layer to
+		 * the qgroup, or second-last layer.
+		 * Avoid any potential overflow by using uint32_t type and breaking loop
+		 * once we have a number greater than the already configured max.
+		 */
+		uint32_t max_sched_vsi_nodes = 1;
+		for (uint8_t i = hw->sw_entry_point_layer; i < hw->num_tx_sched_layers - 1; i++) {
+			max_sched_vsi_nodes *= hw->max_children[i];
+			if (max_sched_vsi_nodes >= vsi->nb_qps)
+				break;
+		}
+		vsi->nb_qps = RTE_MIN(vsi->nb_qps, max_sched_vsi_nodes);
+	}
+
 	/* nb_qps(hex)  -> fls */
 	/* 0000		-> 0 */
 	/* 0001		-> 0 */
@@ -1707,7 +1722,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 			rte_cpu_to_le_16(hw->func_caps.fd_fltr_best_effort);
 
 		/* Enable VLAN/UP trip */
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
@@ -1731,7 +1746,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 		vsi_ctx.info.fd_options = rte_cpu_to_le_16(cfg);
 		vsi_ctx.info.sw_id = hw->port_info->sw_id;
 		vsi_ctx.info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
-- 
2.43.0
* [PATCH v2 14/15] net/ice: enhance Tx scheduler hierarchy support
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (12 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  2024-08-07  9:47   ` [PATCH v2 15/15] net/ice: add minimal capability reporting API Bruce Richardson
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2k child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always having just 3 layers. One
restriction is that the topology must be configured and committed
before port start in all cases, and in many cases before queue setup.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
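The level numbering that replaces the fixed port/qgroup/queue enum can be
sketched as below. This is a simplified illustration under stated
assumptions: LEVEL_ANY is a hypothetical stand-in for
RTE_TM_NODE_LEVEL_ID_ANY, resolve_level() is a hypothetical helper, and
the "leaf cannot have children" test uses the leaf level directly, whereas
the patch compares the parent level against the raw layer count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LEVEL_ANY UINT32_MAX /* hypothetical stand-in for RTE_TM_NODE_LEVEL_ID_ANY */

/* Leaf (queue) level: the last scheduler layer, shifted up by one when a
 * TC layer is present, mirroring ice_get_leaf_level() in the patch.
 */
static uint32_t
leaf_level(uint8_t num_layers, bool has_tc)
{
	assert(num_layers >= 2);
	return (uint32_t)num_layers - 1 - (has_tc ? 1 : 0);
}

/* Return the level a new node ends up at, or -1 if the request is invalid. */
static int
resolve_level(uint32_t requested, bool has_parent, uint32_t parent_level,
	      uint32_t leaf)
{
	if (!has_parent)
		return requested == 0 ? 0 : -1; /* root must be at level 0 */
	if (parent_level >= leaf)
		return -1; /* a leaf node cannot have children */
	if (requested == LEVEL_ANY)
		requested = parent_level + 1; /* infer level from the parent */
	return requested == parent_level + 1 ? (int)requested : -1;
}
```

With a 9-layer topology and no TC layer, the leaf level is 8, so a node
added under a level-2 parent with LEVEL_ANY resolves to level 3.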
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  15 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 495 ++++++++++++++---------------------
 4 files changed, 212 insertions(+), 317 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index ab3f88fd7d..5a5967ff71 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3832,7 +3832,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3864,14 +3863,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index f31addb122..cb1a7e8e0d 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -479,14 +479,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +682,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +739,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index a150d28e73..7a421bb364 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 459446a6b0..a86943a5b2 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
+#define MAX_CHILDREN_PER_TM_NODE	2048
 
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +86,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +124,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +148,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +199,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -389,16 +402,28 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
+	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
+		parent_node = find_node(pf->tm_conf.root, parent_node_id);
+		if (!parent_node) {
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+			error->message = "parent not exist";
+			return -EINVAL;
+		}
+	}
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
+		level_id = parent_node->level + 1;
+
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
 		return ret;
 
@@ -424,9 +449,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -445,7 +470,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
@@ -458,48 +483,21 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* check the parent node */
-	parent_node = find_node(pf->tm_conf.root, parent_node_id);
-	if (!parent_node) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent not exist";
-		return -EINVAL;
-	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
 	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
+	if (level_id != parent_node->level + 1) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
 		error->message = "Wrong level";
 		return -EINVAL;
 	}
 
 	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
-	}
+	/* TODO, check max children allowed and max nodes at this level */
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
@@ -518,13 +516,12 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
 	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->priority != 0)
+		/* TODO fixme, some levels may support this perhaps? */
 		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
 			    level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->weight != 1 && level_id == 0)
 		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
 			    level_id);
 
@@ -569,7 +566,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -589,53 +586,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -723,240 +673,191 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
-
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
-{
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
-
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
+
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* From a given node, recursively delete all the nodes that belong to that VSI.
+ * Any node which can't be deleted because it still has children belonging to a
+ * different VSI is instead reassigned to that other VSI.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
-		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
-		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u\n",
+					parent->info.node_teid);
+			return -1;
 		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
-				 int clear_on_fail,
-				 struct rte_tm_error *error)
+static int
+apply_topology_updates(struct rte_eth_dev *dev __rte_unused)
 {
+	return 0;
+}
+
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	/* check if we have a previously applied topology */
+	if (sw_root->sched_node != NULL)
+		return apply_topology_updates(dev);
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u\n",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
+				 int clear_on_fail,
+				 struct rte_tm_error *error)
+{
+	RTE_SET_USED(error);
+	/* TODO - commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
+
+	uint64_t start = rte_rdtsc();
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	uint64_t time = rte_rdtsc() - start;
+	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
+	return ret;
 }
-- 
2.43.0
* [PATCH v2 15/15] net/ice: add minimal capability reporting API
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (13 preceding siblings ...)
  2024-08-07  9:47   ` [PATCH v2 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-08-07  9:47   ` Bruce Richardson
  14 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-07  9:47 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Add a minimal rte_tm capabilities_get callback. It is incomplete, but
reports the number of available scheduler layers.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_ethdev.h |  1 +
 drivers/net/ice/ice_tm.c     | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index cb1a7e8e0d..6bebc511e4 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -682,6 +682,7 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
+
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index a86943a5b2..d7def61756 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -33,8 +33,12 @@ static int ice_shaper_profile_add(struct rte_eth_dev *dev,
 static int ice_shaper_profile_del(struct rte_eth_dev *dev,
 				   uint32_t shaper_profile_id,
 				   struct rte_tm_error *error);
+static int ice_tm_capabilities_get(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
 
 const struct rte_tm_ops ice_tm_ops = {
+	.capabilities_get = ice_tm_capabilities_get,
 	.shaper_profile_add = ice_shaper_profile_add,
 	.shaper_profile_delete = ice_shaper_profile_del,
 	.node_add = ice_tm_node_add,
@@ -861,3 +865,16 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
 	return ret;
 }
+
+static int
+ice_tm_capabilities_get(struct rte_eth_dev *dev, struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	*cap = (struct rte_tm_capabilities){
+		.n_levels_max = hw->num_tx_sched_layers - hw->port_info->has_tc,
+	};
+	if (error)
+		error->type = RTE_TM_ERROR_TYPE_NONE;
+	return 0;
+}
-- 
2.43.0
* [PATCH v3 00/16] Improve rte_tm support in ICE driver
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (15 preceding siblings ...)
  2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-08-12 15:27 ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 01/16] net/ice: add traffic management node query function Bruce Richardson
                     ` (15 more replies)
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (2 subsequent siblings)
  19 siblings, 16 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
This patchset expands the capabilities of the traffic management
support in the ICE driver. It allows the driver to support different
sizes of topologies, and support >256 queues and more than 3 hierarchy
layers.
---
Depends-on: series-32719 ("improve rte_tm APIs")
v3:
* remove/implement some code TODOs
* add patch 16 to set.
v2:
* Correct typo in commit log of one patch
* Add missing depends-on tag to the cover letter
Bruce Richardson (16):
  net/ice: add traffic management node query function
  net/ice: detect stopping a flow-director queue twice
  net/ice: improve Tx scheduler graph output
  net/ice: add option to choose DDP package file
  net/ice: add option to download scheduler topology
  net/ice/base: allow init without TC class sched nodes
  net/ice/base: set VSI index on newly created nodes
  net/ice/base: read VSI layer info from VSI
  net/ice/base: remove 255 limit on sched child nodes
  net/ice/base: optimize subtree searches
  net/ice/base: make functions non-static
  net/ice/base: remove flag checks before topology upload
  net/ice: limit the number of queues to sched capabilities
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: add minimal capability reporting API
  net/ice: do early check on node level when adding
 doc/guides/nics/ice.rst          |   9 +
 drivers/net/ice/base/ice_ddp.c   |  51 +--
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |  56 +--
 drivers/net/ice/base/ice_sched.h |   8 +
 drivers/net/ice/base/ice_type.h  |   3 +-
 drivers/net/ice/ice_diagnose.c   | 196 ++++-------
 drivers/net/ice/ice_ethdev.c     |  92 +++--
 drivers/net/ice/ice_ethdev.h     |  18 +-
 drivers/net/ice/ice_rxtx.c       |  15 +
 drivers/net/ice/ice_tm.c         | 574 +++++++++++++++----------------
 11 files changed, 497 insertions(+), 529 deletions(-)
--
2.43.0
* [PATCH v3 01/16] net/ice: add traffic management node query function
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 02/16] net/ice: detect stopping a flow-director queue twice Bruce Richardson
                     ` (14 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Implement the new node querying function for the "ice" net driver.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
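For illustration only, the query pattern used here (each out-parameter is
optional and is written only when non-NULL) can be sketched in standalone
C as below. The struct, the function name and SKETCH_NODE_ID_NULL are
hypothetical stand-ins, not driver or rte_tm definitions. The guard for a
node without a parent is an assumption of this sketch, not something the
patch does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an "invalid node id" marker. */
#define SKETCH_NODE_ID_NULL UINT32_MAX

/* Hypothetical, trimmed-down model of a TM node; not the driver's struct. */
struct sketch_tm_node {
	uint32_t id;
	uint32_t priority;
	uint32_t weight;
	uint32_t level;
	struct sketch_tm_node *parent; /* NULL for the root node */
};

/*
 * Copy the requested attributes of "node" into the caller's out-parameters.
 * Any out-parameter may be NULL, meaning the caller is not interested in it.
 */
int
sketch_node_query(const struct sketch_tm_node *node, uint32_t *parent_id,
		uint32_t *priority, uint32_t *weight, uint32_t *level)
{
	if (node == NULL)
		return -EINVAL;

	/* assumption of this sketch: report the marker id for a parentless node */
	if (parent_id != NULL)
		*parent_id = node->parent != NULL ?
				node->parent->id : SKETCH_NODE_ID_NULL;
	if (priority != NULL)
		*priority = node->priority;
	if (weight != NULL)
		*weight = node->weight;
	if (level != NULL)
		*level = node->level;

	return 0;
}
```

A caller that only wants the parent id and weight passes NULL for the
other out-parameters.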
 drivers/net/ice/ice_tm.c | 48 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 8a29a9e744..459446a6b0 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -17,6 +17,11 @@ static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t weight, uint32_t level_id,
 	      const struct rte_tm_node_params *params,
 	      struct rte_tm_error *error);
+static int ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error);
 static int ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 			    struct rte_tm_error *error);
 static int ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
@@ -35,6 +40,7 @@ const struct rte_tm_ops ice_tm_ops = {
 	.node_add = ice_tm_node_add,
 	.node_delete = ice_tm_node_delete,
 	.node_type_get = ice_node_type_get,
+	.node_query = ice_node_query,
 	.hierarchy_commit = ice_hierarchy_commit,
 };
 
@@ -219,6 +225,48 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
+static int
+ice_node_query(const struct rte_eth_dev *dev, uint32_t node_id,
+		uint32_t *parent_node_id, uint32_t *priority,
+		uint32_t *weight, uint32_t *level_id,
+		struct rte_tm_node_params *params,
+		struct rte_tm_error *error)
+{
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_tm_node *tm_node;
+
+	if (node_id == RTE_TM_NODE_ID_NULL) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "invalid node id";
+		return -EINVAL;
+	}
+
+	/* check if the node id exists */
+	tm_node = find_node(pf->tm_conf.root, node_id);
+	if (!tm_node) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "no such node";
+		return -EEXIST;
+	}
+
+	if (parent_node_id != NULL)
+		*parent_node_id = tm_node->parent->id;
+
+	if (priority != NULL)
+		*priority = tm_node->priority;
+
+	if (weight != NULL)
+		*weight = tm_node->weight;
+
+	if (level_id != NULL)
+		*level_id = tm_node->level;
+
+	if (params != NULL)
+		*params = tm_node->params;
+
+	return 0;
+}
+
 static inline struct ice_tm_shaper_profile *
 ice_shaper_profile_search(struct rte_eth_dev *dev,
 			   uint32_t shaper_profile_id)
-- 
2.43.0
* [PATCH v3 02/16] net/ice: detect stopping a flow-director queue twice
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 01/16] net/ice: add traffic management node query function Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 03/16] net/ice: improve Tx scheduler graph output Bruce Richardson
                     ` (13 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If the flow-director queue is stopped at some point during the running
of an application, the shutdown procedure for the port issues an error
as it tries to stop the queue a second time, and fails to do so. We can
eliminate this error by setting the tail-register pointer to NULL on
stop, and checking for that condition in subsequent stop calls. Since
the register pointer is set on start, any restarting of the queue will
allow a stop call to progress as normal.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
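The guard described above, using a NULL tail-register pointer as the
"already stopped" marker, can be sketched in standalone C as follows.
This is a minimal model under stated assumptions: the struct, its
hw_stop_calls counter and the function names are hypothetical and only
stand in for the real queue structure and the hardware disable request.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a Tx queue; only the field the guard relies on. */
struct sketch_txq {
	volatile uint32_t *tail;    /* set on start, NULL while stopped */
	unsigned int hw_stop_calls; /* counts simulated hardware stop requests */
};

/* Starting the queue (re)installs the tail-register pointer. */
void
sketch_txq_start(struct sketch_txq *q, volatile uint32_t *tail_reg)
{
	q->tail = tail_reg;
}

/* Returns 0 on success; a second stop without a start is a harmless no-op. */
int
sketch_txq_stop(struct sketch_txq *q)
{
	if (q == NULL)
		return -1;
	if (q->tail == NULL)
		return 0; /* already stopped: do not ask the hardware again */

	q->hw_stop_calls++; /* stands in for the real queue-disable request */
	q->tail = NULL;     /* mark the queue as stopped */
	return 0;
}
```

Because start always sets the pointer again, a start/stop/start/stop
sequence still reaches the hardware on every stop.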
 drivers/net/ice/ice_rxtx.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index f270498ed1..a150d28e73 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1139,6 +1139,10 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 			    tx_queue_id);
 		return -EINVAL;
 	}
+	if (txq->qtx_tail == NULL) {
+		PMD_DRV_LOG(INFO, "TX queue %u not started\n", tx_queue_id);
+		return 0;
+	}
 	vsi = txq->vsi;
 
 	q_ids[0] = txq->reg_idx;
@@ -1153,6 +1157,7 @@ ice_fdir_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	}
 
 	txq->tx_rel_mbufs(txq);
+	txq->qtx_tail = NULL;
 
 	return 0;
 }
-- 
2.43.0
* [PATCH v3 03/16] net/ice: improve Tx scheduler graph output
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 01/16] net/ice: add traffic management node query function Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 02/16] net/ice: detect stopping a flow-director queue twice Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 04/16] net/ice: add option to choose DDP package file Bruce Richardson
                     ` (12 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The function to dump the Tx scheduler topology only adds a node to the
chart if it is connected to a Tx queue or belongs to the flow director
VSI. Change the function to work recursively from the root node and
thereby include all scheduler nodes, whether in use or not, in the dump.
Also, improve the output of the Tx scheduler graphing function:
* Add VSI details to each node in graph
* When the number of children is >16, skip the middle nodes to reduce
  the size of the graph; otherwise the dot output is unviewable for
  large hierarchies
* For VSIs other than zero, use dot's clustering method to put those
  VSIs into subgraphs with borders
* For leaf nodes, display queue numbers for any nodes assigned to
  ethdev NIC Tx queues
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
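To make the child-skipping rule concrete, the standalone sketch below
mirrors the loop arithmetic used in this patch: with more than 16
children, the first three and the last two are drawn and the remaining
(num_children - 5) are replaced by a placeholder. The function name and
the output-array interface are hypothetical and exist only for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the indices of the children that would be drawn into "out"
 * (at most "cap" entries) and return how many would be drawn in total.
 */
size_t
sketch_visible_children(uint16_t num_children, uint16_t *out, size_t cap)
{
	size_t n = 0;

	for (uint16_t i = 0; i < num_children; i++) {
		if (n < cap)
			out[n] = i;
		n++;
		/* with many children, jump over the middle after the third one */
		if (num_children > 16 && i == 2)
			i = (uint16_t)(i + num_children - 5);
	}
	return n;
}
```

For 20 children this visits indices 0, 1, 2, 18 and 19. For 16 or fewer
children, every index is visited.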
 drivers/net/ice/ice_diagnose.c | 196 ++++++++++++---------------------
 1 file changed, 69 insertions(+), 127 deletions(-)
diff --git a/drivers/net/ice/ice_diagnose.c b/drivers/net/ice/ice_diagnose.c
index c357554707..623d84e37d 100644
--- a/drivers/net/ice/ice_diagnose.c
+++ b/drivers/net/ice/ice_diagnose.c
@@ -545,29 +545,15 @@ static void print_rl_profile(struct ice_aqc_rl_profile_elem *prof,
 	fprintf(stream, "\t\t\t\t\t</td>\n");
 }
 
-static
-void print_elem_type(FILE *stream, u8 type)
+static const char *
+get_elem_type(u8 type)
 {
-	switch (type) {
-	case 1:
-		fprintf(stream, "root");
-		break;
-	case 2:
-		fprintf(stream, "tc");
-		break;
-	case 3:
-		fprintf(stream, "se_generic");
-		break;
-	case 4:
-		fprintf(stream, "entry_point");
-		break;
-	case 5:
-		fprintf(stream, "leaf");
-		break;
-	default:
-		fprintf(stream, "%d", type);
-		break;
-	}
+	static const char * const ice_sched_node_types[] = {
+			"Undefined", "Root", "TC", "SE Generic", "SW Entry", "Leaf"
+	};
+	if (type < RTE_DIM(ice_sched_node_types))
+		return ice_sched_node_types[type];
+	return "*UNKNOWN*";
 }
 
 static
@@ -602,7 +588,9 @@ void print_priority_mode(FILE *stream, bool flag)
 }
 
 static
-void print_node(struct ice_aqc_txsched_elem_data *data,
+void print_node(struct ice_sched_node *node,
+		struct rte_eth_dev_data *ethdata,
+		struct ice_aqc_txsched_elem_data *data,
 		struct ice_aqc_rl_profile_elem *cir_prof,
 		struct ice_aqc_rl_profile_elem *eir_prof,
 		struct ice_aqc_rl_profile_elem *shared_prof,
@@ -613,17 +601,19 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 
 	fprintf(stream, "\t\t\t<table>\n");
 
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> teid </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td> %d </td>\n", data->node_teid);
-	fprintf(stream, "\t\t\t\t</tr>\n");
-
-	fprintf(stream, "\t\t\t\t<tr>\n");
-	fprintf(stream, "\t\t\t\t\t<td> type </td>\n");
-	fprintf(stream, "\t\t\t\t\t<td>");
-	print_elem_type(stream, data->data.elem_type);
-	fprintf(stream, "</td>\n");
-	fprintf(stream, "\t\t\t\t</tr>\n");
+	fprintf(stream, "\t\t\t\t<tr><td>teid</td><td>%d</td></tr>\n", data->node_teid);
+	fprintf(stream, "\t\t\t\t<tr><td>type</td><td>%s</td></tr>\n",
+			get_elem_type(data->data.elem_type));
+	fprintf(stream, "\t\t\t\t<tr><td>VSI</td><td>%u</td></tr>\n", node->vsi_handle);
+	if (data->data.elem_type == ICE_AQC_ELEM_TYPE_LEAF) {
+		for (uint16_t i = 0; i < ethdata->nb_tx_queues; i++) {
+			struct ice_tx_queue *q = ethdata->tx_queues[i];
+			if (q->q_teid == data->node_teid) {
+				fprintf(stream, "\t\t\t\t<tr><td>TXQ</td><td>%u</td></tr>\n", i);
+				break;
+			}
+		}
+	}
 
 	if (!detail)
 		goto brief;
@@ -705,8 +695,6 @@ void print_node(struct ice_aqc_txsched_elem_data *data,
 	fprintf(stream, "\t\tshape=plain\n");
 	fprintf(stream, "\t]\n");
 
-	if (data->parent_teid != 0xFFFFFFFF)
-		fprintf(stream, "\tNODE_%d -> NODE_%d\n", data->parent_teid, data->node_teid);
 }
 
 static
@@ -731,112 +719,92 @@ int query_rl_profile(struct ice_hw *hw,
 	return 0;
 }
 
-static
-int query_node(struct ice_hw *hw, uint32_t child, uint32_t *parent,
-	       uint8_t level, bool detail, FILE *stream)
+static int
+query_node(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	struct ice_aqc_txsched_elem_data data;
+	struct ice_aqc_txsched_elem_data *data = &node->info;
 	struct ice_aqc_rl_profile_elem cir_prof;
 	struct ice_aqc_rl_profile_elem eir_prof;
 	struct ice_aqc_rl_profile_elem shared_prof;
 	struct ice_aqc_rl_profile_elem *cp = NULL;
 	struct ice_aqc_rl_profile_elem *ep = NULL;
 	struct ice_aqc_rl_profile_elem *sp = NULL;
-	int status, ret;
-
-	status = ice_sched_query_elem(hw, child, &data);
-	if (status != ICE_SUCCESS) {
-		if (level == hw->num_tx_sched_layers) {
-			/* ignore the error when a queue has been stopped. */
-			PMD_DRV_LOG(WARNING, "Failed to query queue node %d.", child);
-			*parent = 0xffffffff;
-			return 0;
-		}
-		PMD_DRV_LOG(ERR, "Failed to query scheduling node %d.", child);
-		return -EINVAL;
-	}
-
-	*parent = data.parent_teid;
+	u8 level = node->tx_sched_layer;
+	int ret;
 
-	if (data.data.cir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 0, data.data.cir_bw.bw_profile_idx, &cir_prof);
+	if (data->data.cir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 0, data->data.cir_bw.bw_profile_idx, &cir_prof);
 
 		if (ret)
 			return ret;
 		cp = &cir_prof;
 	}
 
-	if (data.data.eir_bw.bw_profile_idx != 0) {
-		ret = query_rl_profile(hw, level, 1, data.data.eir_bw.bw_profile_idx, &eir_prof);
+	if (data->data.eir_bw.bw_profile_idx != 0) {
+		ret = query_rl_profile(hw, level, 1, data->data.eir_bw.bw_profile_idx, &eir_prof);
 
 		if (ret)
 			return ret;
 		ep = &eir_prof;
 	}
 
-	if (data.data.srl_id != 0) {
-		ret = query_rl_profile(hw, level, 2, data.data.srl_id, &shared_prof);
+	if (data->data.srl_id != 0) {
+		ret = query_rl_profile(hw, level, 2, data->data.srl_id, &shared_prof);
 
 		if (ret)
 			return ret;
 		sp = &shared_prof;
 	}
 
-	print_node(&data, cp, ep, sp, detail, stream);
+	print_node(node, ethdata, data, cp, ep, sp, detail, stream);
 
 	return 0;
 }
 
-static
-int query_nodes(struct ice_hw *hw,
-		uint32_t *children, int child_num,
-		uint32_t *parents, int *parent_num,
-		uint8_t level, bool detail,
-		FILE *stream)
+static int
+query_node_recursive(struct ice_hw *hw, struct rte_eth_dev_data *ethdata,
+		struct ice_sched_node *node, bool detail, FILE *stream)
 {
-	uint32_t parent;
-	int i;
-	int j;
-
-	*parent_num = 0;
-	for (i = 0; i < child_num; i++) {
-		bool exist = false;
-		int ret;
+	bool close = false;
+	if (node->parent != NULL && node->vsi_handle != node->parent->vsi_handle) {
+		fprintf(stream, "subgraph cluster_%u {\n", node->vsi_handle);
+		fprintf(stream, "\tlabel = \"VSI %u\";\n", node->vsi_handle);
+		close = true;
+	}
 
-		ret = query_node(hw, children[i], &parent, level, detail, stream);
-		if (ret)
-			return -EINVAL;
+	int ret = query_node(hw, ethdata, node, detail, stream);
+	if (ret != 0)
+		return ret;
 
-		for (j = 0; j < *parent_num; j++) {
-			if (parents[j] == parent) {
-				exist = true;
-				break;
-			}
+	for (uint16_t i = 0; i < node->num_children; i++) {
+		ret = query_node_recursive(hw, ethdata, node->children[i], detail, stream);
+		if (ret != 0)
+			return ret;
+		/* if we have a lot of nodes, skip a bunch in the middle */
+		if (node->num_children > 16 && i == 2) {
+			uint16_t inc = node->num_children - 5;
+			fprintf(stream, "\tn%d_children [label=\"... +%d child nodes ...\"];\n",
+					node->info.node_teid, inc);
+			fprintf(stream, "\tNODE_%d -> n%d_children;\n",
+					node->info.node_teid, node->info.node_teid);
+			i += inc;
 		}
-
-		if (!exist && parent != 0xFFFFFFFF)
-			parents[(*parent_num)++] = parent;
 	}
+	if (close)
+		fprintf(stream, "}\n");
+	if (node->info.parent_teid != 0xFFFFFFFF)
+		fprintf(stream, "\tNODE_%d -> NODE_%d\n",
+				node->info.parent_teid, node->info.node_teid);
 
 	return 0;
 }
 
-int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
+int
+rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 {
 	struct rte_eth_dev *dev;
 	struct ice_hw *hw;
-	struct ice_pf *pf;
-	struct ice_q_ctx *q_ctx;
-	uint16_t q_num;
-	uint16_t i;
-	struct ice_tx_queue *txq;
-	uint32_t buf1[256];
-	uint32_t buf2[256];
-	uint32_t *children = buf1;
-	uint32_t *parents = buf2;
-	int child_num = 0;
-	int parent_num = 0;
-	uint8_t level;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
 
@@ -846,35 +814,9 @@ int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream)
 
 	dev = &rte_eth_devices[port];
 	hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	level = hw->num_tx_sched_layers;
-
-	q_num = dev->data->nb_tx_queues;
-
-	/* main vsi */
-	for (i = 0; i < q_num; i++) {
-		txq = dev->data->tx_queues[i];
-		q_ctx = ice_get_lan_q_ctx(hw, txq->vsi->idx, 0, i);
-		children[child_num++] = q_ctx->q_teid;
-	}
-
-	/* fdir vsi */
-	q_ctx = ice_get_lan_q_ctx(hw, pf->fdir.fdir_vsi->idx, 0, 0);
-	children[child_num++] = q_ctx->q_teid;
 
 	fprintf(stream, "digraph tx_sched {\n");
-	while (child_num > 0) {
-		int ret;
-		ret = query_nodes(hw, children, child_num,
-				  parents, &parent_num,
-				  level, detail, stream);
-		if (ret)
-			return ret;
-
-		children = parents;
-		child_num = parent_num;
-		level--;
-	}
+	query_node_recursive(hw, dev->data, hw->port_info->root, detail, stream);
 	fprintf(stream, "}\n");
 
 	return 0;
-- 
2.43.0
* [PATCH v3 04/16] net/ice: add option to choose DDP package file
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (2 preceding siblings ...)
  2024-08-12 15:28   ` [PATCH v3 03/16] net/ice: improve Tx scheduler graph output Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 05/16] net/ice: add option to download scheduler topology Bruce Richardson
                     ` (11 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The "Dynamic Device Personalization" package is loaded at initialization
time by the driver, but the specific package file loaded depends upon
which package file is found first when searching through a hard-coded
list of firmware paths. To enable greater control over the package
loading, we can add a device option to choose a specific DDP package
file to load.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
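As a rough standalone illustration of the devarg handler added here:
reject a value that would not fit the package filename buffer, otherwise
keep a heap copy for later use. The limit of 256 and all names are
assumptions of this sketch (the driver uses ICE_MAX_PKG_FILENAME_SIZE).
The allocation-failure and NULL checks are additions of the sketch, not
part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed limit for this sketch only. */
#define SKETCH_MAX_PKG_FILENAME_SIZE 256

/*
 * Store a heap copy of "value" in *filename.
 * Returns 0 on success, -1 if the value is missing, too long,
 * or cannot be copied. The caller owns the copy and must free() it.
 */
int
sketch_handle_ddp_filename(const char *value, char **filename)
{
	size_t len;
	char *copy;

	if (value == NULL || filename == NULL)
		return -1;

	len = strlen(value);
	if (len >= SKETCH_MAX_PKG_FILENAME_SIZE)
		return -1; /* would not fit the fixed-size filename buffer */

	copy = malloc(len + 1);
	if (copy == NULL)
		return -1;
	memcpy(copy, value, len + 1); /* includes the terminating NUL */

	*filename = copy;
	return 0;
}
```

The copy is released with free() when the device is closed, which
matches the cleanup added to ice_dev_close() in this patch.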
 doc/guides/nics/ice.rst      |  9 +++++++++
 drivers/net/ice/ice_ethdev.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  1 +
 3 files changed, 44 insertions(+)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index ae975d19ad..58ccfbd1a5 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -108,6 +108,15 @@ Runtime Configuration
 
     -a 80:00.0,default-mac-disable=1
 
+- ``DDP Package File``
+
+  Rather than have the driver search for the DDP package to load,
+  or to override what package is used,
+  the ``ddp_pkg_file`` option can be used to provide the path to a specific package file.
+  For example::
+
+    -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 304f959b7e..3e7ceda9ce 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -36,6 +36,7 @@
 #define ICE_ONE_PPS_OUT_ARG       "pps_out"
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
+#define ICE_DDP_FILENAME          "ddp_pkg_file"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -52,6 +53,7 @@ static const char * const ice_valid_args[] = {
 	ICE_RX_LOW_LATENCY_ARG,
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
+	ICE_DDP_FILENAME,
 	NULL
 };
 
@@ -692,6 +694,18 @@ handle_field_name_arg(__rte_unused const char *key, const char *value,
 	return 0;
 }
 
+static int
+handle_ddp_filename_arg(__rte_unused const char *key, const char *value, void *name_args)
+{
+	const char **filename = name_args;
+	if (strlen(value) >= ICE_MAX_PKG_FILENAME_SIZE) {
+		PMD_DRV_LOG(ERR, "The DDP package filename is too long : '%s'", value);
+		return -1;
+	}
+	*filename = strdup(value);
+	return 0;
+}
+
 static void
 ice_check_proto_xtr_support(struct ice_hw *hw)
 {
@@ -1882,6 +1896,16 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 	size_t bufsz;
 	int err;
 
+	if (adapter->devargs.ddp_filename != NULL) {
+		strlcpy(pkg_file, adapter->devargs.ddp_filename, sizeof(pkg_file));
+		if (rte_firmware_read(pkg_file, &buf, &bufsz) == 0) {
+			goto load_fw;
+		} else {
+			PMD_INIT_LOG(ERR, "Cannot load DDP file: %s\n", pkg_file);
+			return -1;
+		}
+	}
+
 	if (!use_dsn)
 		goto no_dsn;
 
@@ -2216,6 +2240,13 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 
 	ret = rte_kvargs_process(kvlist, ICE_RX_LOW_LATENCY_ARG,
 				 &parse_bool, &ad->devargs.rx_low_latency);
+	if (ret)
+		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_DDP_FILENAME,
+				 &handle_ddp_filename_arg, &ad->devargs.ddp_filename);
+	if (ret)
+		goto bail;
 
 bail:
 	rte_kvargs_free(kvlist);
@@ -2689,6 +2720,8 @@ ice_dev_close(struct rte_eth_dev *dev)
 	ice_free_hw_tbls(hw);
 	rte_free(hw->port_info);
 	hw->port_info = NULL;
+	free((void *)(uintptr_t)ad->devargs.ddp_filename);
+	ad->devargs.ddp_filename = NULL;
 	ice_shutdown_all_ctrlq(hw, true);
 	rte_free(pf->proto_xtr);
 	pf->proto_xtr = NULL;
@@ -6981,6 +7014,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 3ea9f37dc8..c211b5b9cc 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -568,6 +568,7 @@ struct ice_devargs {
 	/* Name of the field. */
 	char xtr_field_name[RTE_MBUF_DYN_NAMESIZE];
 	uint64_t mbuf_check;
+	const char *ddp_filename;
 };
 
 /**
-- 
2.43.0
* [PATCH v3 05/16] net/ice: add option to download scheduler topology
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (3 preceding siblings ...)
  2024-08-12 15:28   ` [PATCH v3 04/16] net/ice: add option to choose DDP package file Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 06/16] net/ice/base: allow init without TC class sched nodes Bruce Richardson
                     ` (10 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The DDP package file being loaded at init time may contain an
alternative Tx scheduler topology. Add a driver option to load this
topology at init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
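This patch also tightens parse_bool() so that only the exact strings "0"
and "1" are accepted. A standalone sketch of that check is shown below.
The function name and the reduced signature (no key argument, no
logging) are simplifications made for illustration.

```c
#include <stddef.h>

/*
 * Parse a strict boolean devarg value.
 * Returns 0 and stores 0 or 1 in *out on success, -1 on any other input.
 */
int
sketch_parse_bool(const char *value, int *out)
{
	if (value == NULL || value[0] == '\0')
		return -1; /* a value is required */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1; /* anything other than exactly "0" or "1" */

	*out = value[0] - '0';
	return 0;
}
```

Unlike a strtoul()-based check, input with trailing characters such as
"1x" is rejected, and an empty value is reported as an error rather
than being read as 0.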
 drivers/net/ice/base/ice_ddp.c | 18 +++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
 drivers/net/ice/ice_ethdev.h   |  1 +
 4 files changed, 33 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index 24506dfaea..e6c42c5274 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1326,7 +1326,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1344,6 +1344,18 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT, "failed to apply sched topology  (err: %d)\n",
+					res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT, "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1416,7 +1428,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1426,7 +1438,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5761920207..2feba2e91d 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -451,9 +451,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 3e7ceda9ce..0d2445a317 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -37,6 +37,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME          "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED        "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -54,6 +55,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME,
+	ICE_DDP_LOAD_SCHED,
 	NULL
 };
 
@@ -1938,7 +1940,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d\n", err);
 		free(buf);
@@ -1971,19 +1973,18 @@ static int
 parse_bool(const char *key, const char *value, void *args)
 {
 	int *i = (int *)args;
-	char *end;
-	int num;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = value[0] - '0';
 	return 0;
 }
 
@@ -2248,6 +2249,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7014,6 +7019,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_PROTO_XTR_ARG "=[queue:]<vlan|ipv4|ipv6|ipv6_flow|tcp|ip_offset>"
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
+			      ICE_DDP_LOAD_SCHED "=<0|1>"
 			      ICE_DDP_FILENAME "=</path/to/file>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index c211b5b9cc..f31addb122 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -563,6 +563,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	int ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
* [PATCH v3 06/16] net/ice/base: allow init without TC class sched nodes
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (4 preceding siblings ...)
  2024-08-12 15:28   ` [PATCH v3 05/16] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 07/16] net/ice/base: set VSI index on newly created nodes Bruce Richardson
                     ` (9 subsequent siblings)
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
If DCB support is disabled via the DDP image, there will not be any
traffic class (TC) nodes in the scheduler tree immediately above the
root level. To allow the driver to work with this scenario, we allow use
of the root node as a dummy TC0 node in the case where there are no TC
nodes in the tree. For use of any TC other than 0 (used by default in
the driver), the existing behaviour of returning a NULL pointer is
maintained.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
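The lookup rule described above can be sketched in standalone C as
follows. The structs are hypothetical, trimmed-down models of the
scheduler node and port info. Only the has_tc fallback and the child
search follow the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal scheduler node. */
struct sketch_sched_node {
	uint8_t tc_num;
	uint16_t num_children;
	struct sketch_sched_node **children;
};

/* Hypothetical minimal port info. */
struct sketch_port_info {
	struct sketch_sched_node *root;
	uint8_t has_tc; /* 1 if the tree contains TC nodes */
};

/* Return the node acting as traffic class "tc", or NULL if there is none. */
struct sketch_sched_node *
sketch_get_tc_node(struct sketch_port_info *pi, uint8_t tc)
{
	if (pi == NULL || pi->root == NULL)
		return NULL;

	/* no TC layer: the root doubles as TC 0, other TCs do not exist */
	if (pi->has_tc == 0)
		return tc == 0 ? pi->root : NULL;

	for (uint16_t i = 0; i < pi->root->num_children; i++)
		if (pi->root->children[i]->tc_num == tc)
			return pi->root->children[i];

	return NULL;
}
```

With has_tc clear, a request for TC 0 returns the root itself, so code
that attaches VSI nodes under TC 0 continues to work unchanged.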
 drivers/net/ice/base/ice_sched.c | 6 ++++++
 drivers/net/ice/base/ice_type.h  | 1 +
 2 files changed, 7 insertions(+)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 373c32a518..f75e5ae599 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -292,6 +292,10 @@ struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 
 	if (!pi || !pi->root)
 		return NULL;
+	/* if no TC nodes, use root as TC node 0 */
+	if (pi->has_tc == 0)
+		return tc == 0 ? pi->root : NULL;
+
 	for (i = 0; i < pi->root->num_children; i++)
 		if (pi->root->children[i]->tc_num == tc)
 			return pi->root->children[i];
@@ -1306,6 +1310,8 @@ int ice_sched_init_port(struct ice_port_info *pi)
 			    ICE_AQC_ELEM_TYPE_ENTRY_POINT)
 				hw->sw_entry_point_layer = j;
 
+			if (buf[0].generic[j].data.elem_type == ICE_AQC_ELEM_TYPE_TC)
+				pi->has_tc = 1;
 			status = ice_sched_add_node(pi, j, &buf[i].generic[j], NULL);
 			if (status)
 				goto err_init_port;
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index 598a80155b..a70e4a8afa 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1260,6 +1260,7 @@ struct ice_port_info {
 	struct ice_qos_cfg qos_cfg;
 	u8 is_vf:1;
 	u8 is_custom_tx_enabled:1;
+	u8 has_tc:1;
 };
 
 struct ice_switch_info {
-- 
2.43.0
* [PATCH v3 07/16] net/ice/base: set VSI index on newly created nodes
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The ice_sched_node type has a field for the VSI to which the node
belongs. This field was not being set in "ice_sched_add_node", so add
a line setting this field for each node from its parent node.
Similarly, when searching for a qgroup node, check that the VSI
information of each candidate node is correct.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f75e5ae599..f6dc5ae173 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -200,6 +200,7 @@ ice_sched_add_node(struct ice_port_info *pi, u8 layer,
 	node->in_use = true;
 	node->parent = parent;
 	node->tx_sched_layer = layer;
+	node->vsi_handle = parent->vsi_handle;
 	parent->children[parent->num_children++] = node;
 	node->info = elem;
 	return 0;
@@ -1581,7 +1582,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 		/* make sure the qgroup node is part of the VSI subtree */
 		if (ice_sched_find_node_in_subtree(pi->hw, vsi_node, qgrp_node))
 			if (qgrp_node->num_children < max_children &&
-			    qgrp_node->owner == owner)
+			    qgrp_node->owner == owner && qgrp_node->vsi_handle == vsi_handle)
 				break;
 		qgrp_node = qgrp_node->sibling;
 	}
-- 
2.43.0
* [PATCH v3 08/16] net/ice/base: read VSI layer info from VSI
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than computing the layer of the VSI from the number of HW layers,
we can instead just read that info from the VSI node itself. This allows
the layer to be changed at runtime.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f6dc5ae173..e398984bf2 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1559,7 +1559,6 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	u16 max_children;
 
 	qgrp_layer = ice_sched_get_qgrp_layer(pi->hw);
-	vsi_layer = ice_sched_get_vsi_layer(pi->hw);
 	max_children = pi->hw->max_children[qgrp_layer];
 
 	vsi_ctx = ice_get_vsi_ctx(pi->hw, vsi_handle);
@@ -1569,6 +1568,7 @@ ice_sched_get_free_qparent(struct ice_port_info *pi, u16 vsi_handle, u8 tc,
 	/* validate invalid VSI ID */
 	if (!vsi_node)
 		return NULL;
+	vsi_layer = vsi_node->tx_sched_layer;
 
 	/* If the queue group and vsi layer are same then queues
 	 * are all attached directly to VSI
-- 
2.43.0
* [PATCH v3 09/16] net/ice/base: remove 255 limit on sched child nodes
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The Tx scheduler in the ice driver can be configured to have large
numbers of child nodes at a given layer, but the driver code implicitly
limited the number of nodes to 255 by using a u8 datatype for the number
of children. Increase this to a 16-bit value throughout the code.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
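To make the motivation concrete, here is a small standalone sketch of
what happens to a child count once it passes 255 in an 8-bit field,
next to the 16-bit equivalent. The helper names are hypothetical and
exist only for illustration; they are not part of the driver.

```c
#include <stdint.h>

/* Value held by an 8-bit child counter after "adds" increments from 0. */
static uint8_t
sketch_count_children_u8(unsigned int adds)
{
	uint8_t n = 0;

	while (adds--)
		n++; /* wraps modulo 256 */
	return n;
}

/* Same counter widened to 16 bits, as done throughout this patch. */
static uint16_t
sketch_count_children_u16(unsigned int adds)
{
	uint16_t n = 0;

	while (adds--)
		n++; /* wraps only past 65535 */
	return n;
}
```

After 300 additions the 8-bit counter holds 44 (300 mod 256), while the
16-bit counter holds 300. Loop indices compared against the count need
the same width, which is why the "i" and "j" variables change too.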
 drivers/net/ice/base/ice_sched.c | 25 ++++++++++++++-----------
 drivers/net/ice/base/ice_type.h  |  2 +-
 2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index e398984bf2..be13833e1e 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -289,7 +289,7 @@ ice_sched_get_first_node(struct ice_port_info *pi,
  */
 struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 {
-	u8 i;
+	u16 i;
 
 	if (!pi || !pi->root)
 		return NULL;
@@ -316,7 +316,7 @@ void ice_free_sched_node(struct ice_port_info *pi, struct ice_sched_node *node)
 {
 	struct ice_sched_node *parent;
 	struct ice_hw *hw = pi->hw;
-	u8 i, j;
+	u16 i, j;
 
 	/* Free the children before freeing up the parent node
 	 * The parent array is updated below and that shifts the nodes
@@ -1473,7 +1473,7 @@ bool
 ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 			       struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < base->num_children; i++) {
 		struct ice_sched_node *child = base->children[i];
@@ -1510,7 +1510,7 @@ ice_sched_get_free_qgrp(struct ice_port_info *pi,
 			struct ice_sched_node *qgrp_node, u8 owner)
 {
 	struct ice_sched_node *min_qgrp;
-	u8 min_children;
+	u16 min_children;
 
 	if (!qgrp_node)
 		return qgrp_node;
@@ -2070,7 +2070,7 @@ static void ice_sched_rm_agg_vsi_info(struct ice_port_info *pi, u16 vsi_handle)
  */
 static bool ice_sched_is_leaf_node_present(struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	for (i = 0; i < node->num_children; i++)
 		if (ice_sched_is_leaf_node_present(node->children[i]))
@@ -2105,7 +2105,7 @@ ice_sched_rm_vsi_cfg(struct ice_port_info *pi, u16 vsi_handle, u8 owner)
 
 	ice_for_each_traffic_class(i) {
 		struct ice_sched_node *vsi_node, *tc_node;
-		u8 j = 0;
+		u16 j = 0;
 
 		tc_node = ice_sched_get_tc_node(pi, i);
 		if (!tc_node)
@@ -2173,7 +2173,7 @@ int ice_rm_vsi_lan_cfg(struct ice_port_info *pi, u16 vsi_handle)
  */
 bool ice_sched_is_tree_balanced(struct ice_hw *hw, struct ice_sched_node *node)
 {
-	u8 i;
+	u16 i;
 
 	/* start from the leaf node */
 	for (i = 0; i < node->num_children; i++)
@@ -2247,7 +2247,8 @@ ice_sched_get_free_vsi_parent(struct ice_hw *hw, struct ice_sched_node *node,
 			      u16 *num_nodes)
 {
 	u8 l = node->tx_sched_layer;
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(hw);
 
@@ -2289,7 +2290,7 @@ ice_sched_update_parent(struct ice_sched_node *new_parent,
 			struct ice_sched_node *node)
 {
 	struct ice_sched_node *old_parent;
-	u8 i, j;
+	u16 i, j;
 
 	old_parent = node->parent;
 
@@ -2389,7 +2390,8 @@ ice_sched_move_vsi_to_agg(struct ice_port_info *pi, u16 vsi_handle, u32 agg_id,
 	u16 num_nodes[ICE_AQC_TOPO_MAX_LEVEL_NUM] = { 0 };
 	u32 first_node_teid, vsi_teid;
 	u16 num_nodes_added;
-	u8 aggl, vsil, i;
+	u8 aggl, vsil;
+	u16 i;
 	int status;
 
 	tc_node = ice_sched_get_tc_node(pi, tc);
@@ -2505,7 +2507,8 @@ ice_move_all_vsi_to_dflt_agg(struct ice_port_info *pi,
 static bool
 ice_sched_is_agg_inuse(struct ice_port_info *pi, struct ice_sched_node *node)
 {
-	u8 vsil, i;
+	u8 vsil;
+	u16 i;
 
 	vsil = ice_sched_get_vsi_layer(pi->hw);
 	if (node->tx_sched_layer < vsil - 1) {
diff --git a/drivers/net/ice/base/ice_type.h b/drivers/net/ice/base/ice_type.h
index a70e4a8afa..35f832eb9f 100644
--- a/drivers/net/ice/base/ice_type.h
+++ b/drivers/net/ice/base/ice_type.h
@@ -1030,9 +1030,9 @@ struct ice_sched_node {
 	struct ice_aqc_txsched_elem_data info;
 	u32 agg_id;			/* aggregator group ID */
 	u16 vsi_handle;
+	u16 num_children;
 	u8 in_use;			/* suspended or in use */
 	u8 tx_sched_layer;		/* Logical Layer (1-9) */
-	u8 num_children;
 	u8 tc_num;
 	u8 owner;
 #define ICE_SCHED_NODE_OWNER_LAN	0
-- 
2.43.0
* [PATCH v3 10/16] net/ice/base: optimize subtree searches
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
In a number of places throughout the driver code, we want to confirm
that a scheduler node is indeed a descendant of another node. Currently,
this is confirmed by searching down the tree from the base until the
desired node is hit, a search which may visit many irrelevant tree nodes
when recursing down wrong branches. By switching the direction of the
search, to walk upwards from the node through its parents, we avoid any
incorrect paths and so speed up processing.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
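The upward walk can be sketched on its own as below. The anc_node type
is a hypothetical reduction of ice_sched_node to the two fields the
search needs, and sketch_node_in_subtree is an illustrative name. The
cost is bounded by the depth of the tree rather than by the size of the
subtree under the base node.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of ice_sched_node to what the search uses. */
struct anc_node {
	struct anc_node *parent;
	uint8_t tx_sched_layer; /* 0 for the root */
};

/* True if "node" is "base" itself or lies anywhere below "base". */
static bool
sketch_node_in_subtree(const struct anc_node *base,
		       const struct anc_node *node)
{
	if (base == node)
		return true;

	/* climb towards the root; stop at layer 0 or a missing parent */
	while (node->tx_sched_layer != 0 && node->parent != NULL) {
		if (node->parent == base)
			return true;
		node = node->parent;
	}
	return false;
}
```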
 drivers/net/ice/base/ice_sched.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index be13833e1e..f7d5f8f415 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -1475,20 +1475,12 @@ ice_sched_find_node_in_subtree(struct ice_hw *hw, struct ice_sched_node *base,
 {
 	u16 i;
 
-	for (i = 0; i < base->num_children; i++) {
-		struct ice_sched_node *child = base->children[i];
-
-		if (node == child)
-			return true;
-
-		if (child->tx_sched_layer > node->tx_sched_layer)
-			return false;
-
-		/* this recursion is intentional, and wouldn't
-		 * go more than 8 calls
-		 */
-		if (ice_sched_find_node_in_subtree(hw, child, node))
+	if (base == node)
+		return true;
+	while (node->tx_sched_layer != 0 && node->parent != NULL) {
+		if (node->parent == base)
 			return true;
+		node = node->parent;
 	}
 	return false;
 }
-- 
2.43.0
* [PATCH v3 11/16] net/ice/base: make functions non-static
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
We will need to allocate more lanq contexts after a scheduler rework, so
make that function non-static so that it is accessible outside the file.
For similar reasons, make the function to add a Tx scheduler node
non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index f7d5f8f415..d88b836c38 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..c7eb794963 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,12 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_sched_add_elems(struct ice_port_info *pi, struct ice_sched_node *tc_node,
+		    struct ice_sched_node *parent, u8 layer, u16 num_nodes,
+		    u16 *num_nodes_added, u32 *first_node_teid,
+		    struct ice_sched_node **prealloc_nodes);
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
* [PATCH v3 12/16] net/ice/base: remove flag checks before topology upload
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
DPDK should support more than just 9-level or 5-level topologies, so
remove the checks for those particular settings.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_ddp.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index e6c42c5274..744f015fe5 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -2373,38 +2373,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 		return status;
 	}
 
-	/* Is default topology already applied ? */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 9) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded default topology\n");
-		/* Already default topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is new topology already applied ? */
-	if ((flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Loaded new topology\n");
-		/* Already new topology is loaded */
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Is set topology issued already ? */
-	if (flags & ICE_AQC_TX_TOPO_FLAGS_ISSUED) {
-		ice_debug(hw, ICE_DBG_INIT, "Update tx topology was done by another PF\n");
-		/* add a small delay before exiting */
-		for (i = 0; i < 20; i++)
-			ice_msec_delay(100, true);
-		return ICE_ERR_ALREADY_EXISTS;
-	}
-
-	/* Change the topology from new to default (5 to 9) */
-	if (!(flags & ICE_AQC_TX_TOPO_FLAGS_LOAD_NEW) &&
-	    hw->num_tx_sched_layers == 5) {
-		ice_debug(hw, ICE_DBG_INIT, "Change topology from 5 to 9 layers\n");
-		goto update_topo;
-	}
-
 	pkg_hdr = (struct ice_pkg_hdr *)buf;
 	state = ice_verify_pkg(pkg_hdr, len);
 	if (state) {
@@ -2451,7 +2419,6 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 	/* Get the new topology buffer */
 	new_topo = ((u8 *)section) + offset;
 
-update_topo:
 	/* acquire global lock to make sure that set topology issued
 	 * by one PF
 	 */
-- 
2.43.0
* [PATCH v3 13/16] net/ice: limit the number of queues to sched capabilities
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Rather than assuming that each VSI can hold up to 256 queue pairs,
or the reported device limit, query the available nodes in the scheduler
tree to check that we are not overflowing the limit for number of
child scheduling nodes at each level. Do this by multiplying
max_children for each level beyond the VSI and using that as an
additional cap on the number of queues.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
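As a worked illustration of the cap, the calculation can be pulled out
into a standalone helper. The name sketch_cap_qps and its parameters
are hypothetical; in the driver the inputs come from hw->max_children,
hw->sw_entry_point_layer and hw->num_tx_sched_layers.

```c
#include <stdint.h>

/*
 * Cap nb_qps to the product of max_children[] over the layers from
 * entry_layer up to, but not including, the last layer.
 */
static uint16_t
sketch_cap_qps(uint16_t nb_qps, const uint16_t *max_children,
	       uint8_t entry_layer, uint8_t num_layers)
{
	/* 32 bits is enough: we stop before the product can overflow */
	uint32_t max_nodes = 1;
	uint8_t i;

	for (i = entry_layer; i + 1 < num_layers; i++) {
		max_nodes *= max_children[i];
		/* already at or above the configured max, stop early */
		if (max_nodes >= nb_qps)
			break;
	}
	return max_nodes < nb_qps ? (uint16_t)max_nodes : nb_qps;
}
```

For example, with 5 layers, entry layer 1 and max_children of 2, 4 and
8 for layers 1 to 3, the product is 64, so a request for 256 queue
pairs is capped to 64 while a request for 16 is left unchanged.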
 drivers/net/ice/ice_ethdev.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 0d2445a317..ab3f88fd7d 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -913,7 +913,7 @@ ice_vsi_config_default_rss(struct ice_aqc_vsi_props *info)
 }
 
 static int
-ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
+ice_vsi_config_tc_queue_mapping(struct ice_hw *hw, struct ice_vsi *vsi,
 				struct ice_aqc_vsi_props *info,
 				uint8_t enabled_tcmap)
 {
@@ -929,13 +929,28 @@ ice_vsi_config_tc_queue_mapping(struct ice_vsi *vsi,
 	}
 
 	/* vector 0 is reserved and 1 vector for ctrl vsi */
-	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2)
+	if (vsi->adapter->hw.func_caps.common_cap.num_msix_vectors < 2) {
 		vsi->nb_qps = 0;
-	else
+	} else {
 		vsi->nb_qps = RTE_MIN
 			((uint16_t)vsi->adapter->hw.func_caps.common_cap.num_msix_vectors - 2,
 			RTE_MIN(vsi->nb_qps, ICE_MAX_Q_PER_TC));
 
+		/* cap max QPs to what the HW reports as num-children for each layer.
+		 * Multiply num_children for each layer from the entry_point layer to
+		 * the qgroup, or second-last layer.
+		 * Avoid any potential overflow by using uint32_t type and breaking loop
+		 * once we have a number greater than the already configured max.
+		 */
+		uint32_t max_sched_vsi_nodes = 1;
+		for (uint8_t i = hw->sw_entry_point_layer; i < hw->num_tx_sched_layers - 1; i++) {
+			max_sched_vsi_nodes *= hw->max_children[i];
+			if (max_sched_vsi_nodes >= vsi->nb_qps)
+				break;
+		}
+		vsi->nb_qps = RTE_MIN(vsi->nb_qps, max_sched_vsi_nodes);
+	}
+
 	/* nb_qps(hex)  -> fls */
 	/* 0000		-> 0 */
 	/* 0001		-> 0 */
@@ -1707,7 +1722,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 			rte_cpu_to_le_16(hw->func_caps.fd_fltr_best_effort);
 
 		/* Enable VLAN/UP trip */
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
@@ -1731,7 +1746,7 @@ ice_setup_vsi(struct ice_pf *pf, enum ice_vsi_type type)
 		vsi_ctx.info.fd_options = rte_cpu_to_le_16(cfg);
 		vsi_ctx.info.sw_id = hw->port_info->sw_id;
 		vsi_ctx.info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-		ret = ice_vsi_config_tc_queue_mapping(vsi,
+		ret = ice_vsi_config_tc_queue_mapping(hw, vsi,
 						      &vsi_ctx.info,
 						      ICE_DEFAULT_TCMAP);
 		if (ret) {
-- 
2.43.0
* [PATCH v3 14/16] net/ice: enhance Tx scheduler hierarchy support
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2k child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always just having 3 layers. One
restriction introduced by this change is that the topology needs to be
configured and enabled before port queue setup in many cases, and
before port start in all cases.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
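The level handling this patch introduces can be sketched separately
from the driver. The helpers below are hypothetical illustrations:
SKETCH_LEVEL_ANY stands in for RTE_TM_NODE_LEVEL_ID_ANY, and the plain
integer arguments stand in for hw->num_tx_sched_layers,
hw->port_info->has_tc and the parent node's level.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for RTE_TM_NODE_LEVEL_ID_ANY (an assumption of this sketch). */
#define SKETCH_LEVEL_ANY UINT32_MAX

/* Leaf (queue) level: last layer, shifted up by one when a TC layer exists. */
static uint8_t
sketch_leaf_level(uint8_t num_tx_sched_layers, bool has_tc)
{
	return (uint8_t)(num_tx_sched_layers - 1 - (has_tc ? 1 : 0));
}

/*
 * Resolve the level of a new non-root node. An "any" level becomes
 * parent level + 1; an explicit level must equal parent level + 1.
 * Returns the resolved level, or -1 on a mismatch.
 */
static int
sketch_resolve_level(uint32_t parent_level, uint32_t level_id)
{
	if (level_id == SKETCH_LEVEL_ANY)
		level_id = parent_level + 1;
	if (level_id != parent_level + 1)
		return -1;
	return (int)level_id;
}
```

Under these assumptions, a 9-layer topology with a TC layer has its
leaves at level 7, a 5-layer one at level 3, and a 5-layer topology
without TC nodes at level 4. A node is treated as a leaf when its
resolved level equals that value.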
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  15 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 500 ++++++++++++++---------------------
 4 files changed, 216 insertions(+), 318 deletions(-)
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index ab3f88fd7d..5a5967ff71 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3832,7 +3832,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3864,14 +3863,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index f31addb122..cb1a7e8e0d 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -479,14 +479,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +682,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +739,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index a150d28e73..7a421bb364 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 459446a6b0..80039c8aff 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
+#define MAX_CHILDREN_PER_TM_NODE	2048
 
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +86,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +124,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +148,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +199,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -389,16 +402,28 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
+	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
+		parent_node = find_node(pf->tm_conf.root, parent_node_id);
+		if (!parent_node) {
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+			error->message = "parent not exist";
+			return -EINVAL;
+		}
+	}
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
+		level_id = parent_node->level + 1;
+
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
 		return ret;
 
@@ -424,9 +449,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -445,7 +470,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
@@ -458,52 +483,29 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* check the parent node */
-	parent_node = find_node(pf->tm_conf.root, parent_node_id);
-	if (!parent_node) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent not exist";
-		return -EINVAL;
-	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
 	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
+	if (level_id != parent_node->level + 1) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
 		error->message = "Wrong level";
 		return -EINVAL;
 	}
 
-	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
+	/* check the max children allowed at this level */
+	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
+		error->type = RTE_TM_ERROR_TYPE_CAPABILITIES;
+		error->message = "insufficient number of child nodes supported";
+		return -EINVAL;
 	}
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
-			      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
+			      sizeof(struct ice_tm_node *) * hw->max_children[level_id],
 			      0);
 	if (!tm_node)
 		return -ENOMEM;
@@ -518,13 +520,11 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
 	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->priority != 0)
 		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
 			    level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->weight != 1 && level_id == 0)
 		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
 			    level_id);
 
@@ -569,7 +569,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -589,53 +589,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -723,240 +676,191 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
-
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
-{
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
-
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
+
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* Starting from a given node, recursively delete all nodes that belong to the given VSI.
+ * Any node which can't be deleted because it still has children belonging to a
+ * different VSI is reassigned to the VSI of its first remaining child.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
-		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
-		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u\n",
+					parent->info.node_teid);
+			return -1;
 		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
-				 int clear_on_fail,
-				 struct rte_tm_error *error)
+static int
+apply_topology_updates(struct rte_eth_dev *dev __rte_unused)
 {
+	return 0;
+}
+
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	/* check if we have a previously applied topology */
+	if (sw_root->sched_node != NULL)
+		return apply_topology_updates(dev);
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u\n",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
+				 int clear_on_fail,
+				 struct rte_tm_error *error)
+{
+	RTE_SET_USED(error);
+	/* commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
+
+	uint64_t start = rte_rdtsc();
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	uint64_t time = rte_rdtsc() - start;
+	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
+	return ret;
 }
-- 
2.43.0
* [PATCH v3 15/16] net/ice: add minimal capability reporting API
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (13 preceding siblings ...)
  2024-08-12 15:28   ` [PATCH v3 14/16] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  2024-08-12 15:28   ` [PATCH v3 16/16] net/ice: do early check on node level when adding Bruce Richardson
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Add a minimal capabilities_get implementation. It is incomplete, but
reports the number of available scheduler layers.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_ethdev.h |  1 +
 drivers/net/ice/ice_tm.c     | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index cb1a7e8e0d..6bebc511e4 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -682,6 +682,7 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
+
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 80039c8aff..3dcd091c38 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -33,8 +33,12 @@ static int ice_shaper_profile_add(struct rte_eth_dev *dev,
 static int ice_shaper_profile_del(struct rte_eth_dev *dev,
 				   uint32_t shaper_profile_id,
 				   struct rte_tm_error *error);
+static int ice_tm_capabilities_get(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
 
 const struct rte_tm_ops ice_tm_ops = {
+	.capabilities_get = ice_tm_capabilities_get,
 	.shaper_profile_add = ice_shaper_profile_add,
 	.shaper_profile_delete = ice_shaper_profile_del,
 	.node_add = ice_tm_node_add,
@@ -864,3 +868,16 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 	PMD_DRV_LOG(DEBUG, "Time to apply hierarchy = %.1f\n", (float)time / rte_get_timer_hz());
 	return ret;
 }
+
+static int
+ice_tm_capabilities_get(struct rte_eth_dev *dev, struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	*cap = (struct rte_tm_capabilities){
+		.n_levels_max = hw->num_tx_sched_layers - hw->port_info->has_tc,
+	};
+	if (error)
+		error->type = RTE_TM_ERROR_TYPE_NONE;
+	return 0;
+}
-- 
2.43.0
* [PATCH v3 16/16] net/ice: do early check on node level when adding
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (14 preceding siblings ...)
  2024-08-12 15:28   ` [PATCH v3 15/16] net/ice: add minimal capability reporting API Bruce Richardson
@ 2024-08-12 15:28   ` Bruce Richardson
  15 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-08-12 15:28 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
When adding a new scheduler node, the parameters for leaf nodes and
non-leaf nodes are different, and which parameter checks are done is
determined by checking the node level i.e. if it's the lowest (leaf)
node level or not. However, if the node level itself is incorrectly
specified, the resulting error messages can be confusing since the user may
add a leaf node using e.g. the testpmd command to explicitly add a leaf
node, yet get error messages only relevant to non-leaf nodes due to an
incorrect level parameter.
We can avoid these confusing errors by doing a check that the level
matches "parent->level + 1" before doing a more detailed parameter
check.
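As a rough standalone illustration of the ordering described above (not
the driver code itself), the sketch below runs the level check before the
leaf/non-leaf parameter check. The types, the LEVEL_ID_ANY constant and
the check_new_node() helper are all hypothetical stand-ins introduced only
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real ice_tm structures. */
#define LEVEL_ID_ANY UINT32_MAX

struct node {
	uint32_t level;
};

enum check_result { CHECK_OK, CHECK_WRONG_LEVEL, CHECK_BAD_PARAMS };

/*
 * Return the first failing check for a node being added.
 * leaf_level is the deepest level in the hierarchy; has_leaf_params is
 * 1 if the caller supplied leaf-style parameters, 0 otherwise.
 */
static enum check_result
check_new_node(const struct node *parent, uint32_t level_id,
	       uint32_t leaf_level, int has_leaf_params)
{
	/* resolve "any level" to the level implied by the parent */
	if (level_id == LEVEL_ID_ANY && parent != NULL)
		level_id = parent->level + 1;

	/* early check: the level must be exactly one below the parent */
	if (parent != NULL && level_id != parent->level + 1)
		return CHECK_WRONG_LEVEL;

	/* only now decide which parameter set applies */
	int is_leaf = (level_id == leaf_level);
	if (is_leaf != has_leaf_params)
		return CHECK_BAD_PARAMS;

	return CHECK_OK;
}
```

With this ordering, a leaf node added with a wrong level is reported as
a level error, rather than as a non-leaf parameter error.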
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_tm.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 3dcd091c38..e05ad8a8e7 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -426,6 +426,13 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
 		level_id = parent_node->level + 1;
 
+	/* check level */
+	if (parent_node != NULL && level_id != parent_node->level + 1) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
+		error->message = "Wrong level";
+		return -EINVAL;
+	}
+
 	ret = ice_node_param_check(node_id, priority, weight,
 			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
@@ -493,12 +500,6 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
-	/* check level */
-	if (level_id != parent_node->level + 1) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-		error->message = "Wrong level";
-		return -EINVAL;
-	}
 
 	/* check the max children allowed at this level */
 	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
-- 
2.43.0
* [PATCH v4 0/5] Improve rte_tm support in ICE driver
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (16 preceding siblings ...)
  2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-23 16:27 ` Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 1/5] net/ice: add option to download scheduler topology Bruce Richardson
                     ` (4 more replies)
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  19 siblings, 5 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
This patchset expands the capabilities of the traffic management
support in the ICE driver. It allows the driver to support different
sizes of topologies, and support >256 queues and more than 3 hierarchy
layers.
---
v4:
* set reduced to only 5 patches:
  - base code changes mostly covered by separate base code patchset (merged rc1)
  - additional minor fixes and enhancements covered by set [1] (merged to next-net-intel for rc2)
* additional work included in set:
  - automatic stopping and restarting of port on configuration
  - ability to reconfigure the sched topology post-commit and then apply that via new commit call
v3:
* remove/implement some code TODOs
* add patch 16 to set.
v2:
* Correct typo in commit log of one patch
* Add missing depends-on tag to the cover letter
[1] https://patches.dpdk.org/project/dpdk/list/?series=33609&state=*
Bruce Richardson (5):
  net/ice: add option to download scheduler topology
  net/ice/base: make context alloc function non-static
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: allowing stopping port to apply TM topology
  net/ice: provide parameter to limit scheduler layers
 doc/guides/nics/ice.rst          |  60 +++-
 drivers/net/ice/base/ice_ddp.c   |  18 +-
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |   2 +-
 drivers/net/ice/base/ice_sched.h |   3 +
 drivers/net/ice/ice_ethdev.c     |  90 ++++--
 drivers/net/ice/ice_ethdev.h     |  20 +-
 drivers/net/ice/ice_rxtx.c       |  10 +
 drivers/net/ice/ice_tm.c         | 513 +++++++++++++------------------
 9 files changed, 368 insertions(+), 352 deletions(-)
--
2.43.0
* [PATCH v4 1/5] net/ice: add option to download scheduler topology
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-23 16:27   ` Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The DDP package file being loaded at init time may contain an
alternative Tx Scheduler topology in it. Add driver option to load this
topology at init time.
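The new option takes a value of 0 or 1, and this patch also tightens the
shared boolean devarg parser so that only the exact strings "0" and "1"
are accepted. A minimal standalone sketch of that check follows; the name
parse_bool_strict() is hypothetical, and the kvargs callback signature and
logging used by the driver are left out.

```c
#include <stddef.h>

/*
 * Standalone sketch of strict "0"/"1" parsing.
 * parse_bool_strict is a hypothetical name used only for illustration.
 * Returns 0 and stores the parsed value in *out on success, -1 otherwise.
 */
static int
parse_bool_strict(const char *value, int *out)
{
	/* a value is mandatory */
	if (value == NULL || value[0] == '\0')
		return -1;

	/* exactly one character, and it must be '0' or '1' */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1;

	*out = value[0] - '0';
	return 0;
}
```

Unlike a strtoul()-based check, this rejects inputs such as "1abc" or
"01" instead of silently accepting their numeric prefix.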
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst        | 15 +++++++++++++++
 drivers/net/ice/base/ice_ddp.c | 18 +++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
 drivers/net/ice/ice_ethdev.h   |  1 +
 5 files changed, 48 insertions(+), 14 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 6c66dc8008..bb8f01cb38 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -298,6 +298,21 @@ Runtime Configuration
   As a trade-off, this configuration may cause the packet processing performance
   degradation due to the PCI bandwidth limitation.
 
+- ``Tx Scheduler Topology Download``
+
+  The default Tx scheduler topology exposed by the NIC,
+  generally a 9-level topology of which 8 levels are SW configurable,
+  may be updated by a new topology loaded from a DDP package file.
+  The ``ddp_load_sched_topo`` option can be used to specify that the scheduler topology,
+  if any, in the DDP package file being used should be loaded into the NIC.
+  For example::
+
+    -a 0000:88:00.0,ddp_load_sched_topo=1
+
+  or::
+
+    -a 0000:88:00.0,ddp_pkg_file=/path/to/pkg.file,ddp_load_sched_topo=1
+
 - ``Tx diagnostics`` (default ``not enabled``)
 
   Set the ``devargs`` parameter ``mbuf_check`` to enable Tx diagnostics.
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index c17a58eab8..ab9645f3a2 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1333,7 +1333,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1351,6 +1351,18 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT, "failed to apply sched topology (err: %d)\n",
+					res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT, "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1423,7 +1435,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1433,7 +1445,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5512669f44..d79cdee13a 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -454,9 +454,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d5e94a6685..d0a845accd 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -39,6 +39,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -56,6 +57,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
+	ICE_DDP_LOAD_SCHED_ARG,
 	NULL
 };
 
@@ -1997,7 +1999,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d", err);
 		free(buf);
@@ -2030,19 +2032,18 @@ static int
 parse_bool(const char *key, const char *value, void *args)
 {
 	int *i = (int *)args;
-	char *end;
-	int num;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = value[0] - '0';
 	return 0;
 }
 
@@ -2307,6 +2308,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED_ARG,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7185,6 +7190,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
+			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 076cf595e8..2794a76096 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -564,6 +564,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	uint8_t ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
* [PATCH v4 2/5] net/ice/base: make context alloc function non-static
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 1/5] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-10-23 16:27   ` Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The function "ice_alloc_lan_q_ctx" will be needed by the driver code, so
make it non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 9608ac7c24..1f520bb7c0 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..09d60d02f0 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,7 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
* [PATCH v4 3/5] net/ice: enhance Tx scheduler hierarchy support
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 1/5] net/ice: add option to download scheduler topology Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
@ 2024-10-23 16:27   ` Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
  4 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2k child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always just having 3 layers. One
restriction on this change is that the topology needs to be configured
and enabled before port queue setup in many cases, and before port
start in all cases.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
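The level arithmetic this patch relies on can be sketched in isolation.
The snippet below is only an illustration: `struct sketch_hw` and the
`sketch_*` helpers are hypothetical stand-ins for the few `ice_hw` fields
consulted by `ice_get_leaf_level()` and by the parent checks in
`ice_tm_node_add()`, and the array size of 9 is an assumption taken from
the 9-level default topology mentioned in the documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the ice_hw fields used by the level checks. */
struct sketch_hw {
	uint8_t num_tx_sched_layers; /* layers reported by HW/firmware */
	bool has_tc;                 /* true if a TC layer sits below the root */
	uint16_t max_children[9];    /* per-level fan-out limit (size assumed) */
};

/* Same arithmetic as ice_get_leaf_level(): the level index of queue nodes. */
static inline uint8_t
sketch_leaf_level(const struct sketch_hw *hw)
{
	return hw->num_tx_sched_layers - 1 - hw->has_tc;
}

/*
 * Return 0 if a parent at parent_level, already holding nb_children
 * children, may take one more child; return -1 otherwise.
 */
static inline int
sketch_can_add_child(const struct sketch_hw *hw, uint8_t parent_level,
		uint16_t nb_children)
{
	/* for an n-level hierarchy, the last level with children is n-2 */
	if ((int)parent_level > hw->num_tx_sched_layers - 2)
		return -1;
	/* respect the fan-out limit of the parent's level */
	if (nb_children >= hw->max_children[parent_level])
		return -1;
	return 0;
}
```

With 9 hardware layers and a TC layer present, the leaf level works out
to 7, which matches the "8 levels usable" wording in the updated guide.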
 doc/guides/nics/ice.rst      |  31 +--
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  15 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 496 ++++++++++++++---------------------
 5 files changed, 224 insertions(+), 337 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index bb8f01cb38..dc649f8e31 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -447,21 +447,22 @@ Traffic Management Support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The ice PMD provides support for the Traffic Management API (RTE_TM),
-allow users to offload a 3-layers Tx scheduler on the E810 NIC:
-
-- ``Port Layer``
-
-  This is the root layer, support peak bandwidth configuration,
-  max to 32 children.
-
-- ``Queue Group Layer``
-
-  The middle layer, support peak / committed bandwidth, weight, priority configurations,
-  max to 8 children.
-
-- ``Queue Layer``
-
-  The leaf layer, support peak / committed bandwidth, weight, priority configurations.
+enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
+By default, all available transmit scheduler layers are available for configuration,
+allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
+The number of levels in the hierarchy can be adjusted via driver parameter:
+
+* the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
+  using the driver parameter ``ddp_load_sched_topo=1``.
+  Using this mechanism, if the number of levels is reduced,
+  the possible fan-out of child-nodes from each level may be increased.
+  The default topology is a 9-level tree with a fan-out of 8 at each level.
+  Released DDP package files contain a 5-level hierarchy (4 levels usable),
+  with increased fan-out at the lower 3 levels
+  e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
+
+For more details on how to configure a Tx scheduling hierarchy,
+please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
 Additional Options
 ++++++++++++++++++
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d0a845accd..7aed26118f 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3906,7 +3906,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3938,14 +3937,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 2794a76096..71fd7bca64 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -480,14 +480,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +682,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +739,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 024d97cb46..0c7106c7e0 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 636ab77f26..4809bdde40 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
+#define MAX_CHILDREN_PER_TM_NODE	2048
 
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +86,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +124,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +148,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +199,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -393,16 +406,35 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
+	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
+		parent_node = find_node(pf->tm_conf.root, parent_node_id);
+		if (!parent_node) {
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+			error->message = "parent not exist";
+			return -EINVAL;
+		}
+	}
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
+		level_id = parent_node->level + 1;
+
+	/* check level */
+	if (parent_node != NULL && level_id != parent_node->level + 1) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
+		error->message = "Wrong level";
+		return -EINVAL;
+	}
+
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
 		return ret;
 
@@ -428,9 +460,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -449,7 +481,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
@@ -462,52 +494,23 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* check the parent node */
-	parent_node = find_node(pf->tm_conf.root, parent_node_id);
-	if (!parent_node) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent not exist";
-		return -EINVAL;
-	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
-	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-		error->message = "Wrong level";
-		return -EINVAL;
-	}
 
-	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
+	/* check the max children allowed at this level */
+	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
+		error->type = RTE_TM_ERROR_TYPE_CAPABILITIES;
+		error->message = "insufficient number of child nodes supported";
+		return -EINVAL;
 	}
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
-			      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
+			      sizeof(struct ice_tm_node *) * hw->max_children[level_id],
 			      0);
 	if (!tm_node)
 		return -ENOMEM;
@@ -522,13 +525,11 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
 	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->priority != 0)
 		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
 			    level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->weight != 1 && level_id == 0)
 		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
 			    level_id);
 
@@ -573,7 +574,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -593,53 +594,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -727,240 +681,178 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
-{
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
-
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
-
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
+
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* From a given node, recursively delete all the nodes that belong to the given VSI.
+ * Any node which can't be deleted, because it has children belonging to a different
+ * VSI, is instead reassigned to that other VSI.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u",
+					parent->info.node_teid);
+			return -1;
 		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
-		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
-		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
+
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	RTE_SET_USED(error);
+	/* commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	return ret;
 }
-- 
2.43.0
* [PATCH v4 4/5] net/ice: allow stopping port to apply TM topology
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (2 preceding siblings ...)
  2024-10-23 16:27   ` [PATCH v4 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-10-23 16:27   ` Bruce Richardson
  2024-10-23 16:27   ` [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
  4 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The rte_tm topology commit requires the port to be stopped on apply.
Rather than just returning an error when the port is already started, we
can stop the port, apply the topology to it and then restart it.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
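The stop/apply/restart flow described above can be sketched without any
DPDK dependency. Everything named `sketch_*` below is hypothetical: the
real driver calls `rte_eth_dev_stop()`, `commit_new_hierarchy()` and
`rte_eth_dev_start()` directly, and the stub callbacks exist only so the
control flow can be exercised on its own.

```c
#include <stdbool.h>

/* Hypothetical port abstraction used only for this illustration. */
struct sketch_port {
	bool started;
	int (*stop)(struct sketch_port *p);
	int (*apply)(struct sketch_port *p);
	int (*start)(struct sketch_port *p);
};

/* Stub callbacks: stop/start flip the state, apply requires a stopped port. */
static int sketch_stop_ok(struct sketch_port *p) { p->started = false; return 0; }
static int sketch_start_ok(struct sketch_port *p) { p->started = true; return 0; }
static int sketch_apply_ok(struct sketch_port *p) { return p->started ? -1 : 0; }

/* Stop the port if needed, apply the topology, then restore the port state. */
static int
sketch_commit(struct sketch_port *p, const char **errmsg)
{
	bool restart = false;

	if (p->started) {
		if (p->stop(p) != 0) {
			*errmsg = "Device failed to Stop";
			return -1;
		}
		restart = true;
	}

	int ret = p->apply(p);

	/* restart only if this function was the one that stopped the port */
	if (restart && p->start(p) != 0) {
		*errmsg = "Device failed to Start";
		return -1;
	}
	return ret;
}
```

As in the patch, a failed restart is reported even when the apply step
itself succeeded, so callers should treat any negative return as leaving
the port possibly stopped.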
 drivers/net/ice/ice_tm.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 4809bdde40..09e947a3b1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -844,15 +844,30 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	RTE_SET_USED(error);
-	/* commit should only be done to topology before start! */
-	if (dev->data->dev_started)
-		return -1;
+	bool restart = false;
+
+	/* commit should only be done to topology before start.
+	 * If port is already started, stop it and then restart when done.
+	 */
+	if (dev->data->dev_started) {
+		if (rte_eth_dev_stop(dev->data->port_id) != 0) {
+			error->message = "Device failed to Stop";
+			return -1;
+		}
+		restart = true;
+	}
 
 	int ret = commit_new_hierarchy(dev);
 	if (ret < 0 && clear_on_fail) {
 		ice_tm_conf_uninit(dev);
 		ice_tm_conf_init(dev);
 	}
+
+	if (restart) {
+		if (rte_eth_dev_start(dev->data->port_id) != 0) {
+			error->message = "Device failed to Start";
+			return -1;
+		}
+	}
 	return ret;
 }
-- 
2.43.0
* [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (3 preceding siblings ...)
  2024-10-23 16:27   ` [PATCH v4 4/5] net/ice: allow stopping port to apply TM topology Bruce Richardson
@ 2024-10-23 16:27   ` Bruce Richardson
  4 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:27 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
In order to help with backward compatibility for applications which may
expect the ice driver Tx scheduler (accessed via TM APIs) to only have 3
layers, add in a devarg to allow the user to explicitly limit the
number of scheduler layers visible to the application.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
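The documented behaviour of ``tm_sched_levels`` (a value between 3 and 8,
capped at what the hardware provides) can be sketched as below. This is
an assumption-laden illustration, not the driver's parsing code: the
`SKETCH_*` names and `sketch_tm_levels()` are hypothetical, and returning
-1 for out-of-range input is one possible choice rather than the
driver's confirmed behaviour.

```c
/* Range taken from the documentation text added by this patch. */
#define SKETCH_TM_LEVELS_MIN 3
#define SKETCH_TM_LEVELS_MAX 8

/*
 * Return the number of scheduler levels to expose to the application,
 * or -1 if the requested value is outside the documented range.
 * A request above the hardware level count falls back to the hardware value.
 */
static int
sketch_tm_levels(int requested, int hw_levels)
{
	if (requested < SKETCH_TM_LEVELS_MIN || requested > SKETCH_TM_LEVELS_MAX)
		return -1;
	return requested > hw_levels ? hw_levels : requested;
}
```

For example, requesting 3 levels on 8-level hardware yields 3, which is
the setting an application written against the older 3-layer scheduler
would presumably pass as ``-a <BDF>,tm_sched_levels=3``.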
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 57 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 28 ++++++++++--------
 4 files changed, 91 insertions(+), 14 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index dc649f8e31..d73fd1b995 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
     -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all hardware layers are available to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the scheduler levels to the provided value.
+  The provided value must be between 3 and 8.
+  If the value provided is greater than the number of levels provided by the HW,
+  SW will use the hardware maximum value.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This scheme will reduce in software the number of editable levels,
+  but will not affect the fan-out from each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7aed26118f..6f3bef1078 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG         "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,27 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* set the number of hidden Tx scheduler layers. If no devargs parameter
+	 * sets the number of exposed levels, the default is to expose all levels,
+	 * except the TC layer.
+	 *
+	 * If the number of exposed levels is set, we check that it's not greater
+	 * than the HW can provide (in which case we do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+	}
+
 	pf->main_vsi = vsi;
 	rte_spinlock_init(&pf->link_lock);
 
@@ -2066,6 +2090,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long int tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* the value needs two stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in
+	 * the correct range, between 3 and 8. Later validation will check that the
+	 * number of layers available on a particular port is not lower than this value.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_SCHED_9_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+			    key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_SCHED_9_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2362,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7238,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "=<N>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 71fd7bca64..431561e48f 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -484,6 +484,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers;    /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -557,6 +558,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -660,7 +662,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 09e947a3b1..9e943da7a1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -200,9 +200,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -210,7 +211,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -230,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -434,7 +434,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-			params, level_id == ice_get_leaf_level(hw), error);
+			params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -762,8 +762,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		 struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -795,14 +795,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -815,15 +815,19 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
 		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0
* [PATCH v5 0/5] Improve rte_tm support in ICE driver
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (17 preceding siblings ...)
  2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-23 16:55 ` Bruce Richardson
  2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
                     ` (4 more replies)
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  19 siblings, 5 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
This patchset expands the capabilities of the traffic management
support in the ICE driver. It allows the driver to support different
sizes of topologies, and support >256 queues and more than 3 hierarchy
layers.
---
v5:
* fix checkpatch flagged issues
v4:
* set reduced to only 5 patches:
  - base code changes mostly covered by separate base code patchset (merged rc1)
  - additional minor fixes and enhancements covered by set [1] (merged to next-net-intel for rc2)
* additional work included in set:
  - automatic stopping and restarting of port on configuration
  - ability to reconfigure the sched topology post-commit and then apply that via new commit call
v3:
* remove/implement some code TODOs
* add patch 16 to set.
v2:
* Correct typo in commit log of one patch
* Add missing depends-on tag to the cover letter
[1] https://patches.dpdk.org/project/dpdk/list/?series=33609&state=*
Bruce Richardson (5):
  net/ice: add option to download scheduler topology
  net/ice/base: make context alloc function non-static
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: allowing stopping port to apply TM topology
  net/ice: provide parameter to limit scheduler layers
 doc/guides/nics/ice.rst          |  60 +++-
 drivers/net/ice/base/ice_ddp.c   |  20 +-
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |   2 +-
 drivers/net/ice/base/ice_sched.h |   3 +
 drivers/net/ice/ice_ethdev.c     |  91 ++++--
 drivers/net/ice/ice_ethdev.h     |  20 +-
 drivers/net/ice/ice_rxtx.c       |  10 +
 drivers/net/ice/ice_tm.c         | 513 +++++++++++++------------------
 9 files changed, 371 insertions(+), 352 deletions(-)
--
2.43.0
* [PATCH v5 1/5] net/ice: add option to download scheduler topology
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-23 16:55   ` Bruce Richardson
  2024-10-25 17:01     ` Medvedkin, Vladimir
  2024-10-23 16:55   ` [PATCH v5 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The DDP package file being loaded at init time may contain an
alternative Tx Scheduler topology in it. Add driver option to load this
topology at init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
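Note (not part of the commit): besides adding the new devarg, this patch
tightens parse_bool(). A minimal standalone sketch of the stricter check
follows. The function name "parse_bool_strict" and its two-argument
signature are hypothetical; the driver's version uses the rte_kvargs
callback signature and logs a warning on failure.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accept exactly the one-character strings "0" and "1".
 * Returns 0 and stores the result in *out on success, -1 otherwise.
 */
int
parse_bool_strict(const char *value, int *out)
{
	assert(out != NULL);

	/* a key given without a value is rejected */
	if (value == NULL || value[0] == '\0')
		return -1;
	/* anything longer than one character, or not '0'/'1', is rejected */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1;

	*out = value[0] - '0';
	return 0;
}
```

The strtoul()-based code being removed never inspected its end pointer, so
inputs such as "1abc" (parsed as 1) or "abc" (parsed as 0) were accepted;
the character-based check rejects both.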
 doc/guides/nics/ice.rst        | 15 +++++++++++++++
 drivers/net/ice/base/ice_ddp.c | 20 +++++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
 drivers/net/ice/ice_ethdev.h   |  1 +
 5 files changed, 50 insertions(+), 14 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 6c66dc8008..42bbe50968 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -298,6 +298,21 @@ Runtime Configuration
   As a trade-off, this configuration may cause the packet processing performance
   degradation due to the PCI bandwidth limitation.
 
+- ``Tx Scheduler Topology Download``
+
+  The default Tx scheduler topology exposed by the NIC,
+  generally a 9-level topology of which 8 levels are SW configurable,
+  may be updated by a new topology loaded from a DDP package file.
+  The ``ddp_load_sched_topo`` option can be used to specify that the scheduler topology,
+  if any, in the DDP package file being used should be loaded into the NIC.
+  For example::
+
+    -a 0000:88:00.0,ddp_load_sched_topo=1
+
+  or::
+
+    -a 0000:88:00.0,ddp_pkg_file=/path/to/pkg.file,ddp_load_sched_topo=1
+
 - ``Tx diagnostics`` (default ``not enabled``)
 
   Set the ``devargs`` parameter ``mbuf_check`` to enable Tx diagnostics.
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index c17a58eab8..850c722a3f 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1333,7 +1333,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1351,6 +1351,20 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT,
+				  "failed to apply sched topology (err: %d)\n",
+				  res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT,
+			  "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1423,7 +1437,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1433,7 +1447,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5512669f44..d79cdee13a 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -454,9 +454,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d5e94a6685..d0a845accd 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -39,6 +39,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -56,6 +57,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
+	ICE_DDP_LOAD_SCHED_ARG,
 	NULL
 };
 
@@ -1997,7 +1999,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d", err);
 		free(buf);
@@ -2030,19 +2032,18 @@ static int
 parse_bool(const char *key, const char *value, void *args)
 {
 	int *i = (int *)args;
-	char *end;
-	int num;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = value[0] - '0';
 	return 0;
 }
 
@@ -2307,6 +2308,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED_ARG,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7185,6 +7190,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
+			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 076cf595e8..2794a76096 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -564,6 +564,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	uint8_t ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
* [PATCH v5 2/5] net/ice/base: make context alloc function non-static
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-10-23 16:55   ` Bruce Richardson
  2024-10-25 17:01     ` Medvedkin, Vladimir
  2024-10-23 16:55   ` [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The function "ice_alloc_lan_q_ctx" will be needed by the driver code, so
make it non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 9608ac7c24..1f520bb7c0 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..09d60d02f0 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,7 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
* [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
  2024-10-23 16:55   ` [PATCH v5 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
@ 2024-10-23 16:55   ` Bruce Richardson
  2024-10-25 17:02     ` Medvedkin, Vladimir
  2024-10-23 16:55   ` [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
  2024-10-23 16:55   ` [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
  4 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2k child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always just having 3 layers.  One
restriction on this change is that the topology needs to be configured
and enabled before port queue setup, in many cases, and before port
start in all cases.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
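Note (not part of the commit): with the fixed port/queue-group/queue enum
removed, node levels are plain integers derived from the parent. A minimal
standalone sketch of the level resolution done at the start of
ice_tm_node_add() in the diff below. The names "resolve_node_level",
"LEVEL_ID_ANY" and "NO_PARENT" are hypothetical; LEVEL_ID_ANY is assumed to
play the role of RTE_TM_NODE_LEVEL_ID_ANY, and NO_PARENT stands for a
parent ID of RTE_TM_NODE_ID_NULL.

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL_ID_ANY UINT32_MAX /* stand-in for RTE_TM_NODE_LEVEL_ID_ANY */
#define NO_PARENT    (-1)       /* stand-in for "parent is RTE_TM_NODE_ID_NULL" */

/*
 * Return the level a new node ends up at, or -1 if the request is invalid.
 * parent_level is the level of the parent node, or NO_PARENT for the root.
 */
int
resolve_node_level(int parent_level, uint32_t level_id)
{
	assert(parent_level >= NO_PARENT);

	/* the root node has no parent and must be requested at level 0 */
	if (parent_level == NO_PARENT)
		return level_id == 0 ? 0 : -1;

	/* "any level" resolves to one below the parent */
	if (level_id == LEVEL_ID_ANY)
		return parent_level + 1;

	/* an explicit level must be exactly one below the parent */
	return level_id == (uint32_t)parent_level + 1 ? (int)level_id : -1;
}
```

Whether the resolved level is a leaf is then decided by comparing it with
the value returned by ice_get_leaf_level(), rather than with a fixed
queue-layer constant.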
 doc/guides/nics/ice.rst      |  31 +--
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  15 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 496 ++++++++++++++---------------------
 5 files changed, 224 insertions(+), 337 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 42bbe50968..df489be08d 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -447,21 +447,22 @@ Traffic Management Support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The ice PMD provides support for the Traffic Management API (RTE_TM),
-allow users to offload a 3-layers Tx scheduler on the E810 NIC:
-
-- ``Port Layer``
-
-  This is the root layer, support peak bandwidth configuration,
-  max to 32 children.
-
-- ``Queue Group Layer``
-
-  The middle layer, support peak / committed bandwidth, weight, priority configurations,
-  max to 8 children.
-
-- ``Queue Layer``
-
-  The leaf layer, support peak / committed bandwidth, weight, priority configurations.
+enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
+By default, all available transmit scheduler layers are available for configuration,
+allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
+The number of levels in the hierarchy can be adjusted via driver parameter:
+
+* the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
+  using the driver parameter ``ddp_load_sched_topo=1``.
+  Using this mechanism, if the number of levels is reduced,
+  the possible fan-out of child-nodes from each level may be increased.
+  The default topology is a 9-level tree with a fan-out of 8 at each level.
+  Released DDP package files contain a 5-level hierarchy (4-levels usable),
+  with increased fan-out at the lower 3 levels
+  e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
+
+For more details on how to configure a Tx scheduling hierarchy,
+please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
 Additional Options
 ++++++++++++++++++
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d0a845accd..7aed26118f 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3906,7 +3906,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3938,14 +3937,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 2794a76096..71fd7bca64 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -480,14 +480,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +682,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +739,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 024d97cb46..0c7106c7e0 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 636ab77f26..4809bdde40 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
+#define MAX_CHILDREN_PER_TM_NODE	2048
 
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +86,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +124,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +148,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +199,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -393,16 +406,35 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
+	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
+		parent_node = find_node(pf->tm_conf.root, parent_node_id);
+		if (!parent_node) {
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+			error->message = "parent does not exist";
+			return -EINVAL;
+		}
+	}
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
+		level_id = parent_node->level + 1;
+
+	/* check level */
+	if (parent_node != NULL && level_id != parent_node->level + 1) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
+		error->message = "Wrong level";
+		return -EINVAL;
+	}
+
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
 	if (ret)
 		return ret;
 
@@ -428,9 +460,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -449,7 +481,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
@@ -462,52 +494,23 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* check the parent node */
-	parent_node = find_node(pf->tm_conf.root, parent_node_id);
-	if (!parent_node) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent not exist";
-		return -EINVAL;
-	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent is not valid";
 		return -EINVAL;
 	}
-	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-		error->message = "Wrong level";
-		return -EINVAL;
-	}
 
-	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
+	/* check the max children allowed at this level */
+	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
+		error->type = RTE_TM_ERROR_TYPE_CAPABILITIES;
+		error->message = "insufficient number of child nodes supported";
+		return -EINVAL;
 	}
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
-			      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
+			      sizeof(struct ice_tm_node *) * hw->max_children[level_id],
 			      0);
 	if (!tm_node)
 		return -ENOMEM;
@@ -522,13 +525,11 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
 	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->priority != 0)
 		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
 			    level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
+	if (tm_node->weight != 1 && level_id == 0)
 		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
 			    level_id);
 
@@ -573,7 +574,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -593,53 +594,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -727,240 +681,178 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
-{
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
-
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
-
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
+
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* from a given node, recursively deletes all the nodes that belong to that vsi.
+ * Any nodes which can't be deleted because they have children belonging to a different
+ * VSI are adjusted to belong to that other VSI instead.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u",
+					parent->info.node_teid);
+			return -1;
 		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
-		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
-		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
+
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	RTE_SET_USED(error);
+	/* commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	return ret;
 }
-- 
2.43.0
* [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (2 preceding siblings ...)
  2024-10-23 16:55   ` [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-10-23 16:55   ` Bruce Richardson
  2024-10-25 17:02     ` Medvedkin, Vladimir
  2024-10-23 16:55   ` [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
  4 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
The rte_tm topology commit requires the port to be stopped on apply.
Rather than just returning an error when the port is already started, we
can stop the port, apply the topology to it and then restart it.
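As a rough illustration of the stop/apply/restart flow described above, here
is a minimal standalone sketch. It does not use the DPDK APIs: struct
demo_port and the demo_* functions are hypothetical stand-ins invented for
this example, and the real driver calls rte_eth_dev_stop() and
rte_eth_dev_start() instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port; not a DPDK structure. */
struct demo_port {
	bool started;
	int commits;      /* number of successful topology commits */
	bool fail_commit; /* force the commit step to fail, for illustration */
};

static int demo_stop(struct demo_port *p) { p->started = false; return 0; }
static int demo_start(struct demo_port *p) { p->started = true; return 0; }

/* Apply the topology; only permitted while the port is stopped. */
static int
demo_apply(struct demo_port *p)
{
	if (p->started || p->fail_commit)
		return -1;
	p->commits++;
	return 0;
}

/* Stop the port if needed, apply the topology, then restore the port state. */
static int
demo_commit(struct demo_port *p)
{
	bool restart = false;
	int ret;

	if (p->started) {
		if (demo_stop(p) != 0)
			return -1;
		restart = true;
	}

	ret = demo_apply(p);

	/* restart even when the apply step failed, so the port is left running */
	if (restart && demo_start(p) != 0)
		return -1;
	return ret;
}
```

In this sketch a port that was running before the commit is running again
afterwards, whether or not the apply step succeeded, and a stopped port stays
stopped.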
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_tm.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 4809bdde40..09e947a3b1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -844,15 +844,30 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	RTE_SET_USED(error);
-	/* commit should only be done to topology before start! */
-	if (dev->data->dev_started)
-		return -1;
+	bool restart = false;
+
+	/* commit should only be done to topology before start.
+	 * If port is already started, stop it and then restart when done.
+	 */
+	if (dev->data->dev_started) {
+		if (rte_eth_dev_stop(dev->data->port_id) != 0) {
+			error->message = "Device failed to Stop";
+			return -1;
+		}
+		restart = true;
+	}
 
 	int ret = commit_new_hierarchy(dev);
 	if (ret < 0 && clear_on_fail) {
 		ice_tm_conf_uninit(dev);
 		ice_tm_conf_init(dev);
 	}
+
+	if (restart) {
+		if (rte_eth_dev_start(dev->data->port_id) != 0) {
+			error->message = "Device failed to Start";
+			return -1;
+		}
+	}
 	return ret;
 }
-- 
2.43.0
* [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (3 preceding siblings ...)
  2024-10-23 16:55   ` [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
@ 2024-10-23 16:55   ` Bruce Richardson
  2024-10-25 17:02     ` Medvedkin, Vladimir
  4 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-23 16:55 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson
To help with backward compatibility for applications which may expect the
ice driver Tx scheduler (accessed via the rte_tm APIs) to have only 3
layers, add a devarg that lets the user explicitly limit the number of
scheduler layers visible to the application.
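The arithmetic behind the devarg can be sketched in isolation as below. This
is a simplified standalone restatement of the logic in the patch, not the
driver code itself: the demo_* function names and plain integer parameters
are assumptions made for illustration, standing in for
hw->num_tx_sched_layers, hw->port_info->has_tc and the parsed
tm_sched_levels value.

```c
#include <stdint.h>

/*
 * Number of hardware layers hidden from the application.
 * requested == 0 means the devarg was not given, so only the TC layer
 * (if present) is hidden. A request larger than the hardware can provide
 * is ignored and the default is kept.
 */
static uint8_t
demo_hidden_layers(uint8_t total_layers, uint8_t has_tc, uint8_t requested)
{
	uint8_t avail = total_layers - has_tc;

	if (requested == 0 || requested > avail)
		return has_tc;
	return total_layers - requested;
}

/* Application-visible level index of the leaf (queue) nodes. */
static uint8_t
demo_leaf_level(uint8_t total_layers, uint8_t hidden)
{
	return total_layers - hidden - 1;
}
```

For example, with a 9-layer topology that includes a TC layer, requesting 3
levels hides 6 layers and places the leaf nodes at level 2, while leaving the
devarg unset hides only the TC layer and places them at level 7.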
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 58 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 28 +++++++++--------
 4 files changed, 92 insertions(+), 14 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index df489be08d..471343a0ac 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
     -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all available hardware scheduler layers are exposed to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the scheduler levels to the provided value.
+  The provided value must be between 3 and 8.
+  If the value provided is greater than the number of levels provided by the HW,
+  SW will use the hardware maximum value.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This scheme will reduce in software the number of editable levels,
+  but will not affect the fan-out from each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7aed26118f..0f1f34e739 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG         "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,28 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* set the number of hidden Tx scheduler layers. If no devargs parameter to
+	 * set the number of exposed levels, the default is to expose all levels,
+	 * except the TC layer.
+	 *
+	 * If the number of exposed levels is set, we check that it's not greater
+	 * than the HW can provide (in which case we do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else {
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+		}
+	}
+
 	pf->main_vsi = vsi;
 	rte_spinlock_init(&pf->link_lock);
 
@@ -2066,6 +2091,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* the value needs two stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in
+	 * the correct range, between 3 and 8. Later validation will check that the
+	 * number of layers available on a particular port is at least the value specified here.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_SCHED_9_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+			    key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_SCHED_9_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2363,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7239,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "=<N>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 71fd7bca64..431561e48f 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -484,6 +484,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers;    /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -557,6 +558,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -660,7 +662,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 09e947a3b1..9e943da7a1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -200,9 +200,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -210,7 +211,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -230,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -434,7 +434,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-			params, level_id == ice_get_leaf_level(hw), error);
+			params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -762,8 +762,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		 struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -795,14 +795,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -815,15 +815,19 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
 		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0
* Re: [PATCH v5 1/5] net/ice: add option to download scheduler topology
  2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-10-25 17:01     ` Medvedkin, Vladimir
  2024-10-29 15:48       ` Bruce Richardson
  0 siblings, 1 reply; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-25 17:01 UTC (permalink / raw)
  To: Bruce Richardson, dev
Hi Bruce,
On 23/10/2024 17:55, Bruce Richardson wrote:
> The DDP package file being loaded at init time may contain an
> alternative Tx Scheduler topology in it. Add driver option to load this
> topology at init time.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   doc/guides/nics/ice.rst        | 15 +++++++++++++++
>   drivers/net/ice/base/ice_ddp.c | 20 +++++++++++++++++---
>   drivers/net/ice/base/ice_ddp.h |  4 ++--
>   drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
>   drivers/net/ice/ice_ethdev.h   |  1 +
>   5 files changed, 50 insertions(+), 14 deletions(-)
>
<snip>
> @@ -2030,19 +2032,18 @@ static int
>   parse_bool(const char *key, const char *value, void *args)
>   {
>   	int *i = (int *)args;
> -	char *end;
> -	int num;
>   
> -	num = strtoul(value, &end, 10);
> -
> -	if (num != 0 && num != 1) {
> -		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
> -			"value must be 0 or 1",
> +	if (value == NULL || value[0] == '\0') {
> +		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
> +		return -1;
> +	}
> +	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
> +		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
>   			value, key);
>   		return -1;
>   	}
>   
> -	*i = num;
> +	*i = value[0] - '0';
I think that instead of using char arithmetic, it would be better to:
*i = !(value[0] == '0')
>   	return 0;
>   }
>   
> @@ -2307,6 +2308,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
>   	if (ret)
>   		goto bail;
>   
> +	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED_ARG,
> +				 &parse_bool, &ad->devargs.ddp_load_sched);
> +	if (ret)
> +		goto bail;
>   bail:
>   	rte_kvargs_free(kvlist);
>   	return ret;
> @@ -7185,6 +7190,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
>   			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
>   			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
>   			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
> +			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
>   			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
>   
>   RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
> diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
> index 076cf595e8..2794a76096 100644
> --- a/drivers/net/ice/ice_ethdev.h
> +++ b/drivers/net/ice/ice_ethdev.h
> @@ -564,6 +564,7 @@ struct ice_devargs {
>   	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
>   	uint8_t pin_idx;
>   	uint8_t pps_out_ena;
> +	uint8_t ddp_load_sched;
>   	int xtr_field_offs;
>   	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
>   	/* Name of the field. */
-- 
Regards,
Vladimir
^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v5 2/5] net/ice/base: make context alloc function non-static
  2024-10-23 16:55   ` [PATCH v5 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
@ 2024-10-25 17:01     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-25 17:01 UTC (permalink / raw)
  To: Bruce Richardson, dev
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
On 23/10/2024 17:55, Bruce Richardson wrote:
> The function "ice_alloc_lan_q_ctx" will be needed by the driver code, so
> make it non-static.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/ice/base/ice_sched.c | 2 +-
>   drivers/net/ice/base/ice_sched.h | 3 +++
>   2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
> index 9608ac7c24..1f520bb7c0 100644
> --- a/drivers/net/ice/base/ice_sched.c
> +++ b/drivers/net/ice/base/ice_sched.c
> @@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
>    * @tc: TC number
>    * @new_numqs: number of queues
>    */
> -static int
> +int
>   ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
>   {
>   	struct ice_vsi_ctx *vsi_ctx;
> diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
> index 9f78516dfb..09d60d02f0 100644
> --- a/drivers/net/ice/base/ice_sched.h
> +++ b/drivers/net/ice/base/ice_sched.h
> @@ -270,4 +270,7 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
>   int
>   ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
>   			    enum ice_rl_type rl_type, u16 bw_alloc);
> +
> +int
> +ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
>   #endif /* _ICE_SCHED_H_ */
-- 
Regards,
Vladimir
^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support
  2024-10-23 16:55   ` [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-10-25 17:02     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-25 17:02 UTC (permalink / raw)
  To: Bruce Richardson, dev
Hi Bruce,
On 23/10/2024 17:55, Bruce Richardson wrote:
> Increase the flexibility of the Tx scheduler hierarchy support in the
> driver. If the HW/firmware allows it, allow creating up to 2k child
> nodes per scheduler node. Also expand the number of supported layers to
> the max available, rather than always just having 3 layers.  One
> restriction on this change is that the topology needs to be configured
> and enabled before port queue setup, in many cases, and before port
> start in all cases.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   doc/guides/nics/ice.rst      |  31 +--
>   drivers/net/ice/ice_ethdev.c |   9 -
>   drivers/net/ice/ice_ethdev.h |  15 +-
>   drivers/net/ice/ice_rxtx.c   |  10 +
>   drivers/net/ice/ice_tm.c     | 496 ++++++++++++++---------------------
>   5 files changed, 224 insertions(+), 337 deletions(-)
>
<snip>
> @@ -393,16 +406,35 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   	      struct rte_tm_error *error)
>   {
>   	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>   	struct ice_tm_shaper_profile *shaper_profile = NULL;
>   	struct ice_tm_node *tm_node;
> -	struct ice_tm_node *parent_node;
> +	struct ice_tm_node *parent_node = NULL;
>   	int ret;
>   
>   	if (!params || !error)
>   		return -EINVAL;
>   
> -	ret = ice_node_param_check(pf, node_id, priority, weight,
> -				    params, error);
> +	if (parent_node_id != RTE_TM_NODE_ID_NULL) {
> +		parent_node = find_node(pf->tm_conf.root, parent_node_id);
> +		if (!parent_node) {
> +			error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
> +			error->message = "parent not exist";
> +			return -EINVAL;
> +		}
> +	}
> +	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY && parent_node != NULL)
> +		level_id = parent_node->level + 1;
> +
> +	/* check level */
> +	if (parent_node != NULL && level_id != parent_node->level + 1) {
> +		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
> +		error->message = "Wrong level";
> +		return -EINVAL;
> +	}
> +
> +	ret = ice_node_param_check(node_id, priority, weight,
> +			params, level_id == ice_get_leaf_level(hw), error);
As a suggestion, move the following section:
/* root node if not have a parent */
     if (parent_node_id == RTE_TM_NODE_ID_NULL) {
     ...
}
before those checks, and simplify the if statements.
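A minimal sketch of the ordering suggested here, handling the root case first so that the remaining checks can assume a parent exists. The type `tm_node`, the function `resolve_level` and the two constants are hypothetical stand-ins chosen for illustration; they are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values chosen for illustration only. */
#define NODE_ID_NULL UINT32_MAX /* plays the role of RTE_TM_NODE_ID_NULL */
#define LEVEL_ID_ANY UINT32_MAX /* plays the role of RTE_TM_NODE_LEVEL_ID_ANY */

struct tm_node {
	uint32_t id;
	uint32_t level;
};

/*
 * Resolve the level of a new node. Returns 0 on success with the level
 * stored in *level_id, or -1 if the request is inconsistent.
 */
static int
resolve_level(const struct tm_node *parent, uint32_t parent_id, uint32_t *level_id)
{
	assert(level_id != NULL);

	/* Root node: handled first, so the code below can assume a parent. */
	if (parent_id == NODE_ID_NULL) {
		if (*level_id != LEVEL_ID_ANY && *level_id != 0)
			return -1;
		*level_id = 0;
		return 0;
	}

	/* Non-root node: the parent lookup must have succeeded. */
	if (parent == NULL)
		return -1;

	if (*level_id == LEVEL_ID_ANY)
		*level_id = parent->level + 1;
	else if (*level_id != parent->level + 1)
		return -1;
	return 0;
}
```

With the root case returning early, the later checks no longer need the repeated `parent_node != NULL` tests.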
>   	if (ret)
>   		return ret;
>   
>
>
> @@ -573,7 +574,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
>   	}
>   
>   	/* root node */
> -	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
> +	if (tm_node->level == 0) {
>   		rte_free(tm_node);
>   		pf->tm_conf.root = NULL;
>   		return 0;
> @@ -593,53 +594,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
>   	return 0;
>   }
>   
<snip>
> +int
> +ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
>   {
> -	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> -	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> -	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
> -	struct ice_tm_node *root = pf->tm_conf.root;
> -	uint32_t i;
> -	int ret;
> -
> -	/* reset vsi_node */
> -	ret = ice_set_node_rate(hw, NULL, vsi_node);
> -	if (ret) {
> -		PMD_DRV_LOG(ERR, "reset vsi node failed");
> -		return ret;
> -	}
> +	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
> +	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
>   
> -	if (root == NULL)
> +	/* not configured in hierarchy */
> +	if (sw_node == NULL)
>   		return 0;
>   
> -	for (i = 0; i < root->reference_count; i++) {
> -		struct ice_tm_node *tm_node = root->children[i];
> +	sw_node->sched_node = hw_node;
>   
> -		if (tm_node->sched_node == NULL)
> -			continue;
> +	/* if the queue node has been put in the wrong place in hierarchy */
> +	if (hw_node->parent != sw_node->parent->sched_node) {
Need to check hw_node for NULL before dereferencing it here.
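A rough sketch of the kind of guard meant here, using simplified hypothetical structs (`hw_node`, `sw_node`) and a hypothetical helper `queue_needs_move` rather than the real driver types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for ice_sched_node and ice_tm_node. */
struct hw_node {
	struct hw_node *parent;
};

struct sw_node {
	struct sw_node *parent;
	struct hw_node *sched_node;
};

/*
 * Returns 1 if the queue's hardware node sits under the wrong parent and
 * must be moved, 0 if it is already in place, and -1 if the hardware node
 * lookup failed (hw == NULL).
 */
static int
queue_needs_move(struct sw_node *sw, struct hw_node *hw)
{
	assert(sw != NULL && sw->parent != NULL);

	/* Lookup by TEID may fail; bail out before touching hw->parent. */
	if (hw == NULL)
		return -1;

	sw->sched_node = hw;
	return hw->parent != sw->parent->sched_node;
}
```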
> +		struct ice_aqc_move_txqs_data *buf;
> +		uint8_t txqs_moved = 0;
> +		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
> +
<snip>
>   
> -static int ice_hierarchy_commit(struct rte_eth_dev *dev,
> +static int
> +commit_new_hierarchy(struct rte_eth_dev *dev)
> +{
> +	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	struct ice_port_info *pi = hw->port_info;
> +	struct ice_tm_node *sw_root = pf->tm_conf.root;
> +	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
> +	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
I think it is worth replacing "10" with something meaningful. Also, it
would probably be good to add an extra check against this value in:
ice_sched:ice_sched_query_res_alloc():
     ...
     hw->num_tx_sched_layers =
         (u8)LE16_TO_CPU(buf->sched_props.logical_levels);
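A minimal sketch of such a check, assuming a hypothetical constant `MAX_SCHED_LAYERS` (value 9 picked for illustration) and a hypothetical helper `check_sched_layers`; neither name comes from the base code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name and value, used here only for illustration. */
#define MAX_SCHED_LAYERS 9

/*
 * Validate the layer count reported by firmware before it is used to index
 * per-level arrays sized by MAX_SCHED_LAYERS.
 * Returns 0 and stores the count if usable, -1 otherwise.
 */
static int
check_sched_layers(uint16_t reported_layers, uint8_t *num_layers)
{
	assert(num_layers != NULL);

	if (reported_layers == 0 || reported_layers > MAX_SCHED_LAYERS)
		return -1;

	*num_layers = (uint8_t)reported_layers;
	return 0;
}
```

The per-level counter array could then be declared with the same constant instead of a bare "10".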
> +	uint8_t q_lvl = ice_get_leaf_level(hw);
> +	uint8_t qg_lvl = q_lvl - 1;
<snip>
-- 
Regards,
Vladimir
^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology
  2024-10-23 16:55   ` [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
@ 2024-10-25 17:02     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-25 17:02 UTC (permalink / raw)
  To: Bruce Richardson, dev
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
On 23/10/2024 17:55, Bruce Richardson wrote:
> The rte_tm topology commit requires the port to be stopped on apply.
> Rather than just returning an error when the port is already started, we
> can stop the port, apply the topology to it and then restart it.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   drivers/net/ice/ice_tm.c | 23 +++++++++++++++++++----
>   1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
> index 4809bdde40..09e947a3b1 100644
> --- a/drivers/net/ice/ice_tm.c
> +++ b/drivers/net/ice/ice_tm.c
> @@ -844,15 +844,30 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
>   				 int clear_on_fail,
>   				 struct rte_tm_error *error)
>   {
> -	RTE_SET_USED(error);
> -	/* commit should only be done to topology before start! */
> -	if (dev->data->dev_started)
> -		return -1;
> +	bool restart = false;
> +
> +	/* commit should only be done to topology before start
> +	 * If port is already started, stop it and then restart when done.
> +	 */
> +	if (dev->data->dev_started) {
> +		if (rte_eth_dev_stop(dev->data->port_id) != 0) {
> +			error->message = "Device failed to Stop";
> +			return -1;
> +		}
> +		restart = true;
> +	}
>   
>   	int ret = commit_new_hierarchy(dev);
>   	if (ret < 0 && clear_on_fail) {
>   		ice_tm_conf_uninit(dev);
>   		ice_tm_conf_init(dev);
>   	}
> +
> +	if (restart) {
> +		if (rte_eth_dev_start(dev->data->port_id) != 0) {
> +			error->message = "Device failed to Start";
> +			return -1;
> +		}
> +	}
>   	return ret;
>   }
-- 
Regards,
Vladimir
^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers
  2024-10-23 16:55   ` [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
@ 2024-10-25 17:02     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-25 17:02 UTC (permalink / raw)
  To: Bruce Richardson, dev
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
On 23/10/2024 17:55, Bruce Richardson wrote:
> In order to help with backward compatibility for applications, which may
> expect the ice driver tx scheduler (accessed via tm apis) to only have 3
> layers, add in a devarg to allow the user to explicitly limit the
> number of scheduler layers visible to the application.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   doc/guides/nics/ice.rst      | 16 +++++++++-
>   drivers/net/ice/ice_ethdev.c | 58 ++++++++++++++++++++++++++++++++++++
>   drivers/net/ice/ice_ethdev.h |  4 ++-
>   drivers/net/ice/ice_tm.c     | 28 +++++++++--------
>   4 files changed, 92 insertions(+), 14 deletions(-)
>
<snip>
-- 
Regards,
Vladimir
^ permalink raw reply	[flat|nested] 76+ messages in thread
* Re: [PATCH v5 1/5] net/ice: add option to download scheduler topology
  2024-10-25 17:01     ` Medvedkin, Vladimir
@ 2024-10-29 15:48       ` Bruce Richardson
  0 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 15:48 UTC (permalink / raw)
  To: Medvedkin, Vladimir; +Cc: dev
On Fri, Oct 25, 2024 at 06:01:06PM +0100, Medvedkin, Vladimir wrote:
>    Hi Bruce,
> 
>    On 23/10/2024 17:55, Bruce Richardson wrote:
> 
> The DDP package file being loaded at init time may contain an
> alternative Tx Scheduler topology in it. Add driver option to load this
> topology at init time.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/nics/ice.rst        | 15 +++++++++++++++
>  drivers/net/ice/base/ice_ddp.c | 20 +++++++++++++++++---
>  drivers/net/ice/base/ice_ddp.h |  4 ++--
>  drivers/net/ice/ice_ethdev.c   | 24 +++++++++++++++---------
>  drivers/net/ice/ice_ethdev.h   |  1 +
>  5 files changed, 50 insertions(+), 14 deletions(-)
> 
> 
>    <snip>
> 
> @@ -2030,19 +2032,18 @@ static int
>  parse_bool(const char *key, const char *value, void *args)
>  {
>         int *i = (int *)args;
> -       char *end;
> -       int num;
> 
> -       num = strtoul(value, &end, 10);
> -
> -       if (num != 0 && num != 1) {
> -               PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
> -                       "value must be 0 or 1",
> +       if (value == NULL || value[0] == '\0') {
> +               PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
> +               return -1;
> +       }
> +       if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
> +               PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
>                         value, key);
>                 return -1;
>         }
> 
> -       *i = num;
> +       *i = value[0] - '0';
> 
>    I think that instead of using char arithmetic, it would be better to:
> 
>    *i = !(value[0] == '0')
> 
Not sure it's that big a difference, however I will change it. Any
objection to removing the "!" there and doing:
*i = (value[0] == '1')
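
For illustration, a standalone sketch of the whole check with that assignment in place. Logging is left out, and `parse_flag` is a hypothetical name with a simplified signature, not the kvargs handler itself:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Accept exactly the strings "0" or "1" and store the result in *out.
 * Returns 0 on success, -1 for a missing or invalid value.
 */
static int
parse_flag(const char *value, int *out)
{
	assert(out != NULL);

	/* A key given without a value is rejected. */
	if (value == NULL || value[0] == '\0')
		return -1;

	/* Anything longer than one character, or not '0'/'1', is rejected. */
	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1'))
		return -1;

	/* The comparison yields 0 or 1 directly, so no char arithmetic. */
	*out = (value[0] == '1');
	return 0;
}
```

Since the input has already been validated as '0' or '1' at that point, comparing against '1' gives the same result as negating a comparison against '0'.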
/Bruce
^ permalink raw reply	[flat|nested] 76+ messages in thread
* [PATCH v6 0/5] Improve rte_tm support in ICE driver
  2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
                   ` (18 preceding siblings ...)
  2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-29 17:01 ` Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 1/5] net/ice: add option to download scheduler topology Bruce Richardson
                     ` (5 more replies)
  19 siblings, 6 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
This patchset expands the capabilities of the traffic management
support in the ICE driver. It allows the driver to support different
sizes of topologies, and support >256 queues and more than 3 hierarchy
layers.
---
v6:
* remove char arithmetic in patch 1
* rework parameter checks in patch 3 to shorten and simplify code
v5:
* fix checkpatch flagged issues
v4:
* set reduced to only 5 patches:
  - base code changes mostly covered by separate base code patchset (merged rc1)
  - additional minor fixes and enhancements covered by set [1] (merged to next-net-intel for rc2)
* additional work included in set:
  - automatic stopping and restarting of port on configuration
  - ability to reconfigure the sched topology post-commit and then apply that via new commit call
v3:
* remove/implement some code TODOs
* add patch 16 to set.
v2:
* Correct typo in commit log of one patch
* Add missing depends-on tag to the cover letter
[1] https://patches.dpdk.org/project/dpdk/list/?series=33609&state=*
Bruce Richardson (5):
  net/ice: add option to download scheduler topology
  net/ice/base: make context alloc function non-static
  net/ice: enhance Tx scheduler hierarchy support
  net/ice: allowing stopping port to apply TM topology
  net/ice: provide parameter to limit scheduler layers
 doc/guides/nics/ice.rst          |  60 +++-
 drivers/net/ice/base/ice_ddp.c   |  20 +-
 drivers/net/ice/base/ice_ddp.h   |   4 +-
 drivers/net/ice/base/ice_sched.c |   2 +-
 drivers/net/ice/base/ice_sched.h |   3 +
 drivers/net/ice/ice_ethdev.c     |  93 +++--
 drivers/net/ice/ice_ethdev.h     |  22 +-
 drivers/net/ice/ice_rxtx.c       |  10 +
 drivers/net/ice/ice_tm.c         | 566 +++++++++++++------------------
 9 files changed, 397 insertions(+), 383 deletions(-)
--
2.43.0
^ permalink raw reply	[flat|nested] 76+ messages in thread
* [PATCH v6 1/5] net/ice: add option to download scheduler topology
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
@ 2024-10-29 17:01   ` Bruce Richardson
  2024-10-30 15:21     ` Medvedkin, Vladimir
  2024-10-29 17:01   ` [PATCH v6 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
The DDP package file being loaded at init time may contain an
alternative Tx Scheduler topology in it. Add driver option to load this
topology at init time.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst        | 15 +++++++++++++++
 drivers/net/ice/base/ice_ddp.c | 20 +++++++++++++++++---
 drivers/net/ice/base/ice_ddp.h |  4 ++--
 drivers/net/ice/ice_ethdev.c   | 26 ++++++++++++++++----------
 drivers/net/ice/ice_ethdev.h   |  1 +
 5 files changed, 51 insertions(+), 15 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 6c66dc8008..42bbe50968 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -298,6 +298,21 @@ Runtime Configuration
   As a trade-off, this configuration may cause the packet processing performance
   degradation due to the PCI bandwidth limitation.
 
+- ``Tx Scheduler Topology Download``
+
+  The default Tx scheduler topology exposed by the NIC,
+  generally a 9-level topology of which 8 levels are SW configurable,
+  may be updated by a new topology loaded from a DDP package file.
+  The ``ddp_load_sched_topo`` option can be used to specify that the scheduler topology,
+  if any, in the DDP package file being used should be loaded into the NIC.
+  For example::
+
+    -a 0000:88:00.0,ddp_load_sched_topo=1
+
+  or::
+
+    -a 0000:88:00.0,ddp_pkg_file=/path/to/pkg.file,ddp_load_sched_topo=1
+
 - ``Tx diagnostics`` (default ``not enabled``)
 
   Set the ``devargs`` parameter ``mbuf_check`` to enable Tx diagnostics.
diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
index c17a58eab8..850c722a3f 100644
--- a/drivers/net/ice/base/ice_ddp.c
+++ b/drivers/net/ice/base/ice_ddp.c
@@ -1333,7 +1333,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
  * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
  * case.
  */
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
 {
 	bool already_loaded = false;
 	enum ice_ddp_state state;
@@ -1351,6 +1351,20 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
 		return state;
 	}
 
+	if (load_sched) {
+		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
+		if (res != ICE_SUCCESS) {
+			ice_debug(hw, ICE_DBG_INIT,
+				  "failed to apply sched topology  (err: %d)\n",
+				  res);
+			return ICE_DDP_PKG_ERR;
+		}
+		ice_debug(hw, ICE_DBG_INIT,
+			  "Topology download successful, reinitializing device\n");
+		ice_deinit_hw(hw);
+		ice_init_hw(hw);
+	}
+
 	/* initialize package info */
 	state = ice_init_pkg_info(hw, pkg);
 	if (state)
@@ -1423,7 +1437,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
  * related routines.
  */
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
 {
 	enum ice_ddp_state state;
 	u8 *buf_copy;
@@ -1433,7 +1447,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
 
 	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
 
-	state = ice_init_pkg(hw, buf_copy, len);
+	state = ice_init_pkg(hw, buf_copy, len, load_sched);
 	if (!ice_is_init_pkg_successful(state)) {
 		/* Free the copy, since we failed to initialize the package */
 		ice_free(hw, buf_copy);
diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
index 5512669f44..d79cdee13a 100644
--- a/drivers/net/ice/base/ice_ddp.h
+++ b/drivers/net/ice/base/ice_ddp.h
@@ -454,9 +454,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 void *
 ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
 		     u32 sect_type);
-enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
+enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
 enum ice_ddp_state
-ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
+ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
 bool ice_is_init_pkg_successful(enum ice_ddp_state state);
 void ice_free_seg(struct ice_hw *hw);
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index d5e94a6685..da91012a5e 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -39,6 +39,7 @@
 #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
+#define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -56,6 +57,7 @@ static const char * const ice_valid_args[] = {
 	ICE_DEFAULT_MAC_DISABLE,
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
+	ICE_DDP_LOAD_SCHED_ARG,
 	NULL
 };
 
@@ -1997,7 +1999,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 load_fw:
 	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
 
-	err = ice_copy_and_init_pkg(hw, buf, bufsz);
+	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
 	if (!ice_is_init_pkg_successful(err)) {
 		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d", err);
 		free(buf);
@@ -2029,20 +2031,19 @@ ice_base_queue_get(struct ice_pf *pf)
 static int
 parse_bool(const char *key, const char *value, void *args)
 {
-	int *i = (int *)args;
-	char *end;
-	int num;
+	int *i = args;
 
-	num = strtoul(value, &end, 10);
-
-	if (num != 0 && num != 1) {
-		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-			"value must be 0 or 1",
+	if (value == NULL || value[0] == '\0') {
+		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
+		return -1;
+	}
+	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
+		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
 			value, key);
 		return -1;
 	}
 
-	*i = num;
+	*i = (value[0] == '1');
 	return 0;
 }
 
@@ -2307,6 +2308,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 	if (ret)
 		goto bail;
 
+	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED_ARG,
+				 &parse_bool, &ad->devargs.ddp_load_sched);
+	if (ret)
+		goto bail;
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7185,6 +7190,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
+			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 076cf595e8..2794a76096 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -564,6 +564,7 @@ struct ice_devargs {
 	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
+	uint8_t ddp_load_sched;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
-- 
2.43.0
^ permalink raw reply	[flat|nested] 76+ messages in thread
* [PATCH v6 2/5] net/ice/base: make context alloc function non-static
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 1/5] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-10-29 17:01   ` Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
The function "ice_alloc_lan_q_ctx" will be needed by the driver code, so
make it non-static.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/ice/base/ice_sched.c | 2 +-
 drivers/net/ice/base/ice_sched.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ice/base/ice_sched.c b/drivers/net/ice/base/ice_sched.c
index 9608ac7c24..1f520bb7c0 100644
--- a/drivers/net/ice/base/ice_sched.c
+++ b/drivers/net/ice/base/ice_sched.c
@@ -570,7 +570,7 @@ ice_sched_suspend_resume_elems(struct ice_hw *hw, u8 num_nodes, u32 *node_teids,
  * @tc: TC number
  * @new_numqs: number of queues
  */
-static int
+int
 ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs)
 {
 	struct ice_vsi_ctx *vsi_ctx;
diff --git a/drivers/net/ice/base/ice_sched.h b/drivers/net/ice/base/ice_sched.h
index 9f78516dfb..09d60d02f0 100644
--- a/drivers/net/ice/base/ice_sched.h
+++ b/drivers/net/ice/base/ice_sched.h
@@ -270,4 +270,7 @@ int ice_sched_replay_q_bw(struct ice_port_info *pi, struct ice_q_ctx *q_ctx);
 int
 ice_sched_cfg_node_bw_alloc(struct ice_hw *hw, struct ice_sched_node *node,
 			    enum ice_rl_type rl_type, u16 bw_alloc);
+
+int
+ice_alloc_lan_q_ctx(struct ice_hw *hw, u16 vsi_handle, u8 tc, u16 new_numqs);
 #endif /* _ICE_SCHED_H_ */
-- 
2.43.0
^ permalink raw reply	[flat|nested] 76+ messages in thread
* [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 1/5] net/ice: add option to download scheduler topology Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
@ 2024-10-29 17:01   ` Bruce Richardson
  2024-10-30 15:21     ` Medvedkin, Vladimir
  2024-10-29 17:01   ` [PATCH v6 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
Increase the flexibility of the Tx scheduler hierarchy support in the
driver. If the HW/firmware allows it, allow creating up to 2k child
nodes per scheduler node. Also expand the number of supported layers to
the max available, rather than always just having 3 layers.  One
restriction on this change is that the topology needs to be configured
and enabled before port queue setup, in many cases, and before port
start in all cases.
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/ice.rst      |  31 +-
 drivers/net/ice/ice_ethdev.c |   9 -
 drivers/net/ice/ice_ethdev.h |  17 +-
 drivers/net/ice/ice_rxtx.c   |  10 +
 drivers/net/ice/ice_tm.c     | 548 ++++++++++++++---------------------
 5 files changed, 248 insertions(+), 367 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 42bbe50968..df489be08d 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -447,21 +447,22 @@ Traffic Management Support
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The ice PMD provides support for the Traffic Management API (RTE_TM),
-allow users to offload a 3-layers Tx scheduler on the E810 NIC:
-
-- ``Port Layer``
-
-  This is the root layer, support peak bandwidth configuration,
-  max to 32 children.
-
-- ``Queue Group Layer``
-
-  The middle layer, support peak / committed bandwidth, weight, priority configurations,
-  max to 8 children.
-
-- ``Queue Layer``
-
-  The leaf layer, support peak / committed bandwidth, weight, priority configurations.
+enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
+By default, all available transmit scheduler layers are available for configuration,
+allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
+The number of levels in the hierarchy can be adjusted via driver parameter:
+
+* the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
+  using the driver parameter ``ddp_load_sched_topo=1``.
+  Using this mechanism, if the number of levels is reduced,
+  the possible fan-out of child-nodes from each level may be increased.
+  The default topology is a 9-level tree with a fan-out of 8 at each level.
+  Released DDP package files contain a 5-level hierarchy (4-levels usable),
+  with increased fan-out at the lower 3 levels
+  e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
+
+For more details on how to configure a Tx scheduling hierarchy,
+please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
 Additional Options
 ++++++++++++++++++
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index da91012a5e..7252ea6b24 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -3906,7 +3906,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 	int mask, ret;
 	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
 	uint32_t pin_idx = ad->devargs.pin_idx;
-	struct rte_tm_error tm_err;
 	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
 	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
 
@@ -3938,14 +3937,6 @@ ice_dev_start(struct rte_eth_dev *dev)
 		}
 	}
 
-	if (pf->tm_conf.committed) {
-		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
-			goto rx_err;
-		}
-	}
-
 	ice_set_rx_function(dev);
 	ice_set_tx_function(dev);
 
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 2794a76096..70189a9eb7 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -458,6 +458,8 @@ struct ice_acl_info {
 TAILQ_HEAD(ice_shaper_profile_list, ice_tm_shaper_profile);
 TAILQ_HEAD(ice_tm_node_list, ice_tm_node);
 
+#define ICE_TM_MAX_LAYERS ICE_SCHED_9_LAYERS
+
 struct ice_tm_shaper_profile {
 	TAILQ_ENTRY(ice_tm_shaper_profile) node;
 	uint32_t shaper_profile_id;
@@ -480,14 +482,6 @@ struct ice_tm_node {
 	struct ice_sched_node *sched_node;
 };
 
-/* node type of Traffic Manager */
-enum ice_tm_node_type {
-	ICE_TM_NODE_TYPE_PORT,
-	ICE_TM_NODE_TYPE_QGROUP,
-	ICE_TM_NODE_TYPE_QUEUE,
-	ICE_TM_NODE_TYPE_MAX,
-};
-
 /* Struct to store all the Traffic Manager configuration. */
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
@@ -690,9 +684,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
 			 struct ice_rss_hash_cfg *cfg);
 void ice_tm_conf_init(struct rte_eth_dev *dev);
 void ice_tm_conf_uninit(struct rte_eth_dev *dev);
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error);
 extern const struct rte_tm_ops ice_tm_ops;
 
 static inline int
@@ -750,4 +741,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
 
 __rte_experimental
 int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
+
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
+
 #endif /* _ICE_ETHDEV_H_ */
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 024d97cb46..0c7106c7e0 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	int err;
 	struct ice_vsi *vsi;
 	struct ice_hw *hw;
+	struct ice_pf *pf;
 	struct ice_aqc_add_tx_qgrp *txq_elem;
 	struct ice_tlan_ctx tx_ctx;
 	int buf_len;
@@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 
 	vsi = txq->vsi;
 	hw = ICE_VSI_TO_HW(vsi);
+	pf = ICE_VSI_TO_PF(vsi);
 
 	memset(&tx_ctx, 0, sizeof(tx_ctx));
 	txq_elem->num_txqs = 1;
@@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
 	/* store the schedule node id */
 	txq->q_teid = txq_elem->txqs[0].q_teid;
 
+	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
+	if (pf->tm_conf.committed)
+		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
+			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
+			rte_free(txq_elem);
+			return -EIO;
+		}
+
 	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
 
 	rte_free(txq_elem);
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 636ab77f26..a135e9db30 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -1,17 +1,15 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2022 Intel Corporation
  */
+#include <rte_ethdev.h>
 #include <rte_tm_driver.h>
 
 #include "ice_ethdev.h"
 #include "ice_rxtx.h"
 
-#define MAX_CHILDREN_PER_SCHED_NODE	8
-#define MAX_CHILDREN_PER_TM_NODE	256
-
 static int ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
-				 __rte_unused struct rte_tm_error *error);
+				 struct rte_tm_error *error);
 static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      uint32_t parent_node_id, uint32_t priority,
 	      uint32_t weight, uint32_t level_id,
@@ -86,9 +84,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
 }
 
 static int
-ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
+ice_node_param_check(uint32_t node_id,
 		      uint32_t priority, uint32_t weight,
 		      const struct rte_tm_node_params *params,
+		      bool is_leaf,
 		      struct rte_tm_error *error)
 {
 	/* checked all the unsupported parameter */
@@ -123,7 +122,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for non-leaf node */
-	if (node_id >= pf->dev_data->nb_tx_queues) {
+	if (!is_leaf) {
 		if (params->nonleaf.wfq_weight_mode) {
 			error->type =
 				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
@@ -147,6 +146,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
 	}
 
 	/* for leaf node */
+	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "Node ID out of range for a leaf node.";
+		return -EINVAL;
+	}
 	if (params->leaf.cman) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
 		error->message = "Congestion management not supported";
@@ -193,11 +197,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
 	return NULL;
 }
 
+static inline uint8_t
+ice_get_leaf_level(struct ice_hw *hw)
+{
+	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+}
+
 static int
 ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -217,7 +228,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
+	if (tm_node->level == ice_get_leaf_level(hw))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -393,34 +404,21 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	      struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
-	struct ice_tm_node *parent_node;
+	struct ice_tm_node *parent_node = NULL;
 	int ret;
 
 	if (!params || !error)
 		return -EINVAL;
 
-	ret = ice_node_param_check(pf, node_id, priority, weight,
-				    params, error);
-	if (ret)
-		return ret;
-
-	/* check if the node is already existed */
-	if (find_node(pf->tm_conf.root, node_id)) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-		error->message = "node id already used";
-		return -EINVAL;
-	}
-
 	/* check the shaper profile id */
 	if (params->shaper_profile_id != RTE_TM_SHAPER_PROFILE_ID_NONE) {
-		shaper_profile = ice_shaper_profile_search(dev,
-			params->shaper_profile_id);
+		shaper_profile = ice_shaper_profile_search(dev, params->shaper_profile_id);
 		if (!shaper_profile) {
-			error->type =
-				RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID;
-			error->message = "shaper profile not exist";
+			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID;
+			error->message = "shaper profile does not exist";
 			return -EINVAL;
 		}
 	}
@@ -428,9 +426,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	/* root node if not have a parent */
 	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
 		/* check level */
-		if (level_id != ICE_TM_NODE_TYPE_PORT) {
+		if (level_id != 0) {
 			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
-			error->message = "Wrong level";
+			error->message = "Wrong level, root node (NULL parent) must be at level 0";
 			return -EINVAL;
 		}
 
@@ -441,74 +439,75 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 			return -EINVAL;
 		}
 
+		ret = ice_node_param_check(node_id, priority, weight, params, false, error);
+		if (ret)
+			return ret;
+
 		/* add the root node */
 		tm_node = rte_zmalloc(NULL,
-				      sizeof(struct ice_tm_node) +
-				      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
-				      0);
+				sizeof(struct ice_tm_node) +
+				sizeof(struct ice_tm_node *) * hw->max_children[0],
+				0);
 		if (!tm_node)
 			return -ENOMEM;
 		tm_node->id = node_id;
-		tm_node->level = ICE_TM_NODE_TYPE_PORT;
+		tm_node->level = 0;
 		tm_node->parent = NULL;
 		tm_node->reference_count = 0;
 		tm_node->shaper_profile = shaper_profile;
-		tm_node->children =
-			(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
-		rte_memcpy(&tm_node->params, params,
-				 sizeof(struct rte_tm_node_params));
+		tm_node->children = RTE_PTR_ADD(tm_node, sizeof(struct ice_tm_node));
+		tm_node->params = *params;
 		pf->tm_conf.root = tm_node;
 		return 0;
 	}
 
-	/* check the parent node */
 	parent_node = find_node(pf->tm_conf.root, parent_node_id);
 	if (!parent_node) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
 		error->message = "parent not exist";
 		return -EINVAL;
 	}
-	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
-	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
-		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
-		error->message = "parent is not valid";
-		return -EINVAL;
-	}
+
 	/* check level */
-	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
-	    level_id != parent_node->level + 1) {
+	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY)
+		level_id = parent_node->level + 1;
+	else if (level_id != parent_node->level + 1) {
 		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
 		error->message = "Wrong level";
 		return -EINVAL;
 	}
 
-	/* check the node number */
-	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
-		/* check the queue group number */
-		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queue groups";
-			return -EINVAL;
-		}
-	} else {
-		/* check the queue number */
-		if (parent_node->reference_count >=
-			MAX_CHILDREN_PER_SCHED_NODE) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too many queues";
-			return -EINVAL;
-		}
-		if (node_id >= pf->dev_data->nb_tx_queues) {
-			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
-			error->message = "too large queue id";
-			return -EINVAL;
-		}
+	ret = ice_node_param_check(node_id, priority, weight,
+			params, level_id == ice_get_leaf_level(hw), error);
+	if (ret)
+		return ret;
+
+	/* check if the node already exists */
+	if (find_node(pf->tm_conf.root, node_id)) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
+		error->message = "node id already used";
+		return -EINVAL;
+	}
+
+	/* check the parent node */
+	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
+	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
+		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
+		error->message = "parent is not valid";
+		return -EINVAL;
+	}
+
+	/* check the max children allowed at this level */
+	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
+		error->type = RTE_TM_ERROR_TYPE_CAPABILITIES;
+		error->message = "insufficient number of child nodes supported";
+		return -EINVAL;
 	}
 
 	tm_node = rte_zmalloc(NULL,
-			      sizeof(struct ice_tm_node) +
-			      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
-			      0);
+			sizeof(struct ice_tm_node) +
+			sizeof(struct ice_tm_node *) * hw->max_children[level_id],
+			0);
 	if (!tm_node)
 		return -ENOMEM;
 	tm_node->id = node_id;
@@ -516,25 +515,18 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	tm_node->weight = weight;
 	tm_node->reference_count = 0;
 	tm_node->parent = parent_node;
-	tm_node->level = parent_node->level + 1;
+	tm_node->level = level_id;
 	tm_node->shaper_profile = shaper_profile;
-	tm_node->children =
-		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
-	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
+	tm_node->children = RTE_PTR_ADD(tm_node, sizeof(struct ice_tm_node));
+	tm_node->parent->children[tm_node->parent->reference_count++] = tm_node;
+	tm_node->params = *params;
 
-	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
-	    level_id != ICE_TM_NODE_TYPE_QGROUP)
-		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
-			    level_id);
+	if (tm_node->priority != 0)
+		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d", level_id);
 
-	if (tm_node->weight != 1 &&
-	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
-		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
-			    level_id);
+	if (tm_node->weight != 1 && level_id == 0)
+		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d", level_id);
 
-	rte_memcpy(&tm_node->params, params,
-			 sizeof(struct rte_tm_node_params));
-	tm_node->parent->reference_count++;
 
 	return 0;
 }
@@ -573,7 +565,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	/* root node */
-	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
+	if (tm_node->level == 0) {
 		rte_free(tm_node);
 		pf->tm_conf.root = NULL;
 		return 0;
@@ -593,53 +585,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
 	return 0;
 }
 
-static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
-				  struct ice_sched_node *queue_sched_node,
-				  struct ice_sched_node *dst_node,
-				  uint16_t queue_id)
-{
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_aqc_move_txqs_data *buf;
-	struct ice_sched_node *queue_parent_node;
-	uint8_t txqs_moved;
-	int ret = ICE_SUCCESS;
-	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
-
-	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
-	if (buf == NULL)
-		return -ENOMEM;
-
-	queue_parent_node = queue_sched_node->parent;
-	buf->src_teid = queue_parent_node->info.node_teid;
-	buf->dest_teid = dst_node->info.node_teid;
-	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
-	buf->txqs[0].txq_id = queue_id;
-
-	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
-					NULL, buf, buf_size, &txqs_moved, NULL);
-	if (ret || txqs_moved == 0) {
-		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-
-	if (queue_parent_node->num_children > 0) {
-		queue_parent_node->num_children--;
-		queue_parent_node->children[queue_parent_node->num_children] = NULL;
-	} else {
-		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
-			    queue_parent_node->num_children, queue_id);
-		rte_free(buf);
-		return ICE_ERR_PARAM;
-	}
-	dst_node->children[dst_node->num_children++] = queue_sched_node;
-	queue_sched_node->parent = dst_node;
-	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
-
-	rte_free(buf);
-	return ret;
-}
-
 static int ice_set_node_rate(struct ice_hw *hw,
 			     struct ice_tm_node *tm_node,
 			     struct ice_sched_node *sched_node)
@@ -727,240 +672,179 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
 	return 0;
 }
 
-static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
+int
+ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
 {
-	struct ice_sched_node *node = hw->port_info->root;
-	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
-	uint32_t i;
-
-	for (i = 0; i < vsi_layer; i++)
-		node = node->children[0];
+	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
+	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
 
-	return node;
-}
-
-static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
-{
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
-	struct ice_tm_node *root = pf->tm_conf.root;
-	uint32_t i;
-	int ret;
-
-	/* reset vsi_node */
-	ret = ice_set_node_rate(hw, NULL, vsi_node);
-	if (ret) {
-		PMD_DRV_LOG(ERR, "reset vsi node failed");
-		return ret;
-	}
-
-	if (root == NULL)
+	/* not configured in hierarchy */
+	if (sw_node == NULL)
 		return 0;
 
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
+	sw_node->sched_node = hw_node;
 
-		if (tm_node->sched_node == NULL)
-			continue;
+	/* if the queue node has been put in the wrong place in hierarchy */
+	if (hw_node->parent != sw_node->parent->sched_node) {
+		struct ice_aqc_move_txqs_data *buf;
+		uint8_t txqs_moved = 0;
+		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
 
-		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
-			return ret;
+		buf = ice_malloc(hw, buf_size);
+		if (buf == NULL)
+			return -ENOMEM;
+
+		struct ice_sched_node *parent = hw_node->parent;
+		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
+		buf->src_teid = parent->info.node_teid;
+		buf->dest_teid = new_parent->info.node_teid;
+		buf->txqs[0].q_teid = hw_node->info.node_teid;
+		buf->txqs[0].txq_id = qid;
+
+		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
+						NULL, buf, buf_size, &txqs_moved, NULL);
+		if (ret || txqs_moved == 0) {
+			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
+			ice_free(hw, buf);
+			return ICE_ERR_PARAM;
 		}
-		tm_node->sched_node = NULL;
+
+		/* now update the ice_sched_nodes to match physical layout */
+		new_parent->children[new_parent->num_children++] = hw_node;
+		hw_node->parent = new_parent;
+		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
+		for (uint16_t i = 0; i < parent->num_children; i++)
+			if (parent->children[i] == hw_node) {
+				/* to remove, just overwrite the old node slot with the last ptr */
+				parent->children[i] = parent->children[--parent->num_children];
+				break;
+			}
 	}
 
-	return 0;
+	return ice_cfg_hw_node(hw, sw_node, hw_node);
 }
 
-static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
+/* From a given node, recursively delete all the nodes that belong to the given VSI.
+ * Any node which can't be deleted because it still has children belonging to a
+ * different VSI is instead reassigned to that other VSI.
+ */
+static int
+free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
+		struct ice_sched_node *node, uint8_t vsi_id)
 {
-	int ret = 0;
-	int i;
+	uint16_t i = 0;
 
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_stop(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
-			break;
+	while (i < node->num_children) {
+		if (node->children[i]->vsi_handle != vsi_id) {
+			i++;
+			continue;
 		}
+		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
 	}
 
-	return ret;
-}
-
-static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
-{
-	int ret = 0;
-	int i;
-
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		ret = ice_tx_queue_start(dev, i);
-		if (ret) {
-			PMD_DRV_LOG(ERR, "start queue %u failed", i);
-			break;
-		}
+	if (node != root) {
+		if (node->num_children == 0)
+			ice_free_sched_node(pi, node);
+		else
+			node->vsi_handle = node->children[0]->vsi_handle;
 	}
 
-	return ret;
+	return 0;
 }
 
-int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
-			    int clear_on_fail,
-			    struct rte_tm_error *error)
+static int
+create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
+		struct ice_sched_node *hw_root, uint16_t *created)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-	struct ice_tm_node *root;
-	struct ice_sched_node *vsi_node = NULL;
-	struct ice_sched_node *queue_node;
-	struct ice_tx_queue *txq;
-	int ret_val = 0;
-	uint32_t i;
-	uint32_t idx_vsi_child;
-	uint32_t idx_qg;
-	uint32_t nb_vsi_child;
-	uint32_t nb_qg;
-	uint32_t qid;
-	uint32_t q_teid;
-
-	/* remove leaf nodes */
-	ret_val = ice_remove_leaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
-		goto fail_clear;
-	}
-
-	/* reset no-leaf nodes. */
-	ret_val = ice_reset_noleaf_nodes(dev);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
-		goto add_leaf;
-	}
-
-	/* config vsi node */
-	vsi_node = ice_get_vsi_node(hw);
-	root = pf->tm_conf.root;
-
-	ret_val = ice_set_node_rate(hw, root, vsi_node);
-	if (ret_val) {
-		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-		PMD_DRV_LOG(ERR,
-			    "configure vsi node %u bandwidth failed",
-			    root->id);
-		goto add_leaf;
-	}
-
-	/* config queue group nodes */
-	nb_vsi_child = vsi_node->num_children;
-	nb_qg = vsi_node->children[0]->num_children;
-
-	idx_vsi_child = 0;
-	idx_qg = 0;
-
-	if (root == NULL)
-		goto commit;
-
-	for (i = 0; i < root->reference_count; i++) {
-		struct ice_tm_node *tm_node = root->children[i];
-		struct ice_tm_node *tm_child_node;
-		struct ice_sched_node *qgroup_sched_node =
-			vsi_node->children[idx_vsi_child]->children[idx_qg];
-		uint32_t j;
-
-		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
-		if (ret_val) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR,
-				    "configure queue group node %u failed",
-				    tm_node->id);
-			goto reset_leaf;
-		}
-
-		for (j = 0; j < tm_node->reference_count; j++) {
-			tm_child_node = tm_node->children[j];
-			qid = tm_child_node->id;
-			ret_val = ice_tx_queue_start(dev, qid);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
-				goto reset_leaf;
-			}
-			txq = dev->data->tx_queues[qid];
-			q_teid = txq->q_teid;
-			queue_node = ice_sched_get_node(hw->port_info, q_teid);
-			if (queue_node == NULL) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
-				goto reset_leaf;
-			}
-			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
-				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
-								 qgroup_sched_node, qid);
-				if (ret_val) {
-					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
-					goto reset_leaf;
-				}
-			}
-			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
-			if (ret_val) {
-				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-				PMD_DRV_LOG(ERR,
-					    "configure queue group node %u failed",
-					    tm_node->id);
-				goto reset_leaf;
-			}
+	struct ice_sched_node *parent = sw_node->sched_node;
+	uint32_t teid;
+	uint16_t added;
+
+	/* first create all child nodes */
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		struct ice_tm_node *tm_node = sw_node->children[i];
+		int res = ice_sched_add_elems(pi, hw_root,
+				parent, parent->tx_sched_layer + 1,
+				1 /* num nodes */, &added, &teid,
+				NULL /* no pre-alloc */);
+		if (res != 0) {
+			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u",
+					parent->info.node_teid);
+			return -1;
 		}
-
-		idx_qg++;
-		if (idx_qg >= nb_qg) {
-			idx_qg = 0;
-			idx_vsi_child++;
-		}
-		if (idx_vsi_child >= nb_vsi_child) {
-			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
-			PMD_DRV_LOG(ERR, "too many queues");
-			goto reset_leaf;
+		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
+		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
+			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
+					teid, parent->tx_sched_layer + 1);
+			return -1;
 		}
+		tm_node->sched_node = hw_node;
+		created[hw_node->tx_sched_layer]++;
 	}
 
-commit:
-	pf->tm_conf.committed = true;
-	pf->tm_conf.clear_on_fail = clear_on_fail;
+	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
+	 * then just return, rather than trying to create leaf nodes.
+	 * That is done later at queue start.
+	 */
+	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+		return 0;
 
-	return ret_val;
+	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
+		if (sw_node->children[i]->reference_count == 0)
+			continue;
 
-reset_leaf:
-	ice_remove_leaf_nodes(dev);
-add_leaf:
-	ice_add_leaf_nodes(dev);
-	ice_reset_noleaf_nodes(dev);
-fail_clear:
-	/* clear all the traffic manager configuration */
-	if (clear_on_fail) {
-		ice_tm_conf_uninit(dev);
-		ice_tm_conf_init(dev);
+		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+			return -1;
 	}
-	return ret_val;
+	return 0;
 }
 
-static int ice_hierarchy_commit(struct rte_eth_dev *dev,
+static int
+commit_new_hierarchy(struct rte_eth_dev *dev)
+{
+	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	struct ice_port_info *pi = hw->port_info;
+	struct ice_tm_node *sw_root = pf->tm_conf.root;
+	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	/* count nodes per hw level, not per logical */
+	uint16_t nodes_created_per_level[ICE_TM_MAX_LAYERS] = {0};
+	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t qg_lvl = q_lvl - 1;
+
+	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
+
+	sw_root->sched_node = new_vsi_root;
+	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+		return -1;
+	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
+		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
+				nodes_created_per_level[i], i);
+	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
+
+	pf->main_vsi->nb_qps =
+			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
+				hw->layer_info[q_lvl].max_device_nodes);
+
+	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
+
+	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
+}
+
+static int
+ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	RTE_SET_USED(error);
+	/* commit should only be done to topology before start! */
+	if (dev->data->dev_started)
+		return -1;
 
-	/* if device not started, simply set committed flag and return. */
-	if (!dev->data->dev_started) {
-		pf->tm_conf.committed = true;
-		pf->tm_conf.clear_on_fail = clear_on_fail;
-		return 0;
+	int ret = commit_new_hierarchy(dev);
+	if (ret < 0 && clear_on_fail) {
+		ice_tm_conf_uninit(dev);
+		ice_tm_conf_init(dev);
 	}
-
-	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
+	return ret;
 }
-- 
2.43.0
* [PATCH v6 4/5] net/ice: allowing stopping port to apply TM topology
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (2 preceding siblings ...)
  2024-10-29 17:01   ` [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-10-29 17:01   ` Bruce Richardson
  2024-10-29 17:01   ` [PATCH v6 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
  2024-10-30 16:30   ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  5 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
The rte_tm topology commit requires the port to be stopped on apply.
Rather than just returning an error when the port is already started, we
can stop the port, apply the topology to it and then restart it.
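As a rough illustration of the stop, apply, restart flow described above,
here is a minimal standalone sketch. It does not use the DPDK API: the
demo_port structure and the demo_* functions are hypothetical stand-ins
for the ethdev stop/start calls and the driver's commit routine.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a port; not a DPDK structure. */
struct demo_port {
	bool started;
	int stop_result;   /* value the mock stop returns */
	int commit_result; /* value the mock commit returns */
	int start_result;  /* value the mock start returns */
	int commits;       /* number of commit attempts made */
};

static int demo_stop(struct demo_port *p)
{
	if (p->stop_result == 0)
		p->started = false;
	return p->stop_result;
}

static int demo_start(struct demo_port *p)
{
	if (p->start_result == 0)
		p->started = true;
	return p->start_result;
}

static int demo_commit(struct demo_port *p)
{
	p->commits++;
	return p->commit_result;
}

/* Stop the port if it is running, commit, then restart it.
 * The restart is attempted even when the commit itself failed. */
int demo_hierarchy_commit(struct demo_port *p, const char **msg)
{
	bool restart = false;

	if (p->started) {
		if (demo_stop(p) != 0) {
			*msg = "Device failed to Stop";
			return -1;
		}
		restart = true;
	}

	int ret = demo_commit(p);

	if (restart && demo_start(p) != 0) {
		*msg = "Device failed to Start";
		return -1;
	}
	return ret;
}
```

In this sketch a port that was not running before the commit is left
stopped afterwards, which matches the intent of only restarting what the
commit itself stopped.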
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 drivers/net/ice/ice_tm.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index a135e9db30..235eda24e5 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -836,15 +836,30 @@ ice_hierarchy_commit(struct rte_eth_dev *dev,
 				 int clear_on_fail,
 				 struct rte_tm_error *error)
 {
-	RTE_SET_USED(error);
-	/* commit should only be done to topology before start! */
-	if (dev->data->dev_started)
-		return -1;
+	bool restart = false;
+
+	/* commit should only be done to topology before start.
+	 * If port is already started, stop it and then restart when done.
+	 */
+	if (dev->data->dev_started) {
+		if (rte_eth_dev_stop(dev->data->port_id) != 0) {
+			error->message = "Device failed to Stop";
+			return -1;
+		}
+		restart = true;
+	}
 
 	int ret = commit_new_hierarchy(dev);
 	if (ret < 0 && clear_on_fail) {
 		ice_tm_conf_uninit(dev);
 		ice_tm_conf_init(dev);
 	}
+
+	if (restart) {
+		if (rte_eth_dev_start(dev->data->port_id) != 0) {
+			error->message = "Device failed to Start";
+			return -1;
+		}
+	}
 	return ret;
 }
-- 
2.43.0
* [PATCH v6 5/5] net/ice: provide parameter to limit scheduler layers
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (3 preceding siblings ...)
  2024-10-29 17:01   ` [PATCH v6 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
@ 2024-10-29 17:01   ` Bruce Richardson
  2024-10-30 16:30   ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
  5 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-29 17:01 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin, Bruce Richardson
To help with backward compatibility for applications that may expect
the ice driver Tx scheduler (accessed via the rte_tm APIs) to have only
3 layers, add a devarg that lets the user explicitly limit the number
of scheduler layers visible to the application.
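Following the devargs syntax used elsewhere in the ice guide, the
parameter would be passed as, for example, "-a 80:00.0,tm_sched_levels=3".
The documented value rules (range 3 to 8, capped at what the hardware
provides) can be sketched as below. This is an illustration only, not
the driver code: demo_effective_levels is a hypothetical helper, and
rejecting out-of-range values with -1 is an assumption.

```c
/* Hypothetical helper, not the driver's code: returns the number of
 * scheduler levels exposed to the application, or -1 if the request is
 * outside the documented 3..8 range (the rejection is an assumption). */
int demo_effective_levels(int requested, int hw_levels)
{
	if (requested < 3 || requested > 8)
		return -1;
	/* Requests above what the hardware offers use the hardware maximum. */
	return requested > hw_levels ? hw_levels : requested;
}
```

With a topology that offers 4 usable levels, a request for 8 levels
would therefore yield 4, while a request for 3 is honoured as given.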
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 58 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 33 +++++++++++---------
 4 files changed, 95 insertions(+), 16 deletions(-)
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index df489be08d..471343a0ac 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
     -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all hardware scheduler layers are available to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the scheduler levels to the provided value.
+  The provided value must be between 3 and 8.
+  If the value provided is greater than the number of levels provided by the hardware,
+  the driver will use the hardware maximum value.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This scheme will reduce in software the number of editable levels,
+  but will not affect the fan-out from each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7252ea6b24..474c6eeef5 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG         "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,28 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* Set the number of hidden Tx scheduler layers. If no devargs parameter is
+	 * given to set the number of exposed levels, the default is to expose all
+	 * levels except the TC layer.
+	 *
+	 * If the number of exposed levels is set, we check that it's not greater
+	 * than the HW can provide (in which case we do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else {
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+		}
+	}
+
 	pf->main_vsi = vsi;
 	rte_spinlock_init(&pf->link_lock);
 
@@ -2066,6 +2091,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* The value needs two-stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in
+	 * the correct range, between 3 and 8. Later validation will check that the
+	 * layers available on a particular port are at least as many as specified here.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_TM_MAX_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+			    key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_TM_MAX_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2363,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7239,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "=<N>"
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 70189a9eb7..978fcf50f1 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -486,6 +486,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers;    /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -559,6 +560,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -662,7 +664,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 235eda24e5..18ac324a61 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -198,9 +198,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -208,7 +209,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		   int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -228,7 +228,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -408,6 +408,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
 	struct ice_tm_node *parent_node = NULL;
+	uint8_t layer_offset = pf->tm_conf.hidden_layers;
 	int ret;
 
 	if (!params || !error)
@@ -446,7 +447,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		/* add the root node */
 		tm_node = rte_zmalloc(NULL,
 				sizeof(struct ice_tm_node) +
-				sizeof(struct ice_tm_node *) * hw->max_children[0],
+				sizeof(struct ice_tm_node *) * hw->max_children[layer_offset],
 				0);
 		if (!tm_node)
 			return -ENOMEM;
@@ -478,7 +479,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-			params, level_id == ice_get_leaf_level(hw), error);
+			params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -506,7 +507,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 
 	tm_node = rte_zmalloc(NULL,
 			sizeof(struct ice_tm_node) +
-			sizeof(struct ice_tm_node *) * hw->max_children[level_id],
+			sizeof(struct ice_tm_node *) * hw->max_children[level_id + layer_offset],
 			0);
 	if (!tm_node)
 		return -ENOMEM;
@@ -753,8 +754,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		 struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -786,14 +787,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -806,16 +807,20 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	/* count nodes per hw level, not per logical */
 	uint16_t nodes_created_per_level[ICE_TM_MAX_LAYERS] = {0};
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
 		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0
* Re: [PATCH v6 1/5] net/ice: add option to download scheduler topology
  2024-10-29 17:01   ` [PATCH v6 1/5] net/ice: add option to download scheduler topology Bruce Richardson
@ 2024-10-30 15:21     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-30 15:21 UTC (permalink / raw)
  To: Bruce Richardson, dev
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
On 29/10/2024 17:01, Bruce Richardson wrote:
> The DDP package file being loaded at init time may contain an
> alternative Tx Scheduler topology in it. Add driver option to load this
> topology at init time.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   doc/guides/nics/ice.rst        | 15 +++++++++++++++
>   drivers/net/ice/base/ice_ddp.c | 20 +++++++++++++++++---
>   drivers/net/ice/base/ice_ddp.h |  4 ++--
>   drivers/net/ice/ice_ethdev.c   | 26 ++++++++++++++++----------
>   drivers/net/ice/ice_ethdev.h   |  1 +
>   5 files changed, 51 insertions(+), 15 deletions(-)
>
> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
> index 6c66dc8008..42bbe50968 100644
> --- a/doc/guides/nics/ice.rst
> +++ b/doc/guides/nics/ice.rst
> @@ -298,6 +298,21 @@ Runtime Configuration
>     As a trade-off, this configuration may cause the packet processing performance
>     degradation due to the PCI bandwidth limitation.
>   
> +- ``Tx Scheduler Topology Download``
> +
> +  The default Tx scheduler topology exposed by the NIC,
> +  generally a 9-level topology of which 8 levels are SW configurable,
> +  may be updated by a new topology loaded from a DDP package file.
> +  The ``ddp_load_sched_topo`` option can be used to specify that the scheduler topology,
> +  if any, in the DDP package file being used should be loaded into the NIC.
> +  For example::
> +
> +    -a 0000:88:00.0,ddp_load_sched_topo=1
> +
> +  or::
> +
> +    -a 0000:88:00.0,ddp_pkg_file=/path/to/pkg.file,ddp_load_sched_topo=1
> +
>   - ``Tx diagnostics`` (default ``not enabled``)
>   
>     Set the ``devargs`` parameter ``mbuf_check`` to enable Tx diagnostics.
> diff --git a/drivers/net/ice/base/ice_ddp.c b/drivers/net/ice/base/ice_ddp.c
> index c17a58eab8..850c722a3f 100644
> --- a/drivers/net/ice/base/ice_ddp.c
> +++ b/drivers/net/ice/base/ice_ddp.c
> @@ -1333,7 +1333,7 @@ ice_fill_hw_ptype(struct ice_hw *hw)
>    * ice_copy_and_init_pkg() instead of directly calling ice_init_pkg() in this
>    * case.
>    */
> -enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
> +enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len, bool load_sched)
>   {
>   	bool already_loaded = false;
>   	enum ice_ddp_state state;
> @@ -1351,6 +1351,20 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
>   		return state;
>   	}
>   
> +	if (load_sched) {
> +		enum ice_status res = ice_cfg_tx_topo(hw, buf, len);
> +		if (res != ICE_SUCCESS) {
> +			ice_debug(hw, ICE_DBG_INIT,
> +				  "failed to apply sched topology  (err: %d)\n",
> +				  res);
> +			return ICE_DDP_PKG_ERR;
> +		}
> +		ice_debug(hw, ICE_DBG_INIT,
> +			  "Topology download successful, reinitializing device\n");
> +		ice_deinit_hw(hw);
> +		ice_init_hw(hw);
> +	}
> +
>   	/* initialize package info */
>   	state = ice_init_pkg_info(hw, pkg);
>   	if (state)
> @@ -1423,7 +1437,7 @@ enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buf, u32 len)
>    * related routines.
>    */
>   enum ice_ddp_state
> -ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
> +ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched)
>   {
>   	enum ice_ddp_state state;
>   	u8 *buf_copy;
> @@ -1433,7 +1447,7 @@ ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len)
>   
>   	buf_copy = (u8 *)ice_memdup(hw, buf, len, ICE_NONDMA_TO_NONDMA);
>   
> -	state = ice_init_pkg(hw, buf_copy, len);
> +	state = ice_init_pkg(hw, buf_copy, len, load_sched);
>   	if (!ice_is_init_pkg_successful(state)) {
>   		/* Free the copy, since we failed to initialize the package */
>   		ice_free(hw, buf_copy);
> diff --git a/drivers/net/ice/base/ice_ddp.h b/drivers/net/ice/base/ice_ddp.h
> index 5512669f44..d79cdee13a 100644
> --- a/drivers/net/ice/base/ice_ddp.h
> +++ b/drivers/net/ice/base/ice_ddp.h
> @@ -454,9 +454,9 @@ ice_pkg_enum_entry(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
>   void *
>   ice_pkg_enum_section(struct ice_seg *ice_seg, struct ice_pkg_enum *state,
>   		     u32 sect_type);
> -enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len);
> +enum ice_ddp_state ice_init_pkg(struct ice_hw *hw, u8 *buff, u32 len, bool load_sched);
>   enum ice_ddp_state
> -ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len);
> +ice_copy_and_init_pkg(struct ice_hw *hw, const u8 *buf, u32 len, bool load_sched);
>   bool ice_is_init_pkg_successful(enum ice_ddp_state state);
>   void ice_free_seg(struct ice_hw *hw);
>   
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index d5e94a6685..da91012a5e 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -39,6 +39,7 @@
>   #define ICE_RX_LOW_LATENCY_ARG    "rx_low_latency"
>   #define ICE_MBUF_CHECK_ARG       "mbuf_check"
>   #define ICE_DDP_FILENAME_ARG      "ddp_pkg_file"
> +#define ICE_DDP_LOAD_SCHED_ARG    "ddp_load_sched_topo"
>   
>   #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
>   
> @@ -56,6 +57,7 @@ static const char * const ice_valid_args[] = {
>   	ICE_DEFAULT_MAC_DISABLE,
>   	ICE_MBUF_CHECK_ARG,
>   	ICE_DDP_FILENAME_ARG,
> +	ICE_DDP_LOAD_SCHED_ARG,
>   	NULL
>   };
>   
> @@ -1997,7 +1999,7 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
>   load_fw:
>   	PMD_INIT_LOG(DEBUG, "DDP package name: %s", pkg_file);
>   
> -	err = ice_copy_and_init_pkg(hw, buf, bufsz);
> +	err = ice_copy_and_init_pkg(hw, buf, bufsz, adapter->devargs.ddp_load_sched);
>   	if (!ice_is_init_pkg_successful(err)) {
>   		PMD_INIT_LOG(ERR, "ice_copy_and_init_hw failed: %d", err);
>   		free(buf);
> @@ -2029,20 +2031,19 @@ ice_base_queue_get(struct ice_pf *pf)
>   static int
>   parse_bool(const char *key, const char *value, void *args)
>   {
> -	int *i = (int *)args;
> -	char *end;
> -	int num;
> +	int *i = args;
>   
> -	num = strtoul(value, &end, 10);
> -
> -	if (num != 0 && num != 1) {
> -		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
> -			"value must be 0 or 1",
> +	if (value == NULL || value[0] == '\0') {
> +		PMD_DRV_LOG(WARNING, "key:\"%s\", requires a value, which must be 0 or 1", key);
> +		return -1;
> +	}
> +	if (value[1] != '\0' || (value[0] != '0' && value[0] != '1')) {
> +		PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", value must be 0 or 1",
>   			value, key);
>   		return -1;
>   	}
>   
> -	*i = num;
> +	*i = (value[0] == '1');
>   	return 0;
>   }
>   
> @@ -2307,6 +2308,10 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
>   	if (ret)
>   		goto bail;
>   
> +	ret = rte_kvargs_process(kvlist, ICE_DDP_LOAD_SCHED_ARG,
> +				 &parse_bool, &ad->devargs.ddp_load_sched);
> +	if (ret)
> +		goto bail;
>   bail:
>   	rte_kvargs_free(kvlist);
>   	return ret;
> @@ -7185,6 +7190,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
>   			      ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>"
>   			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
>   			      ICE_DDP_FILENAME_ARG "=</path/to/file>"
> +			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
>   			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
>   
>   RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
> diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
> index 076cf595e8..2794a76096 100644
> --- a/drivers/net/ice/ice_ethdev.h
> +++ b/drivers/net/ice/ice_ethdev.h
> @@ -564,6 +564,7 @@ struct ice_devargs {
>   	uint8_t proto_xtr[ICE_MAX_QUEUE_NUM];
>   	uint8_t pin_idx;
>   	uint8_t pps_out_ena;
> +	uint8_t ddp_load_sched;
>   	int xtr_field_offs;
>   	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
>   	/* Name of the field. */
-- 
Regards,
Vladimir
* Re: [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support
  2024-10-29 17:01   ` [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
@ 2024-10-30 15:21     ` Medvedkin, Vladimir
  0 siblings, 0 replies; 76+ messages in thread
From: Medvedkin, Vladimir @ 2024-10-30 15:21 UTC (permalink / raw)
  To: Bruce Richardson, dev
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
On 29/10/2024 17:01, Bruce Richardson wrote:
> Increase the flexibility of the Tx scheduler hierarchy support in the
> driver. If the HW/firmware allows it, allow creating up to 2k child
> nodes per scheduler node. Also expand the number of supported layers to
> the max available, rather than always just having 3 layers.  One
> restriction on this change is that the topology needs to be configured
> and enabled before port queue setup, in many cases, and before port
> start in all cases.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   doc/guides/nics/ice.rst      |  31 +-
>   drivers/net/ice/ice_ethdev.c |   9 -
>   drivers/net/ice/ice_ethdev.h |  17 +-
>   drivers/net/ice/ice_rxtx.c   |  10 +
>   drivers/net/ice/ice_tm.c     | 548 ++++++++++++++---------------------
>   5 files changed, 248 insertions(+), 367 deletions(-)
>
> diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
> index 42bbe50968..df489be08d 100644
> --- a/doc/guides/nics/ice.rst
> +++ b/doc/guides/nics/ice.rst
> @@ -447,21 +447,22 @@ Traffic Management Support
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~
>   
>   The ice PMD provides support for the Traffic Management API (RTE_TM),
> -allow users to offload a 3-layers Tx scheduler on the E810 NIC:
> -
> -- ``Port Layer``
> -
> -  This is the root layer, support peak bandwidth configuration,
> -  max to 32 children.
> -
> -- ``Queue Group Layer``
> -
> -  The middle layer, support peak / committed bandwidth, weight, priority configurations,
> -  max to 8 children.
> -
> -- ``Queue Layer``
> -
> -  The leaf layer, support peak / committed bandwidth, weight, priority configurations.
> +enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
> +By default, all available transmit scheduler layers are available for configuration,
> +allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
> +The number of levels in the hierarchy can be adjusted via driver parameter:
> +
> +* the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
> +  using the driver parameter ``ddp_load_sched_topo=1``.
> +  Using this mechanism, if the number of levels is reduced,
> +  the possible fan-out of child-nodes from each level may be increased.
> +  The default topology is a 9-level tree with a fan-out of 8 at each level.
> +  Released DDP package files contain a 5-level hierarchy (4-levels usable),
> +  with increased fan-out at the lower 3 levels
> +  e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
> +
> +For more details on how to configure a Tx scheduling hierarchy,
> +please refer to the ``rte_tm`` `API documentation <https://doc.dpdk.org/api/rte__tm_8h.html>`_.
>   
>   Additional Options
>   ++++++++++++++++++
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index da91012a5e..7252ea6b24 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -3906,7 +3906,6 @@ ice_dev_start(struct rte_eth_dev *dev)
>   	int mask, ret;
>   	uint8_t timer = hw->func_caps.ts_func_info.tmr_index_owned;
>   	uint32_t pin_idx = ad->devargs.pin_idx;
> -	struct rte_tm_error tm_err;
>   	ice_declare_bitmap(pmask, ICE_PROMISC_MAX);
>   	ice_zero_bitmap(pmask, ICE_PROMISC_MAX);
>   
> @@ -3938,14 +3937,6 @@ ice_dev_start(struct rte_eth_dev *dev)
>   		}
>   	}
>   
> -	if (pf->tm_conf.committed) {
> -		ret = ice_do_hierarchy_commit(dev, pf->tm_conf.clear_on_fail, &tm_err);
> -		if (ret) {
> -			PMD_DRV_LOG(ERR, "fail to commit Tx scheduler");
> -			goto rx_err;
> -		}
> -	}
> -
>   	ice_set_rx_function(dev);
>   	ice_set_tx_function(dev);
>   
> diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
> index 2794a76096..70189a9eb7 100644
> --- a/drivers/net/ice/ice_ethdev.h
> +++ b/drivers/net/ice/ice_ethdev.h
> @@ -458,6 +458,8 @@ struct ice_acl_info {
>   TAILQ_HEAD(ice_shaper_profile_list, ice_tm_shaper_profile);
>   TAILQ_HEAD(ice_tm_node_list, ice_tm_node);
>   
> +#define ICE_TM_MAX_LAYERS ICE_SCHED_9_LAYERS
> +
>   struct ice_tm_shaper_profile {
>   	TAILQ_ENTRY(ice_tm_shaper_profile) node;
>   	uint32_t shaper_profile_id;
> @@ -480,14 +482,6 @@ struct ice_tm_node {
>   	struct ice_sched_node *sched_node;
>   };
>   
> -/* node type of Traffic Manager */
> -enum ice_tm_node_type {
> -	ICE_TM_NODE_TYPE_PORT,
> -	ICE_TM_NODE_TYPE_QGROUP,
> -	ICE_TM_NODE_TYPE_QUEUE,
> -	ICE_TM_NODE_TYPE_MAX,
> -};
> -
>   /* Struct to store all the Traffic Manager configuration. */
>   struct ice_tm_conf {
>   	struct ice_shaper_profile_list shaper_profile_list;
> @@ -690,9 +684,6 @@ int ice_rem_rss_cfg_wrap(struct ice_pf *pf, uint16_t vsi_id,
>   			 struct ice_rss_hash_cfg *cfg);
>   void ice_tm_conf_init(struct rte_eth_dev *dev);
>   void ice_tm_conf_uninit(struct rte_eth_dev *dev);
> -int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
> -			    int clear_on_fail,
> -			    struct rte_tm_error *error);
>   extern const struct rte_tm_ops ice_tm_ops;
>   
>   static inline int
> @@ -750,4 +741,8 @@ int rte_pmd_ice_dump_switch(uint16_t port, uint8_t **buff, uint32_t *size);
>   
>   __rte_experimental
>   int rte_pmd_ice_dump_txsched(uint16_t port, bool detail, FILE *stream);
> +
> +int
> +ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t node_teid);
> +
>   #endif /* _ICE_ETHDEV_H_ */
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 024d97cb46..0c7106c7e0 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -747,6 +747,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
>   	int err;
>   	struct ice_vsi *vsi;
>   	struct ice_hw *hw;
> +	struct ice_pf *pf;
>   	struct ice_aqc_add_tx_qgrp *txq_elem;
>   	struct ice_tlan_ctx tx_ctx;
>   	int buf_len;
> @@ -777,6 +778,7 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
>   
>   	vsi = txq->vsi;
>   	hw = ICE_VSI_TO_HW(vsi);
> +	pf = ICE_VSI_TO_PF(vsi);
>   
>   	memset(&tx_ctx, 0, sizeof(tx_ctx));
>   	txq_elem->num_txqs = 1;
> @@ -812,6 +814,14 @@ ice_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id)
>   	/* store the schedule node id */
>   	txq->q_teid = txq_elem->txqs[0].q_teid;
>   
> +	/* move the queue to correct position in hierarchy, if explicit hierarchy configured */
> +	if (pf->tm_conf.committed)
> +		if (ice_tm_setup_txq_node(pf, hw, tx_queue_id, txq->q_teid) != 0) {
> +			PMD_DRV_LOG(ERR, "Failed to set up txq traffic management node");
> +			rte_free(txq_elem);
> +			return -EIO;
> +		}
> +
>   	dev->data->tx_queue_state[tx_queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
>   
>   	rte_free(txq_elem);
> diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
> index 636ab77f26..a135e9db30 100644
> --- a/drivers/net/ice/ice_tm.c
> +++ b/drivers/net/ice/ice_tm.c
> @@ -1,17 +1,15 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2022 Intel Corporation
>    */
> +#include <rte_ethdev.h>
>   #include <rte_tm_driver.h>
>   
>   #include "ice_ethdev.h"
>   #include "ice_rxtx.h"
>   
> -#define MAX_CHILDREN_PER_SCHED_NODE	8
> -#define MAX_CHILDREN_PER_TM_NODE	256
> -
>   static int ice_hierarchy_commit(struct rte_eth_dev *dev,
>   				 int clear_on_fail,
> -				 __rte_unused struct rte_tm_error *error);
> +				 struct rte_tm_error *error);
>   static int ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   	      uint32_t parent_node_id, uint32_t priority,
>   	      uint32_t weight, uint32_t level_id,
> @@ -86,9 +84,10 @@ ice_tm_conf_uninit(struct rte_eth_dev *dev)
>   }
>   
>   static int
> -ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
> +ice_node_param_check(uint32_t node_id,
>   		      uint32_t priority, uint32_t weight,
>   		      const struct rte_tm_node_params *params,
> +		      bool is_leaf,
>   		      struct rte_tm_error *error)
>   {
>   	/* checked all the unsupported parameter */
> @@ -123,7 +122,7 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
>   	}
>   
>   	/* for non-leaf node */
> -	if (node_id >= pf->dev_data->nb_tx_queues) {
> +	if (!is_leaf) {
>   		if (params->nonleaf.wfq_weight_mode) {
>   			error->type =
>   				RTE_TM_ERROR_TYPE_NODE_PARAMS_WFQ_WEIGHT_MODE;
> @@ -147,6 +146,11 @@ ice_node_param_check(struct ice_pf *pf, uint32_t node_id,
>   	}
>   
>   	/* for leaf node */
> +	if (node_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> +		error->message = "Node ID out of range for a leaf node.";
> +		return -EINVAL;
> +	}
>   	if (params->leaf.cman) {
>   		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN;
>   		error->message = "Congestion management not supported";
> @@ -193,11 +197,18 @@ find_node(struct ice_tm_node *root, uint32_t id)
>   	return NULL;
>   }
>   
> +static inline uint8_t
> +ice_get_leaf_level(struct ice_hw *hw)
> +{
> +	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
> +}
> +
>   static int
>   ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
>   		   int *is_leaf, struct rte_tm_error *error)
>   {
>   	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>   	struct ice_tm_node *tm_node;
>   
>   	if (!is_leaf || !error)
> @@ -217,7 +228,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
>   		return -EINVAL;
>   	}
>   
> -	if (tm_node->level == ICE_TM_NODE_TYPE_QUEUE)
> +	if (tm_node->level == ice_get_leaf_level(hw))
>   		*is_leaf = true;
>   	else
>   		*is_leaf = false;
> @@ -393,34 +404,21 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   	      struct rte_tm_error *error)
>   {
>   	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>   	struct ice_tm_shaper_profile *shaper_profile = NULL;
>   	struct ice_tm_node *tm_node;
> -	struct ice_tm_node *parent_node;
> +	struct ice_tm_node *parent_node = NULL;
>   	int ret;
>   
>   	if (!params || !error)
>   		return -EINVAL;
>   
> -	ret = ice_node_param_check(pf, node_id, priority, weight,
> -				    params, error);
> -	if (ret)
> -		return ret;
> -
> -	/* check if the node is already existed */
> -	if (find_node(pf->tm_conf.root, node_id)) {
> -		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> -		error->message = "node id already used";
> -		return -EINVAL;
> -	}
> -
>   	/* check the shaper profile id */
>   	if (params->shaper_profile_id != RTE_TM_SHAPER_PROFILE_ID_NONE) {
> -		shaper_profile = ice_shaper_profile_search(dev,
> -			params->shaper_profile_id);
> +		shaper_profile = ice_shaper_profile_search(dev, params->shaper_profile_id);
>   		if (!shaper_profile) {
> -			error->type =
> -				RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID;
> -			error->message = "shaper profile not exist";
> +			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID;
> +			error->message = "shaper profile does not exist";
>   			return -EINVAL;
>   		}
>   	}
> @@ -428,9 +426,9 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   	/* root node if not have a parent */
>   	if (parent_node_id == RTE_TM_NODE_ID_NULL) {
>   		/* check level */
> -		if (level_id != ICE_TM_NODE_TYPE_PORT) {
> +		if (level_id != 0) {
>   			error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
> -			error->message = "Wrong level";
> +			error->message = "Wrong level, root node (NULL parent) must be at level 0";
>   			return -EINVAL;
>   		}
>   
> @@ -441,74 +439,75 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   			return -EINVAL;
>   		}
>   
> +		ret = ice_node_param_check(node_id, priority, weight, params, false, error);
> +		if (ret)
> +			return ret;
> +
>   		/* add the root node */
>   		tm_node = rte_zmalloc(NULL,
> -				      sizeof(struct ice_tm_node) +
> -				      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
> -				      0);
> +				sizeof(struct ice_tm_node) +
> +				sizeof(struct ice_tm_node *) * hw->max_children[0],
> +				0);
>   		if (!tm_node)
>   			return -ENOMEM;
>   		tm_node->id = node_id;
> -		tm_node->level = ICE_TM_NODE_TYPE_PORT;
> +		tm_node->level = 0;
>   		tm_node->parent = NULL;
>   		tm_node->reference_count = 0;
>   		tm_node->shaper_profile = shaper_profile;
> -		tm_node->children =
> -			(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
> -		rte_memcpy(&tm_node->params, params,
> -				 sizeof(struct rte_tm_node_params));
> +		tm_node->children = RTE_PTR_ADD(tm_node, sizeof(struct ice_tm_node));
> +		tm_node->params = *params;
>   		pf->tm_conf.root = tm_node;
>   		return 0;
>   	}
>   
> -	/* check the parent node */
>   	parent_node = find_node(pf->tm_conf.root, parent_node_id);
>   	if (!parent_node) {
>   		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
>   		error->message = "parent not exist";
>   		return -EINVAL;
>   	}
> -	if (parent_node->level != ICE_TM_NODE_TYPE_PORT &&
> -	    parent_node->level != ICE_TM_NODE_TYPE_QGROUP) {
> -		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
> -		error->message = "parent is not valid";
> -		return -EINVAL;
> -	}
> +
>   	/* check level */
> -	if (level_id != RTE_TM_NODE_LEVEL_ID_ANY &&
> -	    level_id != parent_node->level + 1) {
> +	if (level_id == RTE_TM_NODE_LEVEL_ID_ANY)
> +		level_id = parent_node->level + 1;
> +	else if (level_id != parent_node->level + 1) {
>   		error->type = RTE_TM_ERROR_TYPE_NODE_PARAMS;
>   		error->message = "Wrong level";
>   		return -EINVAL;
>   	}
>   
> -	/* check the node number */
> -	if (parent_node->level == ICE_TM_NODE_TYPE_PORT) {
> -		/* check the queue group number */
> -		if (parent_node->reference_count >= pf->dev_data->nb_tx_queues) {
> -			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> -			error->message = "too many queue groups";
> -			return -EINVAL;
> -		}
> -	} else {
> -		/* check the queue number */
> -		if (parent_node->reference_count >=
> -			MAX_CHILDREN_PER_SCHED_NODE) {
> -			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> -			error->message = "too many queues";
> -			return -EINVAL;
> -		}
> -		if (node_id >= pf->dev_data->nb_tx_queues) {
> -			error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> -			error->message = "too large queue id";
> -			return -EINVAL;
> -		}
> +	ret = ice_node_param_check(node_id, priority, weight,
> +			params, level_id == ice_get_leaf_level(hw), error);
> +	if (ret)
> +		return ret;
> +
> +	/* check if the node already exists */
> +	if (find_node(pf->tm_conf.root, node_id)) {
> +		error->type = RTE_TM_ERROR_TYPE_NODE_ID;
> +		error->message = "node id already used";
> +		return -EINVAL;
> +	}
> +
> +	/* check the parent node */
> +	/* for n-level hierarchy, level n-1 is leaf, so last level with children is n-2 */
> +	if ((int)parent_node->level > hw->num_tx_sched_layers - 2) {
> +		error->type = RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID;
> +		error->message = "parent is not valid";
> +		return -EINVAL;
> +	}
> +
> +	/* check the max children allowed at this level */
> +	if (parent_node->reference_count >= hw->max_children[parent_node->level]) {
> +		error->type = RTE_TM_ERROR_TYPE_CAPABILITIES;
> +		error->message = "insufficient number of child nodes supported";
> +		return -EINVAL;
>   	}
>   
>   	tm_node = rte_zmalloc(NULL,
> -			      sizeof(struct ice_tm_node) +
> -			      sizeof(struct ice_tm_node *) * MAX_CHILDREN_PER_TM_NODE,
> -			      0);
> +			sizeof(struct ice_tm_node) +
> +			sizeof(struct ice_tm_node *) * hw->max_children[level_id],
> +			0);
>   	if (!tm_node)
>   		return -ENOMEM;
>   	tm_node->id = node_id;
> @@ -516,25 +515,18 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
>   	tm_node->weight = weight;
>   	tm_node->reference_count = 0;
>   	tm_node->parent = parent_node;
> -	tm_node->level = parent_node->level + 1;
> +	tm_node->level = level_id;
>   	tm_node->shaper_profile = shaper_profile;
> -	tm_node->children =
> -		(void *)((uint8_t *)tm_node + sizeof(struct ice_tm_node));
> -	tm_node->parent->children[tm_node->parent->reference_count] = tm_node;
> +	tm_node->children = RTE_PTR_ADD(tm_node, sizeof(struct ice_tm_node));
> +	tm_node->parent->children[tm_node->parent->reference_count++] = tm_node;
> +	tm_node->params = *params;
>   
> -	if (tm_node->priority != 0 && level_id != ICE_TM_NODE_TYPE_QUEUE &&
> -	    level_id != ICE_TM_NODE_TYPE_QGROUP)
> -		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d",
> -			    level_id);
> +	if (tm_node->priority != 0)
> +		PMD_DRV_LOG(WARNING, "priority != 0 not supported in level %d", level_id);
>   
> -	if (tm_node->weight != 1 &&
> -	    level_id != ICE_TM_NODE_TYPE_QUEUE && level_id != ICE_TM_NODE_TYPE_QGROUP)
> -		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d",
> -			    level_id);
> +	if (tm_node->weight != 1 && level_id == 0)
> +		PMD_DRV_LOG(WARNING, "weight != 1 not supported in level %d", level_id);
>   
> -	rte_memcpy(&tm_node->params, params,
> -			 sizeof(struct rte_tm_node_params));
> -	tm_node->parent->reference_count++;
>   
>   	return 0;
>   }
> @@ -573,7 +565,7 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
>   	}
>   
>   	/* root node */
> -	if (tm_node->level == ICE_TM_NODE_TYPE_PORT) {
> +	if (tm_node->level == 0) {
>   		rte_free(tm_node);
>   		pf->tm_conf.root = NULL;
>   		return 0;
> @@ -593,53 +585,6 @@ ice_tm_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
>   	return 0;
>   }
>   
> -static int ice_move_recfg_lan_txq(struct rte_eth_dev *dev,
> -				  struct ice_sched_node *queue_sched_node,
> -				  struct ice_sched_node *dst_node,
> -				  uint16_t queue_id)
> -{
> -	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> -	struct ice_aqc_move_txqs_data *buf;
> -	struct ice_sched_node *queue_parent_node;
> -	uint8_t txqs_moved;
> -	int ret = ICE_SUCCESS;
> -	uint16_t buf_size = ice_struct_size(buf, txqs, 1);
> -
> -	buf = (struct ice_aqc_move_txqs_data *)ice_malloc(hw, sizeof(*buf));
> -	if (buf == NULL)
> -		return -ENOMEM;
> -
> -	queue_parent_node = queue_sched_node->parent;
> -	buf->src_teid = queue_parent_node->info.node_teid;
> -	buf->dest_teid = dst_node->info.node_teid;
> -	buf->txqs[0].q_teid = queue_sched_node->info.node_teid;
> -	buf->txqs[0].txq_id = queue_id;
> -
> -	ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
> -					NULL, buf, buf_size, &txqs_moved, NULL);
> -	if (ret || txqs_moved == 0) {
> -		PMD_DRV_LOG(ERR, "move lan queue %u failed", queue_id);
> -		rte_free(buf);
> -		return ICE_ERR_PARAM;
> -	}
> -
> -	if (queue_parent_node->num_children > 0) {
> -		queue_parent_node->num_children--;
> -		queue_parent_node->children[queue_parent_node->num_children] = NULL;
> -	} else {
> -		PMD_DRV_LOG(ERR, "invalid children number %d for queue %u",
> -			    queue_parent_node->num_children, queue_id);
> -		rte_free(buf);
> -		return ICE_ERR_PARAM;
> -	}
> -	dst_node->children[dst_node->num_children++] = queue_sched_node;
> -	queue_sched_node->parent = dst_node;
> -	ice_sched_query_elem(hw, queue_sched_node->info.node_teid, &queue_sched_node->info);
> -
> -	rte_free(buf);
> -	return ret;
> -}
> -
>   static int ice_set_node_rate(struct ice_hw *hw,
>   			     struct ice_tm_node *tm_node,
>   			     struct ice_sched_node *sched_node)
> @@ -727,240 +672,179 @@ static int ice_cfg_hw_node(struct ice_hw *hw,
>   	return 0;
>   }
>   
> -static struct ice_sched_node *ice_get_vsi_node(struct ice_hw *hw)
> +int
> +ice_tm_setup_txq_node(struct ice_pf *pf, struct ice_hw *hw, uint16_t qid, uint32_t teid)
>   {
> -	struct ice_sched_node *node = hw->port_info->root;
> -	uint32_t vsi_layer = hw->num_tx_sched_layers - ICE_VSI_LAYER_OFFSET;
> -	uint32_t i;
> -
> -	for (i = 0; i < vsi_layer; i++)
> -		node = node->children[0];
> +	struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(hw->port_info->root, teid);
> +	struct ice_tm_node *sw_node = find_node(pf->tm_conf.root, qid);
>   
> -	return node;
> -}
> -
> -static int ice_reset_noleaf_nodes(struct rte_eth_dev *dev)
> -{
> -	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> -	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> -	struct ice_sched_node *vsi_node = ice_get_vsi_node(hw);
> -	struct ice_tm_node *root = pf->tm_conf.root;
> -	uint32_t i;
> -	int ret;
> -
> -	/* reset vsi_node */
> -	ret = ice_set_node_rate(hw, NULL, vsi_node);
> -	if (ret) {
> -		PMD_DRV_LOG(ERR, "reset vsi node failed");
> -		return ret;
> -	}
> -
> -	if (root == NULL)
> +	/* not configured in hierarchy */
> +	if (sw_node == NULL)
>   		return 0;
>   
> -	for (i = 0; i < root->reference_count; i++) {
> -		struct ice_tm_node *tm_node = root->children[i];
> +	sw_node->sched_node = hw_node;
>   
> -		if (tm_node->sched_node == NULL)
> -			continue;
> +	/* if the queue node has been put in the wrong place in hierarchy */
> +	if (hw_node->parent != sw_node->parent->sched_node) {
> +		struct ice_aqc_move_txqs_data *buf;
> +		uint8_t txqs_moved = 0;
> +		uint16_t buf_size = ice_struct_size(buf, txqs, 1);
>   
> -		ret = ice_cfg_hw_node(hw, NULL, tm_node->sched_node);
> -		if (ret) {
> -			PMD_DRV_LOG(ERR, "reset queue group node %u failed", tm_node->id);
> -			return ret;
> +		buf = ice_malloc(hw, buf_size);
> +		if (buf == NULL)
> +			return -ENOMEM;
> +
> +		struct ice_sched_node *parent = hw_node->parent;
> +		struct ice_sched_node *new_parent = sw_node->parent->sched_node;
> +		buf->src_teid = parent->info.node_teid;
> +		buf->dest_teid = new_parent->info.node_teid;
> +		buf->txqs[0].q_teid = hw_node->info.node_teid;
> +		buf->txqs[0].txq_id = qid;
> +
> +		int ret = ice_aq_move_recfg_lan_txq(hw, 1, true, false, false, false, 50,
> +						NULL, buf, buf_size, &txqs_moved, NULL);
> +		if (ret || txqs_moved == 0) {
> +			PMD_DRV_LOG(ERR, "move lan queue %u failed", qid);
> +			ice_free(hw, buf);
> +			return ICE_ERR_PARAM;
>   		}
> -		tm_node->sched_node = NULL;
> +
> +		/* now update the ice_sched_nodes to match physical layout */
> +		new_parent->children[new_parent->num_children++] = hw_node;
> +		hw_node->parent = new_parent;
> +		ice_sched_query_elem(hw, hw_node->info.node_teid, &hw_node->info);
> +		for (uint16_t i = 0; i < parent->num_children; i++)
> +			if (parent->children[i] == hw_node) {
> +				/* to remove, just overwrite the old node slot with the last ptr */
> +				parent->children[i] = parent->children[--parent->num_children];
> +				break;
> +			}
>   	}
>   
> -	return 0;
> +	return ice_cfg_hw_node(hw, sw_node, hw_node);
>   }
>   
> -static int ice_remove_leaf_nodes(struct rte_eth_dev *dev)
> +/* From a given node, recursively delete all the nodes that belong to the given VSI.
> + * Any node which can't be deleted because it still has children belonging to a
> + * different VSI is reassigned to that other VSI instead.
> + */
> +static int
> +free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node *root,
> +		struct ice_sched_node *node, uint8_t vsi_id)
>   {
> -	int ret = 0;
> -	int i;
> +	uint16_t i = 0;
>   
> -	for (i = 0; i < dev->data->nb_tx_queues; i++) {
> -		ret = ice_tx_queue_stop(dev, i);
> -		if (ret) {
> -			PMD_DRV_LOG(ERR, "stop queue %u failed", i);
> -			break;
> +	while (i < node->num_children) {
> +		if (node->children[i]->vsi_handle != vsi_id) {
> +			i++;
> +			continue;
>   		}
> +		free_sched_node_recursive(pi, root, node->children[i], vsi_id);
>   	}
>   
> -	return ret;
> -}
> -
> -static int ice_add_leaf_nodes(struct rte_eth_dev *dev)
> -{
> -	int ret = 0;
> -	int i;
> -
> -	for (i = 0; i < dev->data->nb_tx_queues; i++) {
> -		ret = ice_tx_queue_start(dev, i);
> -		if (ret) {
> -			PMD_DRV_LOG(ERR, "start queue %u failed", i);
> -			break;
> -		}
> +	if (node != root) {
> +		if (node->num_children == 0)
> +			ice_free_sched_node(pi, node);
> +		else
> +			node->vsi_handle = node->children[0]->vsi_handle;
>   	}
>   
> -	return ret;
> +	return 0;
>   }
>   
> -int ice_do_hierarchy_commit(struct rte_eth_dev *dev,
> -			    int clear_on_fail,
> -			    struct rte_tm_error *error)
> +static int
> +create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
> +		struct ice_sched_node *hw_root, uint16_t *created)
>   {
> -	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> -	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> -	struct ice_tm_node *root;
> -	struct ice_sched_node *vsi_node = NULL;
> -	struct ice_sched_node *queue_node;
> -	struct ice_tx_queue *txq;
> -	int ret_val = 0;
> -	uint32_t i;
> -	uint32_t idx_vsi_child;
> -	uint32_t idx_qg;
> -	uint32_t nb_vsi_child;
> -	uint32_t nb_qg;
> -	uint32_t qid;
> -	uint32_t q_teid;
> -
> -	/* remove leaf nodes */
> -	ret_val = ice_remove_leaf_nodes(dev);
> -	if (ret_val) {
> -		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -		PMD_DRV_LOG(ERR, "reset no-leaf nodes failed");
> -		goto fail_clear;
> -	}
> -
> -	/* reset no-leaf nodes. */
> -	ret_val = ice_reset_noleaf_nodes(dev);
> -	if (ret_val) {
> -		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -		PMD_DRV_LOG(ERR, "reset leaf nodes failed");
> -		goto add_leaf;
> -	}
> -
> -	/* config vsi node */
> -	vsi_node = ice_get_vsi_node(hw);
> -	root = pf->tm_conf.root;
> -
> -	ret_val = ice_set_node_rate(hw, root, vsi_node);
> -	if (ret_val) {
> -		error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -		PMD_DRV_LOG(ERR,
> -			    "configure vsi node %u bandwidth failed",
> -			    root->id);
> -		goto add_leaf;
> -	}
> -
> -	/* config queue group nodes */
> -	nb_vsi_child = vsi_node->num_children;
> -	nb_qg = vsi_node->children[0]->num_children;
> -
> -	idx_vsi_child = 0;
> -	idx_qg = 0;
> -
> -	if (root == NULL)
> -		goto commit;
> -
> -	for (i = 0; i < root->reference_count; i++) {
> -		struct ice_tm_node *tm_node = root->children[i];
> -		struct ice_tm_node *tm_child_node;
> -		struct ice_sched_node *qgroup_sched_node =
> -			vsi_node->children[idx_vsi_child]->children[idx_qg];
> -		uint32_t j;
> -
> -		ret_val = ice_cfg_hw_node(hw, tm_node, qgroup_sched_node);
> -		if (ret_val) {
> -			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -			PMD_DRV_LOG(ERR,
> -				    "configure queue group node %u failed",
> -				    tm_node->id);
> -			goto reset_leaf;
> -		}
> -
> -		for (j = 0; j < tm_node->reference_count; j++) {
> -			tm_child_node = tm_node->children[j];
> -			qid = tm_child_node->id;
> -			ret_val = ice_tx_queue_start(dev, qid);
> -			if (ret_val) {
> -				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -				PMD_DRV_LOG(ERR, "start queue %u failed", qid);
> -				goto reset_leaf;
> -			}
> -			txq = dev->data->tx_queues[qid];
> -			q_teid = txq->q_teid;
> -			queue_node = ice_sched_get_node(hw->port_info, q_teid);
> -			if (queue_node == NULL) {
> -				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -				PMD_DRV_LOG(ERR, "get queue %u node failed", qid);
> -				goto reset_leaf;
> -			}
> -			if (queue_node->info.parent_teid != qgroup_sched_node->info.node_teid) {
> -				ret_val = ice_move_recfg_lan_txq(dev, queue_node,
> -								 qgroup_sched_node, qid);
> -				if (ret_val) {
> -					error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -					PMD_DRV_LOG(ERR, "move queue %u failed", qid);
> -					goto reset_leaf;
> -				}
> -			}
> -			ret_val = ice_cfg_hw_node(hw, tm_child_node, queue_node);
> -			if (ret_val) {
> -				error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -				PMD_DRV_LOG(ERR,
> -					    "configure queue group node %u failed",
> -					    tm_node->id);
> -				goto reset_leaf;
> -			}
> +	struct ice_sched_node *parent = sw_node->sched_node;
> +	uint32_t teid;
> +	uint16_t added;
> +
> +	/* first create all child nodes */
> +	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
> +		struct ice_tm_node *tm_node = sw_node->children[i];
> +		int res = ice_sched_add_elems(pi, hw_root,
> +				parent, parent->tx_sched_layer + 1,
> +				1 /* num nodes */, &added, &teid,
> +				NULL /* no pre-alloc */);
> +		if (res != 0) {
> +			PMD_DRV_LOG(ERR, "Error with ice_sched_add_elems, adding child node to teid %u",
> +					parent->info.node_teid);
> +			return -1;
>   		}
> -
> -		idx_qg++;
> -		if (idx_qg >= nb_qg) {
> -			idx_qg = 0;
> -			idx_vsi_child++;
> -		}
> -		if (idx_vsi_child >= nb_vsi_child) {
> -			error->type = RTE_TM_ERROR_TYPE_UNSPECIFIED;
> -			PMD_DRV_LOG(ERR, "too many queues");
> -			goto reset_leaf;
> +		struct ice_sched_node *hw_node = ice_sched_find_node_by_teid(parent, teid);
> +		if (ice_cfg_hw_node(pi->hw, tm_node, hw_node) != 0) {
> +			PMD_DRV_LOG(ERR, "Error configuring node %u at layer %u",
> +					teid, parent->tx_sched_layer + 1);
> +			return -1;
>   		}
> +		tm_node->sched_node = hw_node;
> +		created[hw_node->tx_sched_layer]++;
>   	}
>   
> -commit:
> -	pf->tm_conf.committed = true;
> -	pf->tm_conf.clear_on_fail = clear_on_fail;
> +	/* if we have just created the child nodes in the q-group, i.e. last non-leaf layer,
> +	 * then just return, rather than trying to create leaf nodes.
> +	 * That is done later at queue start.
> +	 */
> +	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
> +		return 0;
>   
> -	return ret_val;
> +	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
> +		if (sw_node->children[i]->reference_count == 0)
> +			continue;
>   
> -reset_leaf:
> -	ice_remove_leaf_nodes(dev);
> -add_leaf:
> -	ice_add_leaf_nodes(dev);
> -	ice_reset_noleaf_nodes(dev);
> -fail_clear:
> -	/* clear all the traffic manager configuration */
> -	if (clear_on_fail) {
> -		ice_tm_conf_uninit(dev);
> -		ice_tm_conf_init(dev);
> +		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
> +			return -1;
>   	}
> -	return ret_val;
> +	return 0;
>   }
>   
> -static int ice_hierarchy_commit(struct rte_eth_dev *dev,
> +static int
> +commit_new_hierarchy(struct rte_eth_dev *dev)
> +{
> +	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	struct ice_port_info *pi = hw->port_info;
> +	struct ice_tm_node *sw_root = pf->tm_conf.root;
> +	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
> +	/* count nodes per hw level, not per logical */
> +	uint16_t nodes_created_per_level[ICE_TM_MAX_LAYERS] = {0};
> +	uint8_t q_lvl = ice_get_leaf_level(hw);
> +	uint8_t qg_lvl = q_lvl - 1;
> +
> +	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
> +
> +	sw_root->sched_node = new_vsi_root;
> +	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
> +		return -1;
> +	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
> +		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
> +				nodes_created_per_level[i], i);
> +	hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0] = new_vsi_root;
> +
> +	pf->main_vsi->nb_qps =
> +			RTE_MIN(nodes_created_per_level[qg_lvl] * hw->max_children[qg_lvl],
> +				hw->layer_info[q_lvl].max_device_nodes);
> +
> +	pf->tm_conf.committed = true; /* set flag to be checked on queue start */
> +
> +	return ice_alloc_lan_q_ctx(hw, 0, 0, pf->main_vsi->nb_qps);
> +}
> +
> +static int
> +ice_hierarchy_commit(struct rte_eth_dev *dev,
>   				 int clear_on_fail,
>   				 struct rte_tm_error *error)
>   {
> -	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	RTE_SET_USED(error);
> +	/* commit should only be done to topology before start! */
> +	if (dev->data->dev_started)
> +		return -1;
>   
> -	/* if device not started, simply set committed flag and return. */
> -	if (!dev->data->dev_started) {
> -		pf->tm_conf.committed = true;
> -		pf->tm_conf.clear_on_fail = clear_on_fail;
> -		return 0;
> +	int ret = commit_new_hierarchy(dev);
> +	if (ret < 0 && clear_on_fail) {
> +		ice_tm_conf_uninit(dev);
> +		ice_tm_conf_init(dev);
>   	}
> -
> -	return ice_do_hierarchy_commit(dev, clear_on_fail, error);
> +	return ret;
>   }
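
For readers following the ice_tm_setup_txq_node() hunk quoted above: after the queue is moved in hardware, the software tree is updated by appending the node to its new parent and removing it from the old parent's child array by overwriting its slot with the last pointer. The following is a minimal standalone sketch of that bookkeeping, not the driver code. struct demo_node, demo_detach_child() and demo_move_child() are hypothetical names used only for illustration, and the sketch assumes the caller has already checked that the new parent has a free slot. Unlike the patch, it detaches before attaching and clears the vacated slot to NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a scheduler node; NOT the driver's struct ice_sched_node. */
struct demo_node {
	struct demo_node *parent;
	struct demo_node **children; /* array owned by the caller */
	uint16_t num_children;
};

/*
 * Remove 'child' from parent's child array by overwriting its slot with the
 * last entry (sibling order is not preserved).
 * Returns 0 on success, -1 if 'child' is not a child of 'parent'.
 */
static int
demo_detach_child(struct demo_node *parent, const struct demo_node *child)
{
	assert(parent != NULL && child != NULL);

	for (uint16_t i = 0; i < parent->num_children; i++) {
		if (parent->children[i] == child) {
			parent->children[i] = parent->children[--parent->num_children];
			parent->children[parent->num_children] = NULL; /* clear vacated slot */
			return 0;
		}
	}
	return -1;
}

/*
 * Re-parent 'child' under 'new_parent'.
 * Assumption: the caller has verified new_parent->children has spare capacity.
 */
static void
demo_move_child(struct demo_node *child, struct demo_node *new_parent)
{
	struct demo_node *old_parent = child->parent;

	if (old_parent != NULL)
		demo_detach_child(old_parent, child);

	new_parent->children[new_parent->num_children++] = child;
	child->parent = new_parent;
}
```

Because the last pointer fills the freed slot, removal costs a single scan with no shifting of the remaining entries. The trade-off is that sibling order changes, which only matters if other code depends on child indices.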
-- 
Regards,
Vladimir
* Re: [PATCH v6 0/5] Improve rte_tm support in ICE driver
  2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
                     ` (4 preceding siblings ...)
  2024-10-29 17:01   ` [PATCH v6 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
@ 2024-10-30 16:30   ` Bruce Richardson
  5 siblings, 0 replies; 76+ messages in thread
From: Bruce Richardson @ 2024-10-30 16:30 UTC (permalink / raw)
  To: dev; +Cc: vladimir.medvedkin
On Tue, Oct 29, 2024 at 05:01:52PM +0000, Bruce Richardson wrote:
> This patchset expands the capabilities of the traffic management
> support in the ICE driver. It allows the driver to support different
> sizes of topologies, and support >256 queues and more than 3 hierarchy
> layers.
> 
> ---
> 
> v6:
> * remove char arithmetic in patch 1
> * rework parameter checks in patch 3 to shorten and simplify code
> 
Series applied to dpdk-next-net-intel.
/Bruce
Thread overview: 76+ messages
2024-08-07  9:33 [PATCH 00/15] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-07  9:33 ` [PATCH 01/15] net/ice: add traffic management node query function Bruce Richardson
2024-08-07  9:33 ` [PATCH 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-07  9:33 ` [PATCH 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-07  9:33 ` [PATCH 04/15] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-07  9:33 ` [PATCH 05/15] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-07  9:33 ` [PATCH 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-07  9:33 ` [PATCH 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-07  9:34 ` [PATCH 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-07  9:34 ` [PATCH 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-07  9:34 ` [PATCH 10/15] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-07  9:34 ` [PATCH 11/15] net/ice/base: make functions non-static Bruce Richardson
2024-08-07  9:34 ` [PATCH 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-07  9:34 ` [PATCH 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-07  9:34 ` [PATCH 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-07  9:34 ` [PATCH 15/15] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-07  9:46 ` [PATCH v2 00/15] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 01/15] net/ice: add traffic management node query function Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 02/15] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 03/15] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 04/15] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 05/15] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 06/15] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 07/15] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-07  9:46   ` [PATCH v2 08/15] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 09/15] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 10/15] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 11/15] net/ice/base: make functions non-static Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 12/15] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 13/15] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 14/15] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-07  9:47   ` [PATCH v2 15/15] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-12 15:27 ` [PATCH v3 00/16] Improve rte_tm support in ICE driver Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 01/16] net/ice: add traffic management node query function Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 02/16] net/ice: detect stopping a flow-director queue twice Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 03/16] net/ice: improve Tx scheduler graph output Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 04/16] net/ice: add option to choose DDP package file Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 05/16] net/ice: add option to download scheduler topology Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 06/16] net/ice/base: allow init without TC class sched nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 07/16] net/ice/base: set VSI index on newly created nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 08/16] net/ice/base: read VSI layer info from VSI Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 09/16] net/ice/base: remove 255 limit on sched child nodes Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 10/16] net/ice/base: optimize subtree searches Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 11/16] net/ice/base: make functions non-static Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 12/16] net/ice/base: remove flag checks before topology upload Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 13/16] net/ice: limit the number of queues to sched capabilities Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 14/16] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 15/16] net/ice: add minimal capability reporting API Bruce Richardson
2024-08-12 15:28   ` [PATCH v3 16/16] net/ice: do early check on node level when adding Bruce Richardson
2024-10-23 16:27 ` [PATCH v4 0/5] Improve rte_tm support in ICE driver Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 1/5] net/ice: add option to download scheduler topology Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
2024-10-23 16:27   ` [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
2024-10-23 16:55 ` [PATCH v5 0/5] Improve rte_tm support in ICE driver Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 1/5] net/ice: add option to download scheduler topology Bruce Richardson
2024-10-25 17:01     ` Medvedkin, Vladimir
2024-10-29 15:48       ` Bruce Richardson
2024-10-23 16:55   ` [PATCH v5 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
2024-10-25 17:01     ` Medvedkin, Vladimir
2024-10-23 16:55   ` [PATCH v5 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-10-25 17:02     ` Medvedkin, Vladimir
2024-10-23 16:55   ` [PATCH v5 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
2024-10-25 17:02     ` Medvedkin, Vladimir
2024-10-23 16:55   ` [PATCH v5 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
2024-10-25 17:02     ` Medvedkin, Vladimir
2024-10-29 17:01 ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson
2024-10-29 17:01   ` [PATCH v6 1/5] net/ice: add option to download scheduler topology Bruce Richardson
2024-10-30 15:21     ` Medvedkin, Vladimir
2024-10-29 17:01   ` [PATCH v6 2/5] net/ice/base: make context alloc function non-static Bruce Richardson
2024-10-29 17:01   ` [PATCH v6 3/5] net/ice: enhance Tx scheduler hierarchy support Bruce Richardson
2024-10-30 15:21     ` Medvedkin, Vladimir
2024-10-29 17:01   ` [PATCH v6 4/5] net/ice: allowing stopping port to apply TM topology Bruce Richardson
2024-10-29 17:01   ` [PATCH v6 5/5] net/ice: provide parameter to limit scheduler layers Bruce Richardson
2024-10-30 16:30   ` [PATCH v6 0/5] Improve rte_tm support in ICE driver Bruce Richardson