From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bruce Richardson
To: dev@dpdk.org
Cc: vladimir.medvedkin@intel.com, Bruce Richardson
Subject: [PATCH v6 5/5] net/ice: provide parameter to limit scheduler layers
Date: Tue, 29 Oct 2024 17:01:57 +0000
Message-ID: <20241029170157.1390724-6-bruce.richardson@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241029170157.1390724-1-bruce.richardson@intel.com>
References: <20240807093407.452784-1-bruce.richardson@intel.com>
 <20241029170157.1390724-1-bruce.richardson@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

To help with backward compatibility for applications which may expect the
ice driver Tx scheduler (accessed via the rte_tm APIs) to have only
3 layers, add a devarg allowing the user to explicitly limit the number
of scheduler layers visible to the application.
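
For example, to present only the original 3-level hierarchy to an
application, the devarg can be given in the device arguments (reusing the
illustrative device address from the driver documentation):

    dpdk-testpmd -a 80:00.0,tm_sched_levels=3

On a device with the default 9-level topology, where only the TC layer is
hidden by default (8 usable levels), this results in 9 - 3 = 6 hidden
layers, leaving a 3-level hierarchy (root, queue group, queue) visible
through rte_tm.
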
Signed-off-by: Bruce Richardson
Acked-by: Vladimir Medvedkin
---
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 58 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 33 +++++++++++---------
 4 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index df489be08d..471343a0ac 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
    -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all available hardware layers are available to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the scheduler levels to the provided value.
+  The provided value must be between 3 and 8.
+  If the value provided is greater than the number of levels supported by the HW,
+  SW will use the hardware maximum value.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This scheme will reduce in software the number of editable levels,
+  but will not affect the fan-out from each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation `_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7252ea6b24..474c6eeef5 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG     "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG   "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG        "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,28 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* set the number of hidden Tx scheduler layers. If no devargs parameter is
+	 * given to set the number of exposed levels, the default is to expose all
+	 * levels except the TC layer.
+	 *
+	 * If the number of exposed levels is set, we check that it's not greater
+	 * than the HW can provide (in which case we do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else {
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+		}
+	}
+
 	pf->main_vsi = vsi;
 
 	rte_spinlock_init(&pf->link_lock);
@@ -2066,6 +2091,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* the value needs two-stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in the
+	 * correct range, between 3 and 8. Later validation will check that the number
+	 * of available layers on a particular port is not lower than the value given here.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_TM_MAX_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+				key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_TM_MAX_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2363,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7239,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "="
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "="
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 70189a9eb7..978fcf50f1 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -486,6 +486,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers; /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -559,6 +560,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -662,7 +664,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 235eda24e5..18ac324a61 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -198,9 +198,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -208,7 +209,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		  int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -228,7 +228,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -408,6 +408,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	struct ice_tm_shaper_profile *shaper_profile = NULL;
 	struct ice_tm_node *tm_node;
 	struct ice_tm_node *parent_node = NULL;
+	uint8_t layer_offset = pf->tm_conf.hidden_layers;
 	int ret;
 
 	if (!params || !error)
@@ -446,7 +447,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 		/* add the root node */
 		tm_node = rte_zmalloc(NULL,
 				      sizeof(struct ice_tm_node) +
-				      sizeof(struct ice_tm_node *) * hw->max_children[0],
+				      sizeof(struct ice_tm_node *) * hw->max_children[layer_offset],
 				      0);
 		if (!tm_node)
 			return -ENOMEM;
@@ -478,7 +479,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-				   params, level_id == ice_get_leaf_level(hw), error);
+				   params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -506,7 +507,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 
 	tm_node = rte_zmalloc(NULL,
 			      sizeof(struct ice_tm_node) +
-			      sizeof(struct ice_tm_node *) * hw->max_children[level_id],
+			      sizeof(struct ice_tm_node *) * hw->max_children[level_id + layer_offset],
 			      0);
 	if (!tm_node)
 		return -ENOMEM;
@@ -753,8 +754,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -786,14 +787,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -806,16 +807,20 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	/* count nodes per hw level, not per logical */
 	uint16_t nodes_created_per_level[ICE_TM_MAX_LAYERS] = {0};
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
 		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0
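
As an illustration of the application-facing effect, the following is a
minimal sketch (not part of the patch) of how an application written for the
old 3-level model could verify the exposed level count before building its
hierarchy. It assumes the port was allow-listed with tm_sched_levels=3 and
that the PMD reflects the limit in the capabilities it reports; the function
name is illustrative.

#include <stdio.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_tm.h>

/* Check how many Tx scheduler levels the PMD exposes on this port.
 * With the port started as, e.g., -a 80:00.0,tm_sched_levels=3, an
 * application expecting the legacy 3-level hierarchy would pass 3 here.
 */
static int
check_tm_levels(uint16_t port_id, uint32_t expected_levels)
{
	struct rte_tm_capabilities cap;
	struct rte_tm_error tm_err;
	int ret;

	memset(&cap, 0, sizeof(cap));
	ret = rte_tm_capabilities_get(port_id, &cap, &tm_err);
	if (ret != 0) {
		printf("rte_tm_capabilities_get() failed: %d\n", ret);
		return ret;
	}

	if (cap.n_levels_max != expected_levels) {
		printf("port %u exposes %u TM levels, expected %u\n",
				port_id, cap.n_levels_max, expected_levels);
		return -1;
	}
	return 0;
}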