From: Bruce Richardson
To: dev@dpdk.org
Cc: Bruce Richardson
Subject: [PATCH v4 5/5] net/ice: provide parameter to limit scheduler layers
Date: Wed, 23 Oct 2024 17:27:36 +0100
Message-ID: <20241023162747.453267-6-bruce.richardson@intel.com>
In-Reply-To: <20241023162747.453267-1-bruce.richardson@intel.com>
References: <20240807093407.452784-1-bruce.richardson@intel.com> <20241023162747.453267-1-bruce.richardson@intel.com>

To help with backward compatibility for applications which may expect
the ice driver Tx scheduler (accessed via the rte_tm APIs) to only have
3 layers, add a devarg to allow the user to explicitly limit the number
of scheduler layers visible to the application.

Signed-off-by: Bruce Richardson
---
 doc/guides/nics/ice.rst      | 16 +++++++++-
 drivers/net/ice/ice_ethdev.c | 57 ++++++++++++++++++++++++++++++++++++
 drivers/net/ice/ice_ethdev.h |  4 ++-
 drivers/net/ice/ice_tm.c     | 28 ++++++++++--------
 4 files changed, 91 insertions(+), 14 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index dc649f8e31..d73fd1b995 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -147,6 +147,16 @@ Runtime Configuration
 
      -a 80:00.0,ddp_pkg_file=/path/to/ice-version.pkg
 
+- ``Traffic Management Scheduling Levels``
+
+  The DPDK Traffic Management (rte_tm) APIs can be used to configure the Tx scheduler on the NIC.
+  From the 24.11 release, all available hardware layers are available to software.
+  Earlier versions of DPDK only supported 3 levels in the scheduling hierarchy.
+  To help with backward compatibility, the ``tm_sched_levels`` parameter can be used to limit the scheduler levels to the provided value.
+  The provided value must be between 3 and 8.
+  If the value provided is greater than the number of levels supported by the HW,
+  the driver will use the hardware maximum value.
+
 - ``Protocol extraction for per queue``
 
   Configure the RX queues to do protocol extraction into mbuf for protocol
@@ -450,7 +460,7 @@ The ice PMD provides support for the Traffic Management API (RTE_TM),
 enabling users to configure and manage the traffic shaping and scheduling of transmitted packets.
 By default, all available transmit scheduler layers are available for configuration,
 allowing up to 2000 queues to be configured in a hierarchy of up to 8 levels.
-The number of levels in the hierarchy can be adjusted via driver parameter:
+The number of levels in the hierarchy can be adjusted via driver parameters:
 
 * the default 9-level topology (8 levels usable) can be replaced by a new topology downloaded from a DDP file,
   using the driver parameter ``ddp_load_sched_topo=1``.
@@ -461,6 +471,10 @@ The number of levels in the hierarchy can be adjusted via driver parameter:
   with increased fan-out at the lower 3 levels
   e.g. 64 at levels 2 and 3, and 256 or more at the leaf-node level.
 
+* the number of levels can be reduced by setting the driver parameter ``tm_sched_levels`` to a lower value.
+  This scheme reduces, in software, the number of editable levels,
+  but does not affect the fan-out from each level.
+
 For more details on how to configure a Tx scheduling hierarchy,
 please refer to the ``rte_tm`` `API documentation `_.
 
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 7aed26118f..6f3bef1078 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -40,6 +40,7 @@
 #define ICE_MBUF_CHECK_ARG       "mbuf_check"
 #define ICE_DDP_FILENAME_ARG     "ddp_pkg_file"
 #define ICE_DDP_LOAD_SCHED_ARG   "ddp_load_sched_topo"
+#define ICE_TM_LEVELS_ARG        "tm_sched_levels"
 
 #define ICE_CYCLECOUNTER_MASK  0xffffffffffffffffULL
 
@@ -58,6 +59,7 @@ static const char * const ice_valid_args[] = {
 	ICE_MBUF_CHECK_ARG,
 	ICE_DDP_FILENAME_ARG,
 	ICE_DDP_LOAD_SCHED_ARG,
+	ICE_TM_LEVELS_ARG,
 	NULL
 };
 
@@ -1854,6 +1856,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 static int
 ice_pf_setup(struct ice_pf *pf)
 {
+	struct ice_adapter *ad = ICE_PF_TO_ADAPTER(pf);
 	struct ice_hw *hw = ICE_PF_TO_HW(pf);
 	struct ice_vsi *vsi;
 	uint16_t unused;
@@ -1878,6 +1881,27 @@ ice_pf_setup(struct ice_pf *pf)
 		return -EINVAL;
 	}
 
+	/* Set the number of hidden Tx scheduler layers. If no devargs parameter is
+	 * given to set the number of exposed levels, the default is to expose all
+	 * levels except the TC layer.
+	 *
+	 * If the number of exposed levels is set, we check that it's not greater
+	 * than the HW can provide (in which case we do nothing except log a warning),
+	 * and then set the hidden layers to be the total number of levels minus the
+	 * requested visible number.
+	 */
+	pf->tm_conf.hidden_layers = hw->port_info->has_tc;
+	if (ad->devargs.tm_exposed_levels != 0) {
+		const uint8_t avail_layers = hw->num_tx_sched_layers - hw->port_info->has_tc;
+		const uint8_t req_layers = ad->devargs.tm_exposed_levels;
+		if (req_layers > avail_layers) {
+			PMD_INIT_LOG(WARNING, "The number of TM scheduler exposed levels exceeds the number of supported levels (%u)",
+					avail_layers);
+			PMD_INIT_LOG(WARNING, "Setting scheduler layers to %u", avail_layers);
+		} else
+			pf->tm_conf.hidden_layers = hw->num_tx_sched_layers - req_layers;
+	}
+
 	pf->main_vsi = vsi;
 
 	rte_spinlock_init(&pf->link_lock);
@@ -2066,6 +2090,32 @@ parse_u64(const char *key, const char *value, void *args)
 	return 0;
 }
 
+static int
+parse_tx_sched_levels(const char *key, const char *value, void *args)
+{
+	uint8_t *num = args;
+	long int tmp;
+	char *endptr;
+
+	errno = 0;
+	tmp = strtol(value, &endptr, 0);
+	/* the value needs two-stage validation, since the actual number of available
+	 * levels is not known at this point. Initially just validate that it is in
+	 * the correct range, between 3 and 8. Later validation will check that the
+	 * number of layers available on a particular port is not lower than this value.
+	 */
+	if (errno || *endptr != '\0' ||
+			tmp < (ICE_VSI_LAYER_OFFSET - 1) || tmp >= ICE_SCHED_9_LAYERS) {
+		PMD_DRV_LOG(WARNING, "%s: Invalid value \"%s\", should be in range [%d, %d]",
+				key, value, ICE_VSI_LAYER_OFFSET - 1, ICE_SCHED_9_LAYERS - 1);
+		return -1;
+	}
+
+	*num = tmp;
+
+	return 0;
+}
+
 static int
 lookup_pps_type(const char *pps_name)
 {
@@ -2312,6 +2362,12 @@ static int ice_parse_devargs(struct rte_eth_dev *dev)
 				 &parse_bool, &ad->devargs.ddp_load_sched);
 	if (ret)
 		goto bail;
+
+	ret = rte_kvargs_process(kvlist, ICE_TM_LEVELS_ARG,
+				 &parse_tx_sched_levels, &ad->devargs.tm_exposed_levels);
+	if (ret)
+		goto bail;
+
 bail:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -7182,6 +7238,7 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ice,
 			      ICE_DEFAULT_MAC_DISABLE "=<0|1>"
 			      ICE_DDP_FILENAME_ARG "="
 			      ICE_DDP_LOAD_SCHED_ARG "=<0|1>"
+			      ICE_TM_LEVELS_ARG "="
 			      ICE_RX_LOW_LATENCY_ARG "=<0|1>");
 
 RTE_LOG_REGISTER_SUFFIX(ice_logtype_init, init, NOTICE);
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 71fd7bca64..431561e48f 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -484,6 +484,7 @@ struct ice_tm_node {
 struct ice_tm_conf {
 	struct ice_shaper_profile_list shaper_profile_list;
 	struct ice_tm_node *root; /* root node - port */
+	uint8_t hidden_layers; /* the number of hierarchy layers hidden from app */
 	bool committed;
 	bool clear_on_fail;
 };
@@ -557,6 +558,7 @@ struct ice_devargs {
 	uint8_t pin_idx;
 	uint8_t pps_out_ena;
 	uint8_t ddp_load_sched;
+	uint8_t tm_exposed_levels;
 	int xtr_field_offs;
 	uint8_t xtr_flag_offs[PROTO_XTR_MAX];
 	/* Name of the field. */
@@ -660,7 +662,7 @@ struct ice_vsi_vlan_pvid_info {
 
 /* ICE_PF_TO */
 #define ICE_PF_TO_HW(pf) \
-	(&(((struct ice_pf *)pf)->adapter->hw))
+	(&((pf)->adapter->hw))
 #define ICE_PF_TO_ADAPTER(pf) \
 	((struct ice_adapter *)(pf)->adapter)
 #define ICE_PF_TO_ETH_DEV(pf) \
diff --git a/drivers/net/ice/ice_tm.c b/drivers/net/ice/ice_tm.c
index 09e947a3b1..9e943da7a1 100644
--- a/drivers/net/ice/ice_tm.c
+++ b/drivers/net/ice/ice_tm.c
@@ -200,9 +200,10 @@ find_node(struct ice_tm_node *root, uint32_t id)
 }
 
 static inline uint8_t
-ice_get_leaf_level(struct ice_hw *hw)
+ice_get_leaf_level(const struct ice_pf *pf)
 {
-	return hw->num_tx_sched_layers - 1 - hw->port_info->has_tc;
+	const struct ice_hw *hw = ICE_PF_TO_HW(pf);
+	return hw->num_tx_sched_layers - pf->tm_conf.hidden_layers - 1;
 }
 
 static int
@@ -210,7 +211,6 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		  int *is_leaf, struct rte_tm_error *error)
 {
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-	struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ice_tm_node *tm_node;
 
 	if (!is_leaf || !error)
@@ -230,7 +230,7 @@ ice_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
 		return -EINVAL;
 	}
 
-	if (tm_node->level == ice_get_leaf_level(hw))
+	if (tm_node->level == ice_get_leaf_level(pf))
 		*is_leaf = true;
 	else
 		*is_leaf = false;
@@ -434,7 +434,7 @@ ice_tm_node_add(struct rte_eth_dev *dev, uint32_t node_id,
 	}
 
 	ret = ice_node_param_check(node_id, priority, weight,
-			params, level_id == ice_get_leaf_level(hw), error);
+			params, level_id == ice_get_leaf_level(pf), error);
 	if (ret)
 		return ret;
 
@@ -762,8 +762,8 @@ free_sched_node_recursive(struct ice_port_info *pi, const struct ice_sched_node
 }
 
 static int
-create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_node,
-		struct ice_sched_node *hw_root, uint16_t *created)
+create_sched_node_recursive(struct ice_pf *pf, struct ice_port_info *pi,
+		struct ice_tm_node *sw_node, struct ice_sched_node *hw_root, uint16_t *created)
 {
 	struct ice_sched_node *parent = sw_node->sched_node;
 	uint32_t teid;
@@ -795,14 +795,14 @@ create_sched_node_recursive(struct ice_port_info *pi, struct ice_tm_node *sw_nod
 	 * then just return, rather than trying to create leaf nodes.
 	 * That is done later at queue start.
 	 */
-	if (sw_node->level + 2 == ice_get_leaf_level(pi->hw))
+	if (sw_node->level + 2 == ice_get_leaf_level(pf))
 		return 0;
 
 	for (uint16_t i = 0; i < sw_node->reference_count; i++) {
 		if (sw_node->children[i]->reference_count == 0)
 			continue;
 
-		if (create_sched_node_recursive(pi, sw_node->children[i], hw_root, created) < 0)
+		if (create_sched_node_recursive(pf, pi, sw_node->children[i], hw_root, created) < 0)
 			return -1;
 	}
 	return 0;
@@ -815,15 +815,19 @@ commit_new_hierarchy(struct rte_eth_dev *dev)
 	struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
 	struct ice_port_info *pi = hw->port_info;
 	struct ice_tm_node *sw_root = pf->tm_conf.root;
-	struct ice_sched_node *new_vsi_root = (pi->has_tc) ? pi->root->children[0] : pi->root;
+	const uint16_t new_root_level = pf->tm_conf.hidden_layers;
 	uint16_t nodes_created_per_level[10] = {0}; /* counted per hw level, not per logical */
-	uint8_t q_lvl = ice_get_leaf_level(hw);
+	uint8_t q_lvl = ice_get_leaf_level(pf);
 	uint8_t qg_lvl = q_lvl - 1;
 
+	struct ice_sched_node *new_vsi_root = hw->vsi_ctx[pf->main_vsi->idx]->sched.vsi_node[0];
+	while (new_vsi_root->tx_sched_layer > new_root_level)
+		new_vsi_root = new_vsi_root->parent;
+
 	free_sched_node_recursive(pi, new_vsi_root, new_vsi_root, new_vsi_root->vsi_handle);
 
 	sw_root->sched_node = new_vsi_root;
-	if (create_sched_node_recursive(pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
+	if (create_sched_node_recursive(pf, pi, sw_root, new_vsi_root, nodes_created_per_level) < 0)
		return -1;
 	for (uint16_t i = 0; i < RTE_DIM(nodes_created_per_level); i++)
 		PMD_DRV_LOG(DEBUG, "Created %u nodes at level %u",
-- 
2.43.0
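
For illustration only, a minimal sketch (not taken from the patch) of how an application could check how many scheduler levels a port exposes once the driver is loaded with the new devarg, e.g. "-a 0000:80:00.0,tm_sched_levels=3" on the EAL command line. The PCI address, port id and the expectation that the reported capability reflects the devarg limit are assumptions for the example; rte_tm_capabilities_get() and the n_levels_max field are standard rte_tm API.

/* Hypothetical usage sketch: port id 0 and the devarg
 * tm_sched_levels=3 are assumptions for this example.
 */
#include <stdio.h>
#include <string.h>

#include <rte_ethdev.h>
#include <rte_tm.h>

static int
show_tm_levels(uint16_t port_id)
{
	struct rte_tm_capabilities cap;
	struct rte_tm_error error;

	memset(&cap, 0, sizeof(cap));
	memset(&error, 0, sizeof(error));

	if (rte_tm_capabilities_get(port_id, &cap, &error) != 0) {
		fprintf(stderr, "rte_tm_capabilities_get() failed: %s\n",
				error.message != NULL ? error.message : "unknown");
		return -1;
	}

	/* If the devarg limits the exposed hierarchy, the reported maximum
	 * number of levels is expected to reflect that limit (assumption).
	 */
	printf("port %u: up to %u TM hierarchy levels\n", port_id, cap.n_levels_max);
	return 0;
}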