From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id DDB8E691A for ; Fri, 7 Apr 2017 19:47:44 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=intel.com; i=@intel.com; q=dns/txt; s=intel; t=1491587265; x=1523123265; h=from:to:cc:subject:date:message-id:references: in-reply-to:content-transfer-encoding:mime-version; bh=+MPrxEUKxFvKVMK6TGcT9zM0Oyb4UrK5YRBJwrW7vT4=; b=OQWZttTLQd96W7gB69fDcNd+Rg/8MYwe35nAVJAKGy2AoUd64rfEPZi+ FQrxDedg9mr9kt0anFlUNs8Iq6NZBQ==; Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 07 Apr 2017 10:47:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.37,166,1488873600"; d="scan'208";a="1117020247" Received: from irsmsx154.ger.corp.intel.com ([163.33.192.96]) by orsmga001.jf.intel.com with ESMTP; 07 Apr 2017 10:47:42 -0700 Received: from irsmsx108.ger.corp.intel.com ([169.254.11.239]) by IRSMSX154.ger.corp.intel.com ([169.254.12.233]) with mapi id 14.03.0319.002; Fri, 7 Apr 2017 18:47:41 +0100 From: "Dumitrescu, Cristian" To: Jerin Jacob CC: "dev@dpdk.org" , "thomas.monjalon@6wind.com" , "balasubramanian.manoharan@cavium.com" , "hemant.agrawal@nxp.com" , "shreyansh.jain@nxp.com" Thread-Topic: [PATCH v3 2/2] ethdev: add hierarchical scheduler API Thread-Index: AQHSr6HJ8bnAW/izFUO8nT+NW1wSLKG6II6A Date: Fri, 7 Apr 2017 17:47:40 +0000 Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912652789730@IRSMSX108.ger.corp.intel.com> References: <1488589820-206947-1-git-send-email-cristian.dumitrescu@intel.com> <1488589820-206947-3-git-send-email-cristian.dumitrescu@intel.com> <20170407132025.GA8249@jerin> In-Reply-To: <20170407132025.GA8249@jerin> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMjcwNmQ4MjQtNGRiZi00ZDAxLTkyYWUtMjU0MDNhYTIwNDY5IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE2LjUuOS4zIiwiVHJ1c3RlZExhYmVsSGFzaCI6IkxcL2E3UXJ5NG9sUjFrNFkxeDV0ZmhVRmFQRm45eTVYdEZNVmNoZkp4dTFzPSJ9 x-ctpclassification: CTP_IC dlp-product: dlpe-windows dlp-version: 10.0.102.7 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Apr 2017 17:47:45 -0000 Hi Jerin, Thanks for your review! >=20 > Thanks Cristian for v3. >=20 > From Cavium HW perceptive, v3 is in relatively good shape to consume it, > Except the below mentioned two pointers. >=20 > 1) We strongly believes, application explicitly need to give level id, wh= ile > creating topology(i.e rte_tm_node_add()). Reasons are, >=20 > - In the capability side we are exposing nr_levels etc > - I think, _All_ the HW implementation expects to be connected from > level-n to leveln-1. IMO, Its better to express that in API. > - For the SW implementations, which don't care abut the specific level id= for > the > connection can ignore the level id passed by the application. > Let the definition be "level" aware. >=20 The current API proposal creates a new node and connects it to its parent i= n a single step, so when a new node is added its level if completely known = based on its parent level. Therefore, specifying the level of the node when the node is added is redun= dant and therefore not needed. My concern is this requirement can introduce= inconsistency into the API, as the user can specify a level ID for the new= node that is different than (parent node level ID + 1). But based on private conversation it looks to me that you guys have a stron= g opinion on this, so I am taking the action to identify a (nice) way to im= plement your requirement and do it. > 2) There are lot of capability in the TM definition. I don't have strong = option > here as TM stuff comes in control path. >=20 > So expect point (1), generally we are fine with V3. >=20 This is good news! > Detailed comments below, >=20 > > + > > +#ifndef __INCLUDE_RTE_TM_H__ > > +#define __INCLUDE_RTE_TM_H__ > > + > > +/** > > + * @file > > + * RTE Generic Traffic Manager API > > + * > > + * This interface provides the ability to configure the traffic manage= r in a > > + * generic way. It includes features such as: hierarchical scheduling, > > + * traffic shaping, congestion management, packet marking, etc. > > + */ >=20 > Fix missing API documentation doxygen hooks. >=20 > Files: doc/api/doxy-api-index.md and doc/api/doxy-api.conf. > Ref: > http://dpdk.org/browse/dpdk/commit/?id=3D71f2384328651dced05eceee8711 > 9a71f0cf16a7 >=20 Thanks, will do. >=20 > > + > > +#include > > + > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +/** Ethernet framing overhead > > + * > > + * Overhead fields per Ethernet frame: > > + * 1. Preamble: 7 bytes; > > + * 2. Start of Frame Delimiter (SFD): 1 byte; > > + * 3. Inter-Frame Gap (IFG): 12 bytes. > > + */ > > +#define RTE_TM_ETH_FRAMING_OVERHEAD 20 >=20 > This definition is not used anywhere This is the typical value to be passed to the shaper profile adjust paramet= er. Will take the action to document this. >=20 > > +/** > > + * Color > > + */ > > +enum rte_tm_color { > > + RTE_TM_GREEN =3D 0, /**< Green */ >=20 > Explicit zero assignment is not required. I think it is needed if we want RTE_TM_COLORS to be equal to the number of = colors? >=20 > > + RTE_TM_YELLOW, /**< Yellow */ > > + RTE_TM_RED, /**< Red */ > > + RTE_TM_COLORS /**< Number of colors */ > > +}; > > + > > +/** > > + * Node statistics counter type > > + */ > > +enum rte_tm_stats_type { > > + /**< Number of packets scheduled from current node. */ > > + RTE_TM_STATS_N_PKTS =3D 1 << 0, > > + > > + RTE_TM_STATS_N_PKTS_QUEUED =3D 1 << 8, > > + > > + /**< Number of bytes currently waiting in the packet queue of > current > > + * leaf node. > > + */ > > + RTE_TM_STATS_N_BYTES_QUEUED =3D 1 << 9, >=20 > I think, For bitmask, it is better to add ULL. > example: > RTE_TM_STATS_N_BYTES_QUEUED =3D 1ULL << 9, The enum fields are uint32_t, not uint64_t; therefore, we cannot use uint64= _t constants. This is why all enums in DPDK are using uint32_t values. I am= not sure, there might be a way to use uint64_t constants by relying on com= piler extensions, by I wanted to keep ourselves out of trouble :). > > +}; >=20 > I think, The above definitions are used as "uint64_t stats_mask" in > the remaining sections. How about changing to "enum rte_tm_stats_type > stats_mask" > instead of uint64_t stats_mask? >=20 The mask is a collection of flags (so multiple enum flags are typically ena= bled in this mask, e.g. stats_mask =3D RTE_TM_STATS_N_PKT | RTE_TM_STATS_N_= BYTES | ...), therefore it cannot be of the same type as the enum. I am taking the action to document that stats_mask is built out by using th= is specific enum flags. > > + > > +/** > > + * Traffic manager dynamic updates > > + */ > > +enum rte_tm_dynamic_update_type { > > + /**< Dynamic parent node update. The new parent node is located > on same > > + * hierarchy level as the former parent node. Consequently, the > node > > + * whose parent is changed preserves its hierarchy level. > > + */ > > + /**< Dynamic update of the set of enabled stats counter types. */ > > + RTE_TM_UPDATE_NODE_STATS =3D 1 << 5, > > + > > + /**< Dynamic update of congestion management mode for leaf > nodes. */ > > + RTE_TM_UPDATE_NODE_CMAN =3D 1 << 6, > > +}; >=20 > Same as above comment on enum. Same as above answer on enum :) >=20 > > +struct rte_tm_level_capabilities { >=20 > IMO, level can be either leaf or nonleaf. If so, following struct makes m= ore > sense to me >=20 > int is_leaf; > uint32_t n_nodes_max; > union { > struct rte_tm_node_capabilities nonleaf; > struct rte_tm_node_capabilities leaf; > }; >=20 This was the way it was done in previous versions, but Hemant rightly obser= ved that leaf nodes typically have different capabilities as non-leaf nodes= , hence the current solution. > > + /**< Maximum number of nodes for the current hierarchy level. */ > > + uint32_t n_nodes_max; > > + > > + /**< Maximum number of non-leaf nodes for the current hierarchy > level. > > + * The value of 0 indicates that current level only supports leaf > > + * nodes. The maximum value is *n_nodes_max*. > > + */ > > + uint32_t n_nodes_nonleaf_max; > > + > > + /**< Maximum number of leaf nodes for the current hierarchy level. > The > > + * value of 0 indicates that current level only supports non-leaf > > + * nodes. The maximum value is *n_nodes_max*. > > + */ > > + uint32_t n_nodes_leaf_max; > > + > > + /**< Summary of node-level capabilities across all the non-leaf > nodes > > + * of the current hierarchy level. Valid only when > > + * *n_nodes_nonleaf_max* is greater than 0. > > + */ > > + struct rte_tm_node_capabilities nonleaf; > > + > > + /**< Summary of node-level capabilities across all the leaf nodes of > > + * the current hierarchy level. Valid only when *n_nodes_leaf_max* > is > > + * greater than 0. > > + */ > > + struct rte_tm_node_capabilities leaf; > > +}; > > + > > +/** > > + * Traffic manager capabilities > > + */ > > +struct rte_tm_capabilities { > > + /**< Maximum number of nodes. */ > > + uint32_t n_nodes_max; > > + > > + /**< Set of supported dynamic update operations > > + * (see enum rte_tm_dynamic_update_type). > > + */ > > + uint64_t dynamic_update_mask; >=20 > IMO, It is better to change as > enum rte_tm_dynamic_update_type dynamic_update_mask > instead of > uint64_t dynamic_update_mask; >=20 Same answer as for the other enum above (mask is not equal to a single enum= value, but a set of enum flags; basically each bit of the mask corresponds= to a different enum value). > > + > > + /**< Summary of node-level capabilities across all non-leaf nodes. */ > > + struct rte_tm_node_capabilities nonleaf; > > + > > + /**< Summary of node-level capabilities across all leaf nodes. */ > > + struct rte_tm_node_capabilities leaf; > > +}; > > + > > +/** > > + * Congestion management (CMAN) mode > > + * > > + * This is used for controlling the admission of packets into a packet= queue > or > > + * group of packet queues on congestion. On request of writing a new > packet > > + * into the current queue while the queue is full, the *tail drop* alg= orithm > > + * drops the new packet while leaving the queue unmodified, as opposed > to *head > > + * drop* algorithm, which drops the packet at the head of the queue (t= he > oldest > > + * packet waiting in the queue) and admits the new packet at the tail = of > the > > + * queue. > > + * > > + * The *Random Early Detection (RED)* algorithm works by proactively > dropping > > + * more and more input packets as the queue occupancy builds up. When > the queue > > + * is full or almost full, RED effectively works as *tail drop*. The > *Weighted > > + * RED* algorithm uses a separate set of RED thresholds for each packe= t > color. > > + */ > > +enum rte_tm_cman_mode { > > + RTE_TM_CMAN_TAIL_DROP =3D 0, /**< Tail drop */ >=20 > explicit zero assignment may not be required Agree, I don't mind either way, by I see that the most of DPD library code = uses explicit initial assignment? >=20 > > + RTE_TM_CMAN_HEAD_DROP, /**< Head drop */ > > + RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection > (WRED) */ > > +}; > > + >=20 > > +struct rte_tm_node_params { > > + /**< Shaper profile for the private shaper. The absence of the > private > > + * shaper for the current node is indicated by setting this parameter > > + * to RTE_TM_SHAPER_PROFILE_ID_NONE. > > + */ > > + uint32_t shaper_profile_id; > > + > > + /**< User allocated array of valid shared shaper IDs. */ > > + uint32_t *shared_shaper_id; > > + > > + /**< Number of shared shaper IDs in the *shared_shaper_id* array. > */ > > + uint32_t n_shared_shapers; > > + > > + /**< Mask of statistics counter types to be enabled for this node. > This > > + * needs to be a subset of the statistics counter types available for > > + * the current node. Any statistics counter type not included in this > > + * set is to be disabled for the current node. > > + */ > > + uint64_t stats_mask; >=20 > How about changing to "enum rte_tm_stats_type" instead of uint64_t ? >=20 Same answer as above ones on enums & masks. > > + > > + union { > > + /**< Parameters only valid for non-leaf nodes. */ > > + struct { > > + /**< For each priority, indicates whether the children > > + * nodes sharing the same priority are to be > scheduled > > + * by WFQ or by WRR. When NULL, it indicates that > WFQ > > + * is to be used for all priorities. When non-NULL, it > > + * points to a pre-allocated array of *n_priority* > > + * elements, with a non-zero value element > indicating > > + * WFQ and a zero value element for WRR. > > + */ > > + int *scheduling_mode_per_priority; > > + > > + /**< Number of priorities. */ > > + uint32_t n_priorities; > > + } nonleaf; >=20 >=20 > Since we are adding all node "connecting" parameter in rte_tm_node_add(). > How about adding WFQ vs WRR as boolean value in rte_tm_node_add() to > maintain > the consistency This is not about the parent node managing this new node as one of its chil= dren nodes, it is about how this new node will manage its future children n= odes, hence the reason to put it here. >=20 > How about new error type in "enum rte_tm_error_type" to specify the > connection > error due to requested mode WFQ or WRR not supported. >=20 I think we already have it, it is called: RTE_TM_ERROR_TYPE_NODE_PARAMS_SCH= EDULING_MODE. > > + > > +/** > > +/** > > + * Traffic manager get number of leaf nodes > > + * > > + * Each leaf node sits on on top of a TX queue of the current Ethernet > port. > > + * Therefore, the set of leaf nodes is predefined, their number is alw= ays > equal > > + * to N (where N is the number of TX queues configured for the current > port) > > + * and their IDs are 0 .. (N-1). > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param n_leaf_nodes > > + * Number of leaf nodes for the current port. > > + * @param error > > + * Error details. Filled in only on error, when not NULL. > > + * @return > > + * 0 on success, non-zero error code otherwise. > > + */ > > +int > > +rte_tm_get_leaf_nodes(uint8_t port_id, > > + uint32_t *n_leaf_nodes, > > + struct rte_tm_error *error); >=20 > In order to keep consistency with rest of the API, IMO, the API > name can be changed to rte_tm_leaf_nodes_get() >=20 IMO this is not a node API, it is a port API, hence the attempt to avoid rt= e_tm_node_XYZ(). Maybe a longer but less confusing name is: rte_tm_get_number_of_leaf_nodes(= )? > > + > > +/** > > + * Traffic manager node type (i.e. leaf or non-leaf) get > > + * > > + * The leaf nodes have predefined IDs in the range of 0 .. (N-1), wher= e N is > > + * the number of TX queues of the current Ethernet port. The non-leaf > nodes > > + * have their IDs generated by the application outside of the above ra= nge, > > + * which is reserved for leaf nodes. > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param node_id > > + * Node ID value. Needs to be valid. > > + * @param is_leaf >=20 > Change to "@param[out] is_leaf" to indicate the parameter is output. > I guess, That scheme is missing in overall header file. It is good to hav= e. >=20 OK, will do for the entire file. >=20 > > + * Set to non-zero value when node is leaf and to zero otherwise (no= n- > leaf). > > + * @param error > > + * Error details. Filled in only on error, when not NULL. > > + * @return > > + * 0 on success, non-zero error code otherwise. > > + */ > > +int > > +rte_tm_node_type_get(uint8_t port_id, > > + uint32_t node_id, > > + int *is_leaf, > > + struct rte_tm_error *error); > > + > > +/** > > + * Traffic manager capabilities get > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param cap >=20 > missing [out]. See above >=20 OK, will do for the entire file. > > + * Traffic manager capabilities. Needs to be pre-allocated and valid= . > > + * @param error > > + * Error details. Filled in only on error, when not NULL. > > + * @return > > + * 0 on success, non-zero error code otherwise. > > + */ > > +int > > +rte_tm_capabilities_get(uint8_t port_id, > > + struct rte_tm_capabilities *cap, > > + struct rte_tm_error *error); > > + >=20 > > +int > > +rte_tm_node_add(uint8_t port_id, > > + uint32_t node_id, > > + uint32_t parent_node_id, > > + uint32_t priority, > > + uint32_t weight, > > + struct rte_tm_node_params *params, > > + struct rte_tm_error *error); >=20 > See the first comment in the beginning of the file. >=20 Noted. > > + > > + * Traffic manager node parent update > > + * > > + * Restriction for root node: its parent cannot be changed. >=20 > IMO, it is nice to mention correspond "enum rte_tm_dynamic_update_type" > flag for this API support here. May be in common code itself we can check > that and > return error if implementation does not meet the capability. >=20 > Applicable to all update APIs >=20 OK, will do for the entire file. > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param node_id > > + * Node ID. Needs to be valid. > > + * @param parent_node_id > > + * Node ID for the new parent. Needs to be valid. > > + * @param priority > > + * Node priority. The highest node priority is zero. Used by the SP > algorithm > > + * running on the parent of the current node for scheduling this chi= ld > node. > > + * @param weight > > + * Node weight. The node weight is relative to the weight sum of all > siblings > > + * that have the same priority. The lowest weight is zero. Used by t= he > > + * WFQ/WRR algorithm running on the parent of the current node for > scheduling > > + * this child node. > > + * @param error > > + * Error details. Filled in only on error, when not NULL. > > + * @return > > + * 0 on success, non-zero error code otherwise. > > + */ > > +int > > +rte_tm_node_parent_update(uint8_t port_id, > > + uint32_t node_id, > > + uint32_t parent_node_id, > > + uint32_t priority, > > + uint32_t weight, > > + struct rte_tm_error *error); > > + Regards, Cristian