From: Alan Robertson
To: "Dumitrescu, Cristian"
CC: "dev@dpdk.org"
Date: Wed, 7 Dec 2016 10:58:49 +0000
Subject: Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
Message-ID: <6d862b500e1e4f34a4cbf790db8d5d48@EMEAWP-EXMB11.corp.brocade.com>
In-Reply-To: <57688e98-15d5-1866-0c3a-9dda81621651@brocade.com>

Hi Cristian,

Looking at points 10 and 11, it's good to hear nodes can be dynamically added. We've been trying to decide the best way to do this to support QoS on tunnels for some time now, and since the existing implementation doesn't allow it, we effectively ruled out hierarchical queueing for tunnel targets on the output interface.

Having said that, has thought been given to decoupling the queueing from the Ethernet transmit process? When queueing on a tunnel, for example, we may be working with encryption. When running with an anti-replay window it is really much better to do the QoS (packet reordering) before the encryption. To support this, would it be possible to have a separate scheduler structure which can be passed into the scheduling API? This means the calling code can hang the structure off whatever entity it wishes to perform QoS on, and we get dynamic target support (sessions/tunnels etc).

Regarding the structure allocation, would it be possible to make the number of queues associated with a TC a compile-time option which the scheduler would accommodate? We frequently use only one queue per TC, which means 75% of the space allocated at the queueing layer for that TC is never used. This may be specific to our implementation, but if other implementations do the same and folks could say so, we would get a better idea of whether this is a common case.

Whilst touching on the scheduler: the token replenishment works using a division and a multiplication, obviously to cater for the fact that it may be run after several TC windows have passed. The most commonly used industrial scheduler simply checks the elapsed time against the TC window and then adds the committed burst (bc). This relies on the scheduler being called within the TC window, though. It would be nice to have this as a configurable option, since it's much more efficient, assuming the infra code from which it's called can guarantee the calling frequency.
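To make the difference concrete, here is a rough sketch of the two replenishment styles; all names (struct tc_state, the two functions) are illustrative and taken from neither implementation, and a real version would also cap credit accumulation:

#include <stdint.h>

/* Hypothetical per-TC state; not from the proposed API or librte_sched. */
struct tc_state {
	uint64_t tc_time;               /* start of current TC window */
	uint64_t tc_period;             /* TC window length */
	uint64_t tc_credits;            /* credits currently available */
	uint64_t tc_credits_per_period; /* bc: committed burst per window */
};

/* Existing style: divide to find how many whole windows have elapsed,
 * multiply to resync the window start. Correct however late we run. */
void
replenish_div_mul(struct tc_state *tc, uint64_t now)
{
	uint64_t n_periods = (now - tc->tc_time) / tc->tc_period;

	if (n_periods) {
		tc->tc_credits = tc->tc_credits_per_period;
		tc->tc_time += n_periods * tc->tc_period;
	}
}

/* Industrial style: one comparison and one addition per call. Cheaper,
 * but only correct if the caller guarantees at least one call per TC
 * window. */
void
replenish_elapsed(struct tc_state *tc, uint64_t now)
{
	if (now - tc->tc_time >= tc->tc_period) {
		tc->tc_credits += tc->tc_credits_per_period;
		tc->tc_time = now;
	}
}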
I hope you'll consider these points for inclusion into a future road map. Hopefully in the future my employer will increase the priority of some of the tasks and a PR may appear on the mailing list.

Thanks,
Alan.

Subject: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
Date: Wed, 30 Nov 2016 18:16:50 +0000
From: Cristian Dumitrescu
To: dev@dpdk.org
CC: cristian.dumitrescu@intel.com

This RFC proposes an ethdev-based abstraction layer for a Quality of Service (QoS) hierarchical scheduler. The goal of the abstraction layer is to provide a simple generic API that is agnostic of the underlying HW, SW or mixed HW-SW complex implementation.

Q1: What is the benefit of having an abstraction layer for a QoS hierarchical scheduler?
A1: There is growing interest in the industry for handling various HW-based, SW-based or mixed hierarchical scheduler implementations using a unified DPDK API.

Q2: Which devices are targeted by this abstraction layer?
A2: All current and future devices that expose a hierarchical scheduler feature under DPDK, including NICs, FPGAs, ASICs, SoCs and SW libraries.

Q3: Which scheduler hierarchies are supported by the API?
A3: Hopefully any scheduler hierarchy can be described and covered by the current API. Of course, functional correctness, accuracy and performance levels depend on the specific implementations of this API.

Q4: Why have this abstraction layer in ethdev, as opposed to a new type of device (e.g. scheddev) similar to ethdev, cryptodev, eventdev, etc?
A4: Packets are sent to the Ethernet device using the ethdev API rte_eth_tx_burst() function, with the hierarchical scheduling taking place automatically (i.e. no SW intervention) in HW implementations. Basically, the hierarchical scheduling is done as part of the packet TX operation. The hierarchical scheduler is typically the last stage before packet TX and it is tightly integrated with the TX stage. The hierarchical scheduler is just another offload feature of the Ethernet device, which needs to be accommodated by the ethdev API like any other offload feature (such as RSS, DCB, flow director, etc).

Once the decision to schedule a specific packet has been taken, this packet cannot be dropped and it has to be sent over the wire as is, otherwise what takes place on the wire is not what was planned at scheduling time, so the scheduling is not accurate. (Note: there are some devices which allow prepending headers to the packet after the scheduling stage at the expense of sending correction requests back to the scheduler, but this only strengthens the bond between scheduling and TX.)

Q5: Given that the packet scheduling takes place automatically for pure HW implementations, how does packet scheduling take place for poll-mode SW implementations?
A5: The API-provided function rte_eth_sched_run() is designed to take care of this. For HW implementations, this function typically does nothing. For SW implementations, this function is typically expected to perform dequeue of packets from the hierarchical scheduler and their write to the Ethernet device TX queue, periodic flush of any enqueue-side buffers into the hierarchical scheduler for burst-oriented implementations, etc.
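For illustration, a poll-mode TX loop for a SW implementation might look like the following rough sketch (get_pkts() is an application-specific placeholder; error handling is elided):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Application-specific packet source; placeholder only. */
extern uint16_t get_pkts(struct rte_mbuf **pkts, uint16_t n);

void
tx_loop(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];

	for (;;) {
		/* Enqueue side: packets enter the scheduler hierarchy. */
		uint16_t n = get_pkts(pkts, BURST_SIZE);
		uint16_t sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);

		while (sent < n)
			rte_pktmbuf_free(pkts[sent++]);

		/* Dequeue side: a no-op for HW implementations; for SW
		 * implementations this moves scheduled packets to the wire
		 * and flushes any enqueue-side buffers. */
		rte_eth_sched_run(port_id);
	}
}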
Q6: Which scheduling algorithms are supported?
A6: The fundamental scheduling algorithms that are supported are Strict Priority (SP) and Weighted Fair Queuing (WFQ). The SP and WFQ algorithms are supported at the level of each node of the scheduling hierarchy, regardless of the node level/position in the tree. The SP algorithm is used to schedule between sibling nodes with different priority, while WFQ is used to schedule between groups of siblings that have the same priority. Algorithms such as Weighted Round Robin (WRR), byte-level WRR, Deficit WRR (DWRR), etc are considered approximations of the ideal WFQ and are therefore assimilated to WFQ, although an associated implementation-dependent accuracy, performance and resource usage trade-off might exist.

Q7: Which congestion management algorithms are supported?
A7: Tail drop, head drop and Weighted Random Early Detection (WRED). They are available for every leaf node in the hierarchy, subject to the specific implementation supporting them.

Q8: Is traffic shaping supported?
A8: Yes, there are a number of shapers (rate limiters) that can be supported for each node in the hierarchy (the built-in limit is currently set to 4 per node). Each shaper can be private to a node (used only by that node) or shared between multiple nodes.

Q9: What is the purpose of having shaper profiles and WRED profiles?
A9: In most implementations, many shapers typically share the same configuration parameters, so defining shaper profiles simplifies the configuration task. The same considerations apply to WRED contexts and profiles.

Q10: How is the scheduling hierarchy defined and created?
A10: The scheduler hierarchy tree is set up by creating new nodes and connecting them to other existing nodes, which thus become parent nodes. The unique ID that is assigned to each node when the node is created is further used to update the node configuration or to connect children nodes to it. The leaf nodes of the scheduler hierarchy are each attached to one of the Ethernet device TX queues.

Q11: Are on-the-fly changes of the scheduling hierarchy allowed by the API?
A11: Yes. The actual changes take place subject to the specific implementation supporting them, otherwise an error code is returned.

Q12: What is the typical function call sequence to set up and run the Ethernet device scheduler?
A12: The typical simplified function call sequence is listed below (a C sketch of the same sequence follows the list):
i) Configure the Ethernet device and its TX queues: rte_eth_dev_configure(), rte_eth_tx_queue_setup()
ii) Create WRED profiles and WRED contexts, shaper profiles and shapers: rte_eth_sched_wred_profile_add(), rte_eth_sched_wred_context_add(), rte_eth_sched_shaper_profile_add(), rte_eth_sched_shaper_add()
iii) Create the scheduler hierarchy nodes and tree: rte_eth_sched_node_add()
iv) Freeze the start-up hierarchy and ask the device whether it supports it: rte_eth_sched_hierarchy_set()
v) Start the Ethernet port: rte_eth_dev_start()
vi) Run-time scheduler hierarchy updates: rte_eth_sched_node_add(), rte_eth_sched_node_*_set()
vii) Run-time packet enqueue into the hierarchical scheduler: rte_eth_tx_burst()
viii) Run-time support for SW poll-mode implementations (see previous answer): rte_eth_sched_run()
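As an illustration, the sequence above might translate into the following sketch; the IDs, sizes and the minimal root-plus-one-leaf hierarchy are placeholders, the use of RTE_ETH_SCHED_NODE_NULL as "no parent" for the root is an assumption, and error handling is elided:

#include <string.h>
#include <rte_ethdev.h>

/* Illustrative IDs: node 1 = root, node 2 = leaf attached to TX queue 0. */
int
setup_port_sched(uint8_t port, const struct rte_eth_conf *conf)
{
	int i;

	/* i) Configure the device and one TX queue. */
	rte_eth_dev_configure(port, 1, 1, conf);
	rte_eth_tx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL);

	/* ii) One shaper profile, used by one shaper instance (both ID 0). */
	struct rte_eth_sched_shaper_params sp = {
		.rate = 125000000, /* 1 Gbps, in bytes per second */
		.size = 1 << 16,   /* 64 KB token bucket */
	};
	rte_eth_sched_shaper_profile_add(port, 0, &sp);
	rte_eth_sched_shaper_add(port, 0, 0);

	/* iii) Build a two-level hierarchy: a rate-limited root node and
	 * one leaf node attached to TX queue 0. */
	struct rte_eth_sched_node_params np;
	memset(&np, 0, sizeof(np));
	np.priority = 0;
	np.weight = 1;
	np.shaper_id[0] = 0; /* rate-limit via shaper 0 */
	for (i = 1; i < RTE_ETH_SCHED_SHAPERS_PER_NODE; i++)
		np.shaper_id[i] = RTE_ETH_SCHED_SHAPER_ID_NONE;
	np.is_leaf = 0;
	rte_eth_sched_node_add(port, 1, RTE_ETH_SCHED_NODE_NULL, &np);

	np.is_leaf = 1;
	np.leaf.queue_id = 0;
	np.leaf.cman = RTE_ETH_SCHED_CMAN_TAIL_DROP;
	for (i = 0; i < RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE; i++)
		np.leaf.wred_context_id[i] = RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE;
	rte_eth_sched_node_add(port, 2, 1, &np);

	/* iv) Freeze the start-up hierarchy, clearing it on failure. */
	if (rte_eth_sched_hierarchy_set(port, 1 /* clear_on_fail */) != 0)
		return -1;

	/* v) Start the port; vi)-viii) happen at run time. */
	return rte_eth_dev_start(port);
}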
Q13: What are the possible options for the user when the Ethernet port does not support the scheduling hierarchy required by the user?
A13: The following options are available to the user (a sketch of this fallback flow follows the list):
i) abort;
ii) try out a new hierarchy (e.g. with fewer leaf nodes), if acceptable;
iii) wrap the Ethernet device into a new type of Ethernet device that has a SW front-end implementing the hierarchical scheduler (e.g. the existing DPDK library librte_sched); instantiate the new device type on-the-fly and check whether the hierarchy requirements can be met by the new device.
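A rough sketch of this fallback flow is shown below; build_full_hierarchy() and build_reduced_hierarchy() are hypothetical application helpers that issue the rte_eth_sched_node_add() calls, and since the RFC does not define how the SW front-end device of option iii) is instantiated, that step is left as a comment:

#include <rte_ethdev.h>

extern int build_full_hierarchy(uint8_t port);
extern int build_reduced_hierarchy(uint8_t port);

int
apply_hierarchy(uint8_t port)
{
	build_full_hierarchy(port);

	/* Preferred hierarchy: clear it on failure so we can start over. */
	if (rte_eth_sched_hierarchy_set(port, 1 /* clear_on_fail */) == 0)
		return 0;

	/* Option ii): retry with a reduced hierarchy (e.g. fewer leaves). */
	build_reduced_hierarchy(port);
	if (rte_eth_sched_hierarchy_set(port, 1) == 0)
		return 0;

	/* Option iii): wrap the port with a SW front-end scheduler
	 * (e.g. librte_sched) and retry there, or option i): abort. */
	return -1;
}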
Signed-off-by: Cristian Dumitrescu
---
 lib/librte_ether/rte_ethdev.h | 794 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 794 insertions(+)
 mode change 100644 => 100755 lib/librte_ether/rte_ethdev.h

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
old mode 100644
new mode 100755
index 9678179..d4d8604
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -182,6 +182,8 @@ extern "C" {
 #include <rte_pci.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
+#include <rte_red.h>
+#include <rte_meter.h>
 #include "rte_ether.h"
 #include "rte_eth_ctrl.h"
 #include "rte_dev_info.h"
@@ -1038,6 +1040,152 @@ TAILQ_HEAD(rte_eth_dev_cb_list, rte_eth_dev_callback);
 /**< l2 tunnel forwarding mask */
 #define ETH_L2_TUNNEL_FORWARDING_MASK 0x00000008
 
+/**
+ * Scheduler configuration
+ */
+
+/**< Max number of shapers per node */
+#define RTE_ETH_SCHED_SHAPERS_PER_NODE 4
+/**< Invalid shaper ID */
+#define RTE_ETH_SCHED_SHAPER_ID_NONE UINT32_MAX
+/**< Max number of WRED contexts per node */
+#define RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE 4
+/**< Invalid WRED context ID */
+#define RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE UINT32_MAX
+/**< Invalid node ID */
+#define RTE_ETH_SCHED_NODE_NULL UINT32_MAX
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to the
+ * *head drop* algorithm, which drops the packet at the head of the queue (the
+ * oldest packet waiting in the queue) and admits the new packet at the tail of
+ * the queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds per packet color.
+ */
+enum rte_eth_sched_cman_mode {
+	RTE_ETH_SCHED_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_ETH_SCHED_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_ETH_SCHED_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * WRED profile
+ */
+struct rte_eth_sched_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_red_params red_params[e_RTE_METER_COLORS];
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node can
+ * have multiple shapers enabled (up to RTE_ETH_SCHED_SHAPERS_PER_NODE). Each
+ * shaper can be private to a node (only one node using it) or shared (multiple
+ * nodes use the same shaper instance).
+ */
+struct rte_eth_sched_shaper_params {
+	uint64_t rate; /**< Token bucket rate (bytes per second) */
+	uint64_t size; /**< Token bucket size (bytes) */
+};
+
+/**
+ * Node parameters
+ *
+ * Each scheduler hierarchy node has multiple inputs (children nodes of the
+ * current parent node) and a single output (which is input to its parent
+ * node). The current node arbitrates its inputs using Strict Priority (SP)
+ * and Weighted Fair Queuing (WFQ) algorithms to schedule input packets on its
+ * output while observing its shaping/rate limiting constraints. Algorithms
+ * such as Weighted Round Robin (WRR), byte-level WRR, Deficit WRR (DWRR), etc
+ * are considered approximations of the ideal WFQ and are assimilated to WFQ,
+ * although an associated implementation-dependent trade-off on accuracy,
+ * performance and resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ algorithm, based on
+ * their weight, which is relative to the sum of the weights of all siblings
+ * with same priority, with one (1) as the lowest weight.
+ */
+struct rte_eth_sched_node_params {
+	/**< Child node priority (used by SP). The highest priority is zero. */
+	uint32_t priority;
+	/**< Child node weight (used by WFQ), relative to the sum of the
+	     weights of all siblings with same priority. The lowest weight is
+	     one. */
+	uint32_t weight;
+	/**< Set of shaper instances enabled for current node. Each node shaper
+	     can be disabled by setting it to RTE_ETH_SCHED_SHAPER_ID_NONE. */
+	uint32_t shaper_id[RTE_ETH_SCHED_SHAPERS_PER_NODE];
+	/**< Set to zero if current node is not a hierarchy leaf node, set to a
+	     non-zero value otherwise. A leaf node is a hierarchy node that does
+	     not have any children. A leaf node has to be connected to a valid
+	     packet queue. */
+	int is_leaf;
+	/**< Parameters valid for leaf nodes only */
+	struct {
+		/**< Packet queue ID */
+		uint64_t queue_id;
+		/**< Congestion management mode */
+		enum rte_eth_sched_cman_mode cman;
+		/**< Set of WRED contexts enabled for current leaf node. Each
+		     leaf node WRED context can be disabled by setting it to
+		     RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE. Only valid when
+		     congestion management for current leaf node is set to
+		     WRED. */
+		uint32_t wred_context_id[RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE];
+	} leaf;
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_eth_sched_stats_counter {
+	/**< Number of packets scheduled from current node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_PKTS = 1 << 0,
+	/**< Number of bytes scheduled from current node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_BYTES = 1 << 1,
+	/**< Number of packets dropped by current node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
+	/**< Number of bytes dropped by current node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
+	/**< Number of packets currently waiting in the packet queue of current
+	     leaf node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
+	/**< Number of bytes currently waiting in the packet queue of current
+	     leaf node. */
+	RTE_ETH_SCHED_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_eth_sched_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+	/**< Statistics counters for leaf nodes only */
+	struct {
+		/**< Number of packets dropped by current leaf node. */
+		uint64_t n_pkts_dropped;
+		/**< Number of bytes dropped by current leaf node. */
+		uint64_t n_bytes_dropped;
+		/**< Number of packets currently waiting in the packet queue of
+		     current leaf node. */
+		uint64_t n_pkts_queued;
+		/**< Number of bytes currently waiting in the packet queue of
+		     current leaf node. */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
 /*
  * Definitions of all functions exported by an Ethernet driver through the
  * the generic structure of type *eth_dev_ops* supplied in the *rte_eth_dev*
@@ -1421,6 +1569,120 @@ typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
 				struct rte_eth_dcb_info *dcb_info);
 /**< @internal Get dcb information on an Ethernet device */
 
+typedef int (*eth_sched_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_eth_sched_wred_params *profile);
+/**< @internal Scheduler WRED profile add */
+
+typedef int (*eth_sched_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id);
+/**< @internal Scheduler WRED profile delete */
+
+typedef int (*eth_sched_wred_context_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_context_id,
+	uint32_t wred_profile_id);
+/**< @internal Scheduler WRED context add */
+
+typedef int (*eth_sched_wred_context_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_context_id);
+/**< @internal Scheduler WRED context delete */
+
+typedef int (*eth_sched_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_eth_sched_shaper_params *profile);
+/**< @internal Scheduler shaper profile add */
+
+typedef int (*eth_sched_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id);
+/**< @internal Scheduler shaper profile delete */
+
+typedef int (*eth_sched_shaper_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_id,
+	uint32_t shaper_profile_id);
+/**< @internal Scheduler shaper instance add */
+
+typedef int (*eth_sched_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_id);
+/**< @internal Scheduler shaper instance delete */
+
+typedef int (*eth_sched_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	struct rte_eth_sched_node_params *params);
+/**< @internal Scheduler node add */
+
+typedef int (*eth_sched_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id);
+/**< @internal Scheduler node delete */
+
+typedef int (*eth_sched_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail);
+/**< @internal Scheduler hierarchy set */
+
+typedef int (*eth_sched_node_priority_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t priority);
+/**< @internal Scheduler node priority set */
+
+typedef int (*eth_sched_node_weight_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t weight);
+/**< @internal Scheduler node weight set */
+
+typedef int (*eth_sched_node_shaper_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_pos,
+	uint32_t shaper_id);
+/**< @internal Scheduler node shaper set */
+
+typedef int (*eth_sched_node_queue_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t queue_id);
+/**< @internal Scheduler node queue set */
+
+typedef int (*eth_sched_node_cman_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_eth_sched_cman_mode cman);
+/**< @internal Scheduler node congestion management mode set */
+
+typedef int (*eth_sched_node_wred_context_set_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_context_pos,
+	uint32_t wred_context_id);
+/**< @internal Scheduler node WRED context set */
+
+typedef int (*eth_sched_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask);
+/**< @internal Scheduler get set of stats counters enabled for all nodes */
+
+typedef int (*eth_sched_stats_enable_t)(struct rte_eth_dev *dev,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask);
+/**< @internal Scheduler enable selected stats counters for all nodes */
+
+typedef int (*eth_sched_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask);
+/**< @internal Scheduler get set of stats counters enabled for specific node */
+
+typedef int (*eth_sched_node_stats_enable_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask);
+/**< @internal Scheduler enable selected stats counters for specific node */
+
+typedef int (*eth_sched_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_eth_sched_node_stats *stats,
+	int clear);
+/**< @internal Scheduler read stats counters for specific node */
+
+typedef int (*eth_sched_run_t)(struct rte_eth_dev *dev);
+/**< @internal Scheduler run */
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -1547,6 +1809,53 @@ struct eth_dev_ops {
 	eth_l2_tunnel_eth_type_conf_t l2_tunnel_eth_type_conf;
 	/** Enable/disable l2 tunnel offload functions */
 	eth_l2_tunnel_offload_set_t l2_tunnel_offload_set;
+
+	/** Scheduler WRED profile add */
+	eth_sched_wred_profile_add_t sched_wred_profile_add;
+	/** Scheduler WRED profile delete */
+	eth_sched_wred_profile_delete_t sched_wred_profile_delete;
+	/** Scheduler WRED context add */
+	eth_sched_wred_context_add_t sched_wred_context_add;
+	/** Scheduler WRED context delete */
+	eth_sched_wred_context_delete_t sched_wred_context_delete;
+	/** Scheduler shaper profile add */
+	eth_sched_shaper_profile_add_t sched_shaper_profile_add;
+	/** Scheduler shaper profile delete */
+	eth_sched_shaper_profile_delete_t sched_shaper_profile_delete;
+	/** Scheduler shaper instance add */
+	eth_sched_shaper_add_t sched_shaper_add;
+	/** Scheduler shaper instance delete */
+	eth_sched_shaper_delete_t sched_shaper_delete;
+	/** Scheduler node add */
+	eth_sched_node_add_t sched_node_add;
+	/** Scheduler node delete */
+	eth_sched_node_delete_t sched_node_delete;
+	/** Scheduler hierarchy set */
+	eth_sched_hierarchy_set_t sched_hierarchy_set;
+	/** Scheduler node priority set */
+	eth_sched_node_priority_set_t sched_node_priority_set;
+	/** Scheduler node weight set */
+	eth_sched_node_weight_set_t sched_node_weight_set;
+	/** Scheduler node shaper set */
+	eth_sched_node_shaper_set_t sched_node_shaper_set;
+	/** Scheduler node queue set */
+	eth_sched_node_queue_set_t sched_node_queue_set;
+	/** Scheduler node congestion management mode set */
+	eth_sched_node_cman_set_t sched_node_cman_set;
+	/** Scheduler node WRED context set */
+	eth_sched_node_wred_context_set_t sched_node_wred_context_set;
+	/** Scheduler get statistics counter type enabled for all nodes */
+	eth_sched_stats_get_enabled_t sched_stats_get_enabled;
+	/** Scheduler enable selected statistics counters for all nodes */
+	eth_sched_stats_enable_t sched_stats_enable;
+	/** Scheduler get statistics counter type enabled for current node */
+	eth_sched_node_stats_get_enabled_t sched_node_stats_get_enabled;
+	/** Scheduler enable selected statistics counters for current node */
+	eth_sched_node_stats_enable_t sched_node_stats_enable;
+	/** Scheduler read statistics counters for current node */
+	eth_sched_node_stats_read_t sched_node_stats_read;
+	/** Scheduler run */
+	eth_sched_run_t sched_run;
 };
 
 /**
@@ -4336,6 +4645,491 @@ rte_eth_dev_l2_tunnel_offload_set(uint8_t port_id,
 				  uint8_t en);
 
 /**
+ * Scheduler WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_eth_sched_wred_params *profile);
+
+/**
+ * Scheduler WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id);
+
+/**
+ * Scheduler WRED context add or update
+ *
+ * When *wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *wred_context_id* is valid, this WRED context is no longer using the
+ * profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid WRED context is assigned to one or several scheduler hierarchy leaf
+ * nodes configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_context_id
+ *   WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_wred_context_add(uint8_t port_id,
+	uint32_t wred_context_id,
+	uint32_t wred_profile_id);
+
+/**
+ * Scheduler WRED context delete
+ *
+ * Delete an existing WRED context. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy leaf node) of this
+ * WRED context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_context_id
+ *   WRED context ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_wred_context_delete(uint8_t port_id,
+	uint32_t wred_context_id);
+
+/**
+ * Scheduler shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_eth_sched_shaper_params *profile);
+
+/**
+ * Scheduler shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+/* no users (shapers) using this profile */
+int rte_eth_sched_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id);
+
+/**
+ * Scheduler shaper add or update
+ *
+ * When *shaper_id* is not a valid shaper ID, a new shaper with this ID is
+ * created using the shaper profile identified by *shaper_profile_id*.
+ *
+ * When *shaper_id* is a valid shaper ID, this shaper is no longer using the
+ * shaper profile previously assigned to it and is updated to use the shaper
+ * profile identified by *shaper_profile_id*.
+ *
+ * A valid shaper is assigned to one or several scheduler hierarchy nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_id
+ *   Shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_shaper_add(uint8_t port_id,
+	uint32_t shaper_id,
+	uint32_t shaper_profile_id);
+
+/**
+ * Scheduler shaper delete
+ *
+ * Delete an existing shaper. This operation fails when there is currently at
+ * least one user (i.e. scheduler hierarchy node) of this shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_id
+ *   Shaper ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_shaper_delete(uint8_t port_id,
+	uint32_t shaper_id);
+
+/**
+ * Scheduler node add or remap
+ *
+ * When *node_id* is not a valid node ID, a new node with this ID is created
+ * and connected as child to the existing node identified by *parent_node_id*.
+ *
+ * When *node_id* is a valid node ID, this node is disconnected from its
+ * current parent and connected as child to another existing node identified
+ * by *parent_node_id*.
+ *
+ * This function can be called during port initialization phase (before the
+ * Ethernet port is started) for building the scheduler start-up hierarchy.
+ * Subject to the specific Ethernet port supporting on-the-fly scheduler
+ * hierarchy updates, this function can also be called during run-time (after
+ * the Ethernet port is started).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	struct rte_eth_sched_node_params *params);
+
+/**
+ * Scheduler node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_delete(uint8_t port_id,
+	uint32_t node_id);
+
+/**
+ * Scheduler hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the scheduler start-up hierarchy.
+ *
+ * This function fails when the currently configured scheduler hierarchy is
+ * not supported by the Ethernet port, in which case the user can abort or try
+ * out another hierarchy configuration (e.g. a hierarchy with fewer leaf
+ * nodes), which can be built from scratch (when *clear_on_fail* is enabled)
+ * or by modifying the existing hierarchy configuration (when *clear_on_fail*
+ * is disabled).
+ *
+ * Note that, even when the configured scheduler hierarchy is supported (so
+ * this function is successful), the Ethernet port start might still fail due
+ * to e.g. not enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_hierarchy_set(uint8_t port_id,
+	int clear_on_fail);
+
+/**
+ * Scheduler node priority set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param priority
+ *   Node priority.
+ *   The highest node priority is zero. Used by the SP algorithm running on
+ *   the parent of the current node for scheduling this child node.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_priority_set(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t priority);
+
+/**
+ * Scheduler node weight set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all
+ *   siblings that have the same priority. The lowest weight is one. Used by
+ *   the WFQ algorithm running on the parent of the current node for
+ *   scheduling this child node.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_weight_set(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t weight);
+
+/**
+ * Scheduler node shaper set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_pos
+ *   Position in the shaper array of the current node
+ *   (0 .. RTE_ETH_SCHED_SHAPERS_PER_NODE-1).
+ * @param shaper_id
+ *   Shaper ID. Needs to be either a valid shaper ID or set to
+ *   RTE_ETH_SCHED_SHAPER_ID_NONE in order to invalidate the shaper on
+ *   position *shaper_pos* within the current node.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_shaper_set(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_pos,
+	uint32_t shaper_id);
+
+/**
+ * Scheduler node queue set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param queue_id
+ *   Queue ID. Needs to be valid.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_queue_set(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t queue_id);
+
+/**
+ * Scheduler node congestion management mode set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be a valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_cman_set(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_eth_sched_cman_mode cman);
+
+/**
+ * Scheduler node WRED context set
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be a valid leaf node ID that has WRED selected as the
+ *   congestion management mode.
+ * @param wred_context_pos
+ *   Position in the WRED context array of the current leaf node
+ *   (0 .. RTE_ETH_SCHED_WRED_CONTEXTS_PER_NODE-1)
+ * @param wred_context_id
+ *   WRED context ID. Needs to be either a valid WRED context ID or set to
+ *   RTE_ETH_SCHED_WRED_CONTEXT_ID_NONE in order to invalidate the WRED
+ *   context on position *wred_context_pos* within the current leaf node.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_wred_context_set(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_context_pos,
+	uint32_t wred_context_id);
+
+/**
+ * Scheduler get statistics counter types enabled for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all non-leaf nodes. Needs
+ *   to be pre-allocated.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled per node for each non-leaf
+ *   node. This is a subset of *nonleaf_node_capability_stats_mask*. Needs to
+ *   be pre-allocated.
+ * @param leaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all leaf nodes. Needs to
+ *   be pre-allocated.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled for each leaf node. This is
+ *   a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_stats_get_enabled(uint8_t port_id,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask);
+
+/**
+ * Scheduler enable selected statistics counters for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each non-leaf node.
+ *   This needs to be a subset of the statistics counter types available per
+ *   node for all non-leaf nodes. Any statistics counter type not included in
+ *   this set is to be disabled for all non-leaf nodes.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each leaf node. This
+ *   needs to be a subset of the statistics counter types available per node
+ *   for all leaf nodes. Any statistics counter type not included in this set
+ *   is to be disabled for all leaf nodes.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_stats_enable(uint8_t port_id,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask);
+
+/**
+ * Scheduler get statistics counter types enabled for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param capability_stats_mask
+ *   Statistics counter types available for the current node. Needs to be
+ *   pre-allocated.
+ * @param enabled_stats_mask
+ *   Statistics counter types currently enabled for the current node. This is
+ *   a subset of *capability_stats_mask*. Needs to be pre-allocated.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_stats_get_enabled(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask);
+
+/**
+ * Scheduler enable selected statistics counters for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param enabled_stats_mask
+ *   Statistics counter types to be enabled for the current node. This needs
+ *   to be a subset of the statistics counter types available for the current
+ *   node. Any statistics counter type not included in this set is to be
+ *   disabled for the current node.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_stats_enable(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask);
+
+/**
+ * Scheduler node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_eth_sched_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_eth_sched_node_stats *stats,
+	int clear);
+
+/**
+ * Scheduler run
+ *
+ * The packet enqueue side of the scheduler hierarchy is typically done
+ * through the Ethernet device TX function. For HW implementations, the packet
+ * dequeue side is typically done by the Ethernet device without any SW
+ * intervention, therefore this function should not do anything.
+ *
+ * However, for poll-mode SW or mixed HW-SW implementations, SW intervention
+ * is likely to be required for running the packet dequeue side of the
+ * scheduler hierarchy. Another potential task performed by this function is
+ * the periodic flush of any packet enqueue-side buffers used by burst-mode
+ * implementations.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+static inline int
+rte_eth_sched_run(uint8_t port_id)
+{
+	struct rte_eth_dev *dev;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
+#endif
+
+	dev = &rte_eth_devices[port_id];
+
+	return (dev->dev_ops->sched_run) ? dev->dev_ops->sched_run(dev) : 0;
+}
+
+/**
  * Get the port id from pci adrress or device name
  * Ex: 0000:2:00.0 or vdev name net_pcap0
  *
-- 
2.5.0