Date: Tue, 6 Dec 2016 11:51:24 -0800
From: Stephen Hemminger
To: Cristian Dumitrescu
Cc: dev@dpdk.org
Message-ID: <20161206115124.67ccc0c7@xeon-e3>
In-Reply-To: <1480529810-95280-1-git-send-email-cristian.dumitrescu@intel.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

On Wed, 30 Nov 2016 18:16:50 +0000
Cristian Dumitrescu wrote:

> This RFC proposes an ethdev-based abstraction layer for a Quality of
> Service (QoS) hierarchical scheduler. The goal of the abstraction layer
> is to provide a simple, generic API that is agnostic of the underlying
> HW, SW or mixed HW-SW implementation.
>
> Q1: What is the benefit of having an abstraction layer for a QoS
> hierarchical scheduler?
> A1: There is growing interest in the industry in handling various
> HW-based, SW-based or mixed hierarchical scheduler implementations
> through a unified DPDK API.
>
> Q2: Which devices are targeted by this abstraction layer?
> A2: All current and future devices that expose a hierarchical scheduler
> feature under DPDK, including NICs, FPGAs, ASICs, SoCs and SW libraries.
>
> Q3: Which scheduler hierarchies are supported by the API?
> A3: Hopefully any scheduler hierarchy can be described and covered by
> the current API. Of course, functional correctness, accuracy and
> performance levels depend on the specific implementations of this API.
>
> Q4: Why put this abstraction layer into ethdev as opposed to a new type
> of device (e.g. scheddev) similar to ethdev, cryptodev, eventdev, etc.?
> A4: Packets are sent to the Ethernet device using the ethdev API
> rte_eth_tx_burst() function, with the hierarchical scheduling taking
> place automatically (i.e. no SW intervention) in HW implementations.
> Basically, the hierarchical scheduling is done as part of the packet TX
> operation.
> The hierarchical scheduler is typically the last stage before packet TX
> and it is tightly integrated with the TX stage. The hierarchical
> scheduler is just another offload feature of the Ethernet device, which
> needs to be accommodated by the ethdev API like any other offload
> feature (such as RSS, DCB, flow director, etc.).
> Once the decision to schedule a specific packet has been taken, this
> packet cannot be dropped and it has to be sent over the wire as is,
> otherwise what takes place on the wire is not what was planned at
> scheduling time, so the scheduling is not accurate. (Note: some devices
> allow prepending headers to the packet after the scheduling stage at
> the expense of sending correction requests back to the scheduler, but
> this only strengthens the bond between scheduling and TX.)
>
> Q5: Given that packet scheduling takes place automatically for pure HW
> implementations, how does packet scheduling take place for poll-mode SW
> implementations?
> A5: The API-provided function rte_sched_run() is designed to take care
> of this. For HW implementations, this function typically does nothing.
> For SW implementations, this function is typically expected to perform
> dequeue of packets from the hierarchical scheduler and their write to
> the Ethernet device TX queue, periodic flush of any enqueue-side
> buffers into the hierarchical scheduler for burst-oriented
> implementations, etc.
>
> Q6: Which scheduling algorithms are supported?
> A6: The fundamental scheduling algorithms supported are Strict Priority
> (SP) and Weighted Fair Queuing (WFQ). The SP and WFQ algorithms are
> supported at the level of each node of the scheduling hierarchy,
> regardless of the node level/position in the tree. The SP algorithm is
> used to schedule between sibling nodes with different priority, while
> WFQ is used to schedule between groups of siblings that have the same
> priority.
> Algorithms such as Weighted Round Robin (WRR), byte-level WRR, Deficit
> WRR (DWRR), etc. are considered approximations of the ideal WFQ and are
> therefore assimilated to WFQ, although an implementation-dependent
> trade-off in accuracy, performance and resource usage might exist.
>
> Q7: Which congestion management algorithms are supported?
> A7: Tail drop, head drop and Weighted Random Early Detection (WRED).
> They are available for every leaf node in the hierarchy, subject to the
> specific implementation supporting them.
>
> Q8: Is traffic shaping supported?
> A8: Yes, a number of shapers (rate limiters) can be supported for each
> node in the hierarchy (the built-in limit is currently set to 4 per
> node). Each shaper can be private to a node (used only by that node) or
> shared between multiple nodes.
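
To make A4/A5 concrete: with a poll-mode SW implementation, the producer
side stays the normal TX path while a service core drains the hierarchy.
A minimal sketch; rte_sched_run() is named in the RFC without a
prototype, so the port-id argument and the force_quit flag are my
assumptions:

#include <stdbool.h>
#include <stdint.h>

void rte_sched_run(uint8_t port_id);   /* assumed prototype, per A5 */

static volatile bool force_quit;       /* assumed flag, set on shutdown */

/* Service loop for a poll-mode SW implementation: repeatedly dequeue
 * from the hierarchy into the device TX queue and flush enqueue-side
 * burst buffers, until told to stop. Launched e.g. with
 * rte_eal_remote_launch(). The producer side is unchanged:
 * rte_eth_tx_burst(port, queue, pkts, n) enqueues into the hierarchy. */
static int
sched_service_lcore(void *arg)
{
	uint8_t port_id = *(uint8_t *)arg;

	while (!force_quit)
		rte_sched_run(port_id);
	return 0;
}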
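
On A6, the "WRR assimilated to WFQ" point is easiest to see in code: a
toy Deficit WRR pass over same-priority siblings. The pkt_queue helpers
and send_pkt() are invented for illustration, not part of the proposal:

#include <stddef.h>
#include <stdint.h>

struct pkt { uint32_t len; };                       /* payload omitted */
struct pkt_queue;                                   /* opaque FIFO (invented) */
struct pkt *pkt_queue_head(struct pkt_queue *q);    /* invented helpers */
struct pkt *pkt_queue_pop(struct pkt_queue *q);
void send_pkt(struct pkt *p);

struct dwrr_queue {
	struct pkt_queue *q;
	uint32_t quantum;   /* bytes of credit granted per round */
	uint32_t deficit;   /* unspent credit carried across rounds */
};

/* One DWRR round over n same-priority siblings: a queue may keep
 * sending as long as its accumulated deficit covers the packet length,
 * so bandwidth shares approximate quantum ratios, i.e. WFQ weights. */
static void
dwrr_round(struct dwrr_queue *s, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		struct pkt *p;

		s[i].deficit += s[i].quantum;
		while ((p = pkt_queue_head(s[i].q)) != NULL &&
		       p->len <= s[i].deficit) {
			s[i].deficit -= p->len;
			send_pkt(pkt_queue_pop(s[i].q));
		}
		if (pkt_queue_head(s[i].q) == NULL)
			s[i].deficit = 0;   /* empty queue forfeits credit */
	}
}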
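
On A7, the WRED decision reduces to a drop-probability curve over the
average queue depth. A schematic version of the classic RED curve (one
profile shown; field names invented, since the RFC leaves WRED
parameters unspecified):

#include <stdbool.h>
#include <stdlib.h>

struct wred_profile {
	double min_th;   /* no drops below this average queue depth */
	double max_th;   /* all packets dropped above this depth */
	double max_p;    /* drop probability as depth reaches max_th */
};

/* Drop probability rises linearly between min_th and max_th as a
 * function of the *average* (smoothed) queue depth. */
static bool
wred_drop(const struct wred_profile *w, double avg_depth)
{
	if (avg_depth < w->min_th)
		return false;
	if (avg_depth >= w->max_th)
		return true;

	double p = w->max_p * (avg_depth - w->min_th) /
		   (w->max_th - w->min_th);
	return (double)rand() / RAND_MAX < p;
}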
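
On A8, each shaper is essentially a token bucket. A minimal sketch of
the per-node eligibility check, with the state layout invented here:

#include <stdbool.h>
#include <stdint.h>

struct shaper {
	uint64_t rate;       /* committed rate, bytes per second */
	uint64_t burst;      /* bucket depth, bytes */
	uint64_t tokens;     /* current credit, bytes */
	uint64_t last_tsc;   /* TSC cycle count at the last refill */
};

/* Refill credit for the elapsed time (capped at the bucket depth),
 * then let the packet pass only if the credit covers its length. */
static bool
shaper_pass(struct shaper *s, uint32_t pkt_len,
	    uint64_t now_tsc, uint64_t tsc_hz)
{
	s->tokens += (now_tsc - s->last_tsc) * s->rate / tsc_hz;
	if (s->tokens > s->burst)
		s->tokens = s->burst;
	s->last_tsc = now_tsc;

	if (s->tokens < pkt_len)
		return false;    /* not eligible yet; retry later */
	s->tokens -= pkt_len;
	return true;
}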
>
> Q9: What is the purpose of having shaper profiles and WRED profiles?
> A9: In most implementations, many shapers typically share the same
> configuration parameters, so defining shaper profiles simplifies the
> configuration task. The same considerations apply to WRED contexts and
> profiles.
>
> Q10: How is the scheduling hierarchy defined and created?
> A10: The scheduler hierarchy tree is set up by creating new nodes and
> connecting them to other existing nodes, which thus become parent
> nodes. The unique ID that is assigned to each node when the node is
> created is further used to update the node configuration or to connect
> children nodes to it. The leaf nodes of the scheduler hierarchy are
> each attached to one of the Ethernet device TX queues.
>
> Q11: Are on-the-fly changes of the scheduling hierarchy allowed by the
> API?
> A11: Yes. The actual changes take place subject to the specific
> implementation supporting them; otherwise an error code is returned.
>
> Q12: What is the typical function call sequence to set up and run the
> Ethernet device scheduler?
> A12: The typical simplified function call sequence is listed below:
> i) Configure the Ethernet device and its TX queues:
> rte_eth_dev_configure(), rte_eth_tx_queue_setup()
> ii) Create WRED profiles and WRED contexts, shaper profiles and
> shapers: rte_eth_sched_wred_profile_add(),
> rte_eth_sched_wred_context_add(), rte_eth_sched_shaper_profile_add(),
> rte_eth_sched_shaper_add()
> iii) Create the scheduler hierarchy nodes and tree:
> rte_eth_sched_node_add()
> iv) Freeze the start-up hierarchy and ask the device whether it
> supports it: rte_eth_sched_node_add()
> v) Start the Ethernet port: rte_eth_dev_start()
> vi) Run-time scheduler hierarchy updates: rte_eth_sched_node_add(),
> rte_eth_sched_node__set()
> vii) Run-time packet enqueue into the hierarchical scheduler:
> rte_eth_tx_burst()
> viii) Run-time support for SW poll-mode implementations (see previous
> answer): rte_sched_run()
>
> Q13: What options does the user have when the Ethernet port does not
> support the scheduling hierarchy required by the user?
> A13: The following options are available to the user:
> i) abort
> ii) try out a new hierarchy (e.g. with fewer leaf nodes), if acceptable
> iii) wrap the Ethernet device into a new type of Ethernet device that
> has a SW front-end implementing the hierarchical scheduler (e.g. the
> existing DPDK library librte_sched); instantiate the new device type
> on-the-fly and check whether the hierarchy requirements can be met by
> the new device.
>
> Signed-off-by: Cristian Dumitrescu

This seems to be more of an abstraction of existing QoS implementations.
Why not something like Linux Qdisc or FreeBSD DummyNet/PF/ALTQ, where
the QoS components are stackable objects? And why not make it the same
as existing OS abstractions? Rather than reinventing the wheel, which
seems to be DPDK Standard Procedure, could an existing abstraction be
used?
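
For reference, here is the A12 sequence above spelled out as code. The
rte_eth_dev_* calls are the existing ethdev API; every rte_eth_sched_*
prototype below is invented, since the RFC names the functions but
defines no signatures, and the params arguments are left as NULL
placeholders:

#include <rte_ethdev.h>

#define PARENT_NONE (-1)   /* invented "no parent" sentinel for the root */

/* Invented prototypes, illustrative only. */
int rte_eth_sched_wred_profile_add(uint8_t port, const void *params);
int rte_eth_sched_shaper_profile_add(uint8_t port, const void *params);
int rte_eth_sched_shaper_add(uint8_t port, int shaper_profile_id);
int rte_eth_sched_node_add(uint8_t port, int parent_id, const void *params);

static void
setup_port_sched(uint8_t port, const struct rte_eth_conf *conf)
{
	/* i) configure the device and its TX queues */
	rte_eth_dev_configure(port, 1, 1, conf);
	rte_eth_tx_queue_setup(port, 0, 512, SOCKET_ID_ANY, NULL);

	/* ii) profiles first, then the objects that reference them */
	rte_eth_sched_wred_profile_add(port, NULL);
	int shp_prof = rte_eth_sched_shaper_profile_add(port, NULL);
	rte_eth_sched_shaper_add(port, shp_prof);

	/* iii)-iv) build the tree top-down; each returned node ID is
	 * used as the parent at the next level (A10) */
	int root = rte_eth_sched_node_add(port, PARENT_NONE, NULL);
	rte_eth_sched_node_add(port, root, NULL);   /* leaf -> TX queue 0 */

	/* v) start the port; vii) packets then enter the hierarchy
	 * through the normal rte_eth_tx_burst() path */
	rte_eth_dev_start(port);
}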