From: "Dumitrescu, Cristian"
To: Greg Smith, "dev@dpdk.org"
Date: Fri, 24 Apr 2015 11:19:31 +0000
Subject: Re: [dpdk-dev] QoS Question

Hi Greg,

Great question, thank you! Please see my comments below.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Greg Smith
> Sent: Monday, April 20, 2015 7:40 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] QoS Question
>
> Hi DPDK team,
>
> The docs on QoS
> (http://dpdk.org/doc/guides/prog_guide/qos_framework.html# ) describe
> the traffic class (TC) as follows:
> 1 - The TCs of the same pipe handled in strict priority order.
> 2 - Upper limit enforced per TC at the pipe level.
> 3 - Lower priority TCs able to reuse pipe bandwidth currently unused by
> higher priority TCs.
> 4 - When subport TC is oversubscribed (configuration time event), pipe TC
> upper limit is capped to a dynamically adjusted value that is shared by
> all the subport pipes.
>
> Can someone describe how and when the TC upper limit is "dynamically"
> changed?

This feature is described at length in Programmer's Guide section
21.2.4.6.6, Subport Traffic Class Oversubscription. Please note that this
feature is not enabled by default in the code base. To enable it, set the
flag CONFIG_RTE_SCHED_SUBPORT_TC_OV=y in the DPDK configuration file
(config/common_linuxapp or config/common_bsdapp).

Subport traffic class oversubscription is implemented only for the lowest
priority traffic class (TC3, a.k.a. Best Effort), as usually the Best
Effort traffic class (where most of the traffic is) is oversubscribed,
while the high priority traffic classes (e.g. voice) are usually fully
provisioned.

Oversubscription takes place when the service provider sells more
bandwidth than is physically available, i.e. when the sum of the _nominal_
rates assigned to the users exceeds the rate of the subport. This does not
necessarily represent a problem, as only a fraction of the users are
looking to fully utilize their service (i.e. use up 100% of their nominal
rate) at any given time: when the current total demand from all subport
users does not exceed the subport rate, no problem exists, as each
subscriber has its demand fully serviced; when the current total demand
(which changes dynamically) exceeds the subport rate, it is obviously no
longer possible to fully meet the entire demand.

In the latter case, it is important that some fairness takes place.
We do not want some subscribers to get close to 0% of their nominal rate
while others get close to 100% of theirs, as this would not be fair. On
the other hand, we cannot simply scale down the nominal rate of the users
(e.g. everybody is now allowed 73% of their rate), as the nominal rate of
a subscriber is completely disconnected from its current demand: one user
might demand only 10% of its rate at this moment, so reserving 73% of its
rate for this user wastes 63% of its rate, which could otherwise be
awarded to some other user with a higher demand at the same moment.

What we need to do is this: apply a water filling algorithm that computes
a user quota (common for all the subport users), so that users with
current demand below this quota are fully serviced, while users with high
demand are truncated. This user quota is updated periodically, with the
new value estimated from past subport consumption: when we see that the
previous quota resulted in some subport bandwidth not being consumed, we
increase the quota incrementally until the entire subport bandwidth is
consumed; when we see that the entire subport bandwidth is consumed, we
decrease the quota incrementally until we see that some subport bandwidth
starts to be wasted.

>
> For example, assume there's a 1Gb/s port and a single 1Gb/s subport and
> 2000 pipes each of 1Mb/s (total pipes = 2Gb/s which is > the 1Gb/s
> subport which I think means "oversubscribed" as used in the doc). Each
> Pipe has a single TC.

Yes, agree this is an example of oversubscription. I used a similar
example to describe this feature during the DPDK community readout earlier
this week (https://youtu.be/_PPklkWGugs).

> In that case, would each pipe be shaped to an upper limit of 0.5 Mb/s?

Only in the very unlikely event that all the 2000 users are active and
each one is asking for 0.5 Mbps or more.
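
To make that corner case concrete, here is a minimal sketch in Python of
an exact water-filling computation. It is an illustration only, not the
librte_sched code (which adjusts the quota incrementally over time rather
than solving for it directly); the function name water_fill_quota and the
choice of Mbps as the unit are assumptions made for this example.

```python
def water_fill_quota(demands, capacity):
    """Return the common per-user quota q with sum(min(d, q)) == capacity.

    If the total demand fits within capacity, nobody needs to be
    truncated, so the largest demand is returned instead.
    """
    if not demands or sum(demands) <= capacity:
        return max(demands, default=0.0)

    remaining = capacity
    ordered = sorted(demands)
    for i, d in enumerate(ordered):
        users_left = len(ordered) - i
        fair_share = remaining / users_left
        if d > fair_share:
            # This user and all larger ones are truncated to the same level.
            return fair_share
        remaining -= d  # low-demand user is fully serviced
    return ordered[-1]


# 2000 pipes of 1 Mbps nominal rate on a 1000 Mbps subport, all saturated.
all_busy = [1.0] * 2000
quota = water_fill_quota(all_busy, 1000.0)
```

With every pipe saturated, the quota works out to 1000 / 2000 = 0.5 Mbps.
As soon as some pipes ask for less, their unused share raises the level
for the others.
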
Typically, some of these users are currently inactive (demand = 0%) and
some others will ask for less than e.g. 0.5 Mbps; whatever subport
bandwidth is left unused by the low demand users can be awarded to the
high demand users (of course, no user will ever get more than its nominal
rate).

Let's refine the example: let's say that, currently, the demand
distribution for the 2000 users is: [500 users: 0 Mbps; 500 users: 0.4
Mbps; 500 users: 0.7 Mbps; 500 users: 1 Mbps].

These users will be awarded the following rates: [500 users: 0 Mbps; 500
users: 0.4 Mbps; 500 users: 0.7 Mbps; 500 users: 0.9 Mbps].

Basically, all users are fully serviced, except the users demanding 1
Mbps, which will be truncated to 0.9 Mbps. Implementation-wise, this means
that the water filling algorithm will reach equilibrium after a few
iterations at the user quota of 0.9 Mbps.

> What if there was no traffic on 1999 pipes, would the single active pipe
> still be limited to 0.5 Mb/s?

Nope, see the example above: with the other pipes idle, the quota keeps
rising and the single active pipe can use up to its nominal rate.

> What if the number of pipes changes without restarting the OS, how does
> that change the behavior?

The set of active users and the user demand fluctuate over time, which is
why the water filling algorithm periodically re-computes the user quota
based on the new network reality.

>
> BTW, great docs overall, thanks for writing those up.
>
> Thanks,
>
> Greg Smith
>

Regards,
Cristian
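
The incremental quota adjustment described in this reply can be
illustrated with a small simulation. This is a sketch under stated
assumptions, not the librte_sched code: the step size, the starting quota,
the number of rounds, and the names consumed and adjust_quota are made up
for the example, and all rates are in Mbps.

```python
def consumed(demands, quota):
    """Subport bandwidth used when every user is capped at the common quota."""
    return sum(min(d, quota) for d in demands)


def adjust_quota(demands, capacity, nominal, quota, step=0.01, rounds=200):
    """Periodically nudge the quota up or down based on past consumption."""
    for _ in range(rounds):
        if consumed(demands, quota) < capacity:
            # Some subport bandwidth was left over: relax the quota,
            # but never beyond the nominal rate of a user.
            quota = min(quota + step, nominal)
        else:
            # The subport was fully consumed: tighten the quota.
            quota = max(quota - step, 0.0)
    return quota


# Refined example: 2000 users on a 1000 Mbps subport, 1 Mbps nominal rate.
demands = [0.0] * 500 + [0.4] * 500 + [0.7] * 500 + [1.0] * 500
quota = adjust_quota(demands, capacity=1000.0, nominal=1.0, quota=0.5)
# quota ends up hovering within a step or so of 0.9 Mbps

# Single active pipe: the quota climbs until it hits the nominal rate.
lone = [0.0] * 1999 + [1.0]
lone_quota = adjust_quota(lone, capacity=1000.0, nominal=1.0, quota=0.5)
```

Because the quota moves by a fixed step, this sketch never settles on an
exact value; it oscillates around the point where the subport is just
fully consumed. A smaller step narrows the oscillation at the cost of
slower convergence when the demand changes.
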