DPDK patches and discussions
From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: Greg Smith <gregsmith@juniper.net>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] QoS Question
Date: Fri, 24 Apr 2015 11:19:31 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D8912632358773@IRSMSX108.ger.corp.intel.com>
In-Reply-To: <BN1PR05MB280069BC605566E6014E7CBA8E00@BN1PR05MB280.namprd05.prod.outlook.com>

Hi Greg,

Great question, thank you! Please see my comments below.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Greg Smith
> Sent: Monday, April 20, 2015 7:40 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] QoS Question
> 
> Hi DPDK team,
> 
> The docs on QoS
> (http://dpdk.org/doc/guides/prog_guide/qos_framework.html# ) describe
> the traffic class (TC) as follows:
> 1 - The  TCs of the same pipe handled in strict priority order.
> 2 - Upper limit enforced per TC at the pipe level.
> 3 - Lower priority TCs able to reuse pipe bandwidth currently unused by
> higher priority TCs.
> 4 - When subport TC is oversubscribed (configuration time event), pipe TC
> upper limit is capped to a dynamically adjusted value that is shared by all the
> subport pipes.
> 
> Can someone describe how and when the TC upper limit is "dynamically"
> changed?

This feature is described at length in the Programmer's Guide, section 21.2.4.6.6, "Subport Traffic Class Oversubscription". Please note that this feature is not enabled by default in the code base. To enable it, set the flag CONFIG_RTE_SCHED_SUBPORT_TC_OV=y in the DPDK configuration file (config/common_linuxapp or config/common_bsdapp).

Subport traffic class oversubscription is implemented only for the lowest priority traffic class (TC3, a.k.a. Best Effort), because it is usually the Best Effort traffic class (where most of the traffic is) that is oversubscribed, while the high priority traffic classes (e.g. voice) are usually fully provisioned. Oversubscription takes place when the service provider sells more bandwidth than is physically available, i.e. when the sum of the _nominal_ rates assigned to the users exceeds the rate of the subport. This is not necessarily a problem, as only a fraction of the users are looking to fully utilize their service (i.e. use 100% of their nominal rate) at any given time. When the current total demand from all subport users does not exceed the subport rate, no problem exists, as each subscriber has its demand fully serviced. When the current total demand (which changes dynamically) exceeds the subport rate, it is no longer possible to meet the entire demand.
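
To make the two conditions above concrete, here is a minimal Python sketch. It is only an illustration and not DPDK code: the function names and the four-user numbers are made up for this example. It separates the configuration-time condition (sum of nominal rates versus subport rate) from the run-time condition (sum of current demand versus subport rate).

```python
def is_oversubscribed(nominal_rates_kbps, subport_rate_kbps):
    """Configuration-time check: more bandwidth sold than the subport has."""
    return sum(nominal_rates_kbps) > subport_rate_kbps


def is_congested(demands_kbps, subport_rate_kbps):
    """Run-time check: current total demand exceeds the subport rate."""
    return sum(demands_kbps) > subport_rate_kbps


# Hypothetical subport: 1000 kbps shared by 4 users sold 400 kbps each.
subport_rate = 1000
nominal = [400, 400, 400, 400]

light_demand = [100, 0, 400, 200]    # 700 kbps in total, everyone is served
heavy_demand = [400, 300, 400, 200]  # 1300 kbps in total, cannot all be served

print(is_oversubscribed(nominal, subport_rate))
print(is_congested(light_demand, subport_rate))
print(is_congested(heavy_demand, subport_rate))
```

The subport is oversubscribed in both cases, but only the second demand pattern needs any truncation.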

In the latter case, it is important that some fairness is enforced. We do not want some subscribers to get close to 0% of their nominal rate while others get close to 100% of theirs, as this would not be fair. On the other hand, we cannot simply scale down the nominal rate of all users (e.g. everybody is now allowed 73% of their rate), as the nominal rate of a subscriber is completely disconnected from its current demand: one user might demand only 10% of its rate at this moment, so reserving 73% of its rate for this user wastes 63% of its rate, which could otherwise be awarded to some other user with a higher demand at the same moment.
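
The 73% versus 10% arithmetic can be checked with a few lines of Python. This is a sketch of the rejected "scale everybody down" approach, not of anything in DPDK; the helper name and the 1000 kbps nominal rate are assumptions chosen for illustration.

```python
def uniform_scaling(nominal_kbps, demand_kbps, scale_percent):
    """Reserve a fixed percentage of the nominal rate for a user.

    Returns (served, wasted): the bandwidth the user actually uses and the
    reserved bandwidth that stays idle because the user did not ask for it.
    """
    reserved = nominal_kbps * scale_percent // 100
    served = min(demand_kbps, reserved)
    return served, reserved - served


# Hypothetical user with a 1000 kbps nominal rate, scaled to 73%.
low_demand_user = uniform_scaling(1000, demand_kbps=100, scale_percent=73)
high_demand_user = uniform_scaling(1000, demand_kbps=1000, scale_percent=73)

print(low_demand_user)
print(high_demand_user)
```

The low demand user leaves 630 kbps (63% of its nominal rate) idle, while the high demand user is capped at 730 kbps and cannot pick up that idle bandwidth.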

What we need to do is this: apply a water filling algorithm that computes a user quota (common to all the subport users), so that users whose current demand is below this quota are fully serviced, while users with higher demand are truncated to the quota. This user quota is updated periodically, with the new value being estimated from past subport consumption: when we see that the previous quota resulted in some subport bandwidth not being consumed, we increase the quota incrementally until the entire subport bandwidth is consumed; when we see that the entire subport bandwidth is consumed, we decrease the quota incrementally until we see that some subport bandwidth starts to be wasted.
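
The feedback loop described above can be sketched in Python as follows. Please treat this as a toy model under stated assumptions rather than the rte_sched implementation: the function names, the fixed step size, the update period and the four-user demand pattern are all invented for the example.

```python
def served(demands, quota):
    """Bandwidth each user gets when demand is truncated at the common quota."""
    return [min(d, quota) for d in demands]


def next_quota(quota, consumed, subport_rate, step, max_quota):
    """One periodic update of the common user quota."""
    if consumed < subport_rate:
        # Some subport bandwidth was left unused: raise the quota.
        return min(quota + step, max_quota)
    # The subport was fully consumed: lower the quota.
    return max(quota - step, 0)


def simulate(demands, subport_rate, quota, step, max_quota, periods):
    """Run the update loop with constant demand and record the quota."""
    history = [quota]
    for _ in range(periods):
        # The subport can never carry more than its own rate.
        consumed = min(sum(served(demands, quota)), subport_rate)
        quota = next_quota(quota, consumed, subport_rate, step, max_quota)
        history.append(quota)
    return history


# Hypothetical subport of 1500 kbps, four users with a 1000 kbps nominal rate.
demands_kbps = [0, 200, 600, 1000]
history = simulate(demands_kbps, subport_rate=1500, quota=400,
                   step=100, max_quota=1000, periods=10)
print(history)
```

With these made-up numbers the quota climbs from 400 to 700 kbps, the point where the four users consume exactly 1500 kbps, and then hovers between 600 and 700 kbps. A smaller step would narrow that band at the cost of slower convergence.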

> 
> For example, assume there's a 1Gb/s port and a single 1Gb/s subport and
> 2000 pipes each of 1Mb/s (total pipes = 2Gb/s which is > the 1Gb/s subport
> which I think means "oversubscribed" as used in the doc). Each Pipe has a
> single TC.

Yes, I agree this is an example of oversubscription. I used a similar example to describe this feature during the DPDK community readout earlier this week (https://youtu.be/_PPklkWGugs).


> In that case, would each pipe be shaped to an upper limit of 0.5 Mb/s?

Only in the very unlikely event that all the 2000 users are active and each one is asking for 0.5 Mbps or more.

Typically, some of these users are currently inactive (demand = 0%) and some others will ask for less than e.g. 0.5 Mbps; whatever subport bandwidth is left unused by the low demand users can be awarded to the high demand users (of course, no user will ever get more than its nominal rate).

Let's refine the example: let's say that, currently, the demand distribution for the 2000 users is: [500 users: 0 Mbps; 500 users: 0.4 Mbps; 500 users: 0.7 Mbps; 500 users: 1 Mbps].
These users will be awarded the following rates: [500 users: 0 Mbps; 500 users: 0.4 Mbps; 500 users: 0.7 Mbps; 500 users: 0.9 Mbps].
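Basically, all users are fully serviced, except the users demanding 1 Mbps, which will be truncated to 0.9 Mbps: the first three groups consume 0 + 200 + 350 = 550 Mbps, which leaves 450 Mbps to be shared by the remaining 500 users.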

Implementation-wise, this means that the water filling algorithm converges after a few iterations to an equilibrium user quota of 0.9 Mbps.
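
For reference, the equilibrium quota of the refined example can also be computed directly. The Python sketch below uses a textbook closed-form water filling over sorted demands, which is an assumption made for illustration: the scheduler itself reaches the quota iteratively from observed consumption, as described earlier. The function name is hypothetical and rates are expressed in kbps to keep the arithmetic in integers.

```python
def water_fill_quota(demands, capacity):
    """Common quota q such that sum(min(d, q) for d in demands) == capacity.

    If the total demand fits within the capacity, nobody needs truncating
    and the largest demand is returned.
    """
    if sum(demands) <= capacity:
        return max(demands, default=0)
    remaining = capacity
    pending = sorted(demands)
    n = len(pending)
    for i, d in enumerate(pending):
        # Equal share of what is left among the users not yet served.
        share = remaining / (n - i)
        if d > share:
            # This user and all larger ones are truncated to the share.
            return share
        remaining -= d
    return pending[-1]


# The refined example: 1 Gbps subport, 2000 users, demands in kbps.
demands_kbps = [0] * 500 + [400] * 500 + [700] * 500 + [1000] * 500
quota = water_fill_quota(demands_kbps, capacity=1_000_000)
awarded = sorted(set(min(d, quota) for d in demands_kbps))
print(quota, awarded)
```

This gives a quota of 900 kbps and awarded rates of 0, 400, 700 and 900 kbps, matching the figures above.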

> What if there was no traffic on 1999 pipes, would the single active pipe still be
> limited to 0.5 Mb/s?

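No, see the example above: with a single active pipe the subport is nowhere near fully consumed, so the quota does not restrict that pipe and it can get up to its nominal rate of 1 Mb/s.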

> What if the number of pipes changes without restarting the OS, how does
> that change the behavior?

The set of active users and the user demand fluctuate over time, which is why the water filling algorithm periodically re-computes the user quota based on the new network reality.

> 
> BTW, great docs overall, thanks for writing those up.
> 
> Thanks,
> 
> Greg Smith
> 
> 

Regards,
Cristian

Thread overview: 3+ messages
2015-04-20 16:39 Greg Smith
2015-04-21  4:58 ` Sangjin Han
2015-04-24 11:19 ` Dumitrescu, Cristian [this message]
