From: "Dumitrescu, Cristian"
To: Yuyong Zhang, dev@dpdk.org, users@dpdk.org
Date: Thu, 4 Aug 2016 13:01:12 +0000
Subject: Re: [dpdk-dev] how to design high performance QoS support for a large amount of subscribers

Hi Yuyong,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuyong Zhang
> Sent: Tuesday, August 2, 2016 4:26 PM
> To: dev@dpdk.org; users@dpdk.org
> Subject: [dpdk-dev] how to design high performance QoS support for a large
> amount of subscribers
>
> Hi,
>
> I am trying to add QoS support for a high performance VNF with a large
> amount of subscribers (millions).

Welcome to the world of DPDK QoS users!

> It requires support for a guaranteed bit rate for different service levels
> of subscribers, i.e. four service levels need to be supported:
>
> * Diamond, 500M
> * Gold, 100M
> * Silver, 50M
> * Bronze, 10M

Service levels translate to pipe profiles in our DPDK implementation. The set of pipe profiles is defined per port.
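To make this concrete, here is a minimal, untested sketch of how the four service levels above could be expressed as a pipe profile table with the rte_sched API. Note that tb_rate/tc_rate are given in bytes per second, and that tb_size, tc_period, the WRR weights and the example subport/pipe/profile IDs are placeholder assumptions for illustration, not tuned recommendations:

#include <rte_sched.h>

/* Convert Mbit/s to the bytes/s unit used by rte_sched (tb_rate, tc_rate). */
#define MBPS_TO_BYTES(mbps) ((mbps) * 1000000u / 8)

/*
 * One pipe profile per service level. tb_size (token bucket size, in
 * credits), tc_period (in ms) and the per-queue WRR weights are placeholder
 * values; the four traffic classes of each pipe are simply capped at the
 * pipe rate here.
 */
#define SERVICE_LEVEL_PROFILE(mbps) {                                     \
	.tb_rate = MBPS_TO_BYTES(mbps),                                   \
	.tb_size = 1000000,                                               \
	.tc_rate = {MBPS_TO_BYTES(mbps), MBPS_TO_BYTES(mbps),             \
		    MBPS_TO_BYTES(mbps), MBPS_TO_BYTES(mbps)},            \
	.tc_period = 40,                                                  \
	.wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},  \
}

static struct rte_sched_pipe_params pipe_profiles[] = {
	[0] = SERVICE_LEVEL_PROFILE(500),  /* Diamond */
	[1] = SERVICE_LEVEL_PROFILE(100),  /* Gold    */
	[2] = SERVICE_LEVEL_PROFILE(50),   /* Silver  */
	[3] = SERVICE_LEVEL_PROFILE(10),   /* Bronze  */
};

/*
 * The table above is plugged into rte_sched_port_params (pipe_profiles,
 * n_pipe_profiles) when the port scheduler object is created with
 * rte_sched_port_config(). Each subscriber pipe is then bound to one of
 * the profiles; for example, assigning pipe 1234 of subport 0 to Gold:
 */
static int
assign_gold(struct rte_sched_port *port)
{
	return rte_sched_pipe_config(port, 0, 1234, 1);
}
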
> Here is the current pipeline design using DPDK:
>
> * 4 RX threads, which do packet classification and load balancing
> * 10-20 worker threads, which do application subscriber management
> * 4 TX threads, which send packets to the TX NICs
> * Ring buffers used among RX threads, worker threads, and TX threads
>
> I read the DPDK programmer's guide for the QoS framework regarding the
> hierarchical scheduler (port, sub-port, pipe, TC and queues). I am looking
> for advice on how to design the QoS scheduler to support millions of
> subscribers (pipes) whose traffic is processed in tens of worker threads
> where the subscriber management processing is handled.

Having millions of pipes per port poses some challenges:

1. Does it actually make sense? Assuming the port rate is 10GbE and looking at the smallest user rate you mention above (Bronze, 10 Mbps/user), fully provisioning all users (i.e. making sure you can fully handle each user in the worst case scenario) results in a maximum of 1000 users per port. Assuming overprovisioning of 50:1, this means a maximum of 50K users per port.

2. Memory challenge. The number of pipes per port is configurable -- hey, this is SW! :) -- but each of these pipes has 16 queues. For 4K pipes per port, this is 64K queues per port; for a typical value of 64 packets per queue, this is 4M packets per port, so in the worst case scenario we need to provision 4M packets in the buffer pool for each output port that has the hierarchical scheduler enabled; for a buffer size of ~2KB each, this means ~8GB of memory for each output port. If you go from 4K pipes per port to 4M pipes per port, this means 8TB of memory per port. Do you have enough memory in your system? :)

One thing to realize is that even with millions of users in your system, not all of them are active at the same time. So maybe have a smaller number of pipes and only map the active users (those that have any packets to send now) to them (a fraction of the total set of users), with the set of active users changing over time.

You can also consider mapping several users to the same pipe.

> One design thought is as the following:
>
> 8 ports (each one associated with one physical port), 16-20 sub-ports
> (each used by one worker thread), each sub-port supporting 250K pipes for
> subscribers. Each worker thread manages one sub-port and does metering
> for the sub-port to get the color; after identifying the subscriber flow,
> it picks an unused pipe, does the scheduler enqueue/dequeue, and then puts
> the packets into TX rings to the TX threads, and the TX threads send the
> packets to the TX NICs.

In the current implementation, each port scheduler object has to be owned by a single thread, i.e. you cannot split a port across multiple threads, therefore it is not straightforward to have different sub-ports handled by different threads. The workaround is to split the physical NIC port yourself into multiple port scheduler objects: for example, create 8 port scheduler objects, set the rate of each to 1/8 of 10GbE, and have each of them feed a different NIC TX queue of the same physical NIC port.

You can probably get this scenario (or something very similar) up pretty quickly just by handcrafting a configuration file for the examples/ip_pipeline application.

> Are there functional and performance issues with the above approach?
>
> Any advice and input are appreciated.
>
> Regards,
>
> Yuyong

Regards,
Cristian