From: "Dumitrescu, Cristian"
To: Yuyong Zhang, dev@dpdk.org, users@dpdk.org
Date: Thu, 4 Aug 2016 13:01:12 +0000
Subject: Re: [dpdk-dev] how to design high performance QoS support for a large amount of subscribers

Hi Yuyong,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuyong Zhang
> Sent: Tuesday, August 2, 2016 4:26 PM
> To: dev@dpdk.org; users@dpdk.org
> Subject: [dpdk-dev] how to design high performance QoS support for a large
> amount of subscribers
>
> Hi,
>
> I am trying to add QoS support for a high performance VNF with a large
> amount of subscribers (millions).

Welcome to the world of DPDK QoS users!

> It requires support for a guaranteed bit rate for different service levels
> of subscribers, i.e. four service levels need to be supported:
>
> * Diamond, 500M
> * Gold, 100M
> * Silver, 50M
> * Bronze, 10M

Service levels translate to pipe profiles in our DPDK implementation. The set of pipe profiles is defined per port.
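To make this concrete, here is a minimal, untested sketch of how the four service levels above could be expressed as a pipe profile table with the rte_sched API. Note that tb_rate/tc_rate are given in bytes per second, and that tb_size, tc_period, the WRR weights and the example subport/pipe/profile IDs are placeholder assumptions for illustration, not tuned recommendations:

#include <rte_sched.h>

/* Convert Mbit/s to the bytes/s unit used by rte_sched (tb_rate, tc_rate). */
#define MBPS_TO_BYTES(mbps) ((mbps) * 1000000u / 8)

/*
 * One pipe profile per service level. tb_size (token bucket size, in
 * credits), tc_period (in ms) and the per-queue WRR weights are placeholder
 * values; the four traffic classes of each pipe are simply capped at the
 * pipe rate here.
 */
#define SERVICE_LEVEL_PROFILE(mbps) {                                     \
	.tb_rate = MBPS_TO_BYTES(mbps),                                   \
	.tb_size = 1000000,                                               \
	.tc_rate = {MBPS_TO_BYTES(mbps), MBPS_TO_BYTES(mbps),             \
		    MBPS_TO_BYTES(mbps), MBPS_TO_BYTES(mbps)},            \
	.tc_period = 40,                                                  \
	.wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},  \
}

static struct rte_sched_pipe_params pipe_profiles[] = {
	[0] = SERVICE_LEVEL_PROFILE(500),  /* Diamond */
	[1] = SERVICE_LEVEL_PROFILE(100),  /* Gold    */
	[2] = SERVICE_LEVEL_PROFILE(50),   /* Silver  */
	[3] = SERVICE_LEVEL_PROFILE(10),   /* Bronze  */
};

/*
 * The table above is plugged into rte_sched_port_params (pipe_profiles,
 * n_pipe_profiles) when the port scheduler object is created with
 * rte_sched_port_config(). Each subscriber pipe is then bound to one of
 * the profiles; for example, assigning pipe 1234 of subport 0 to Gold:
 */
static int
assign_gold(struct rte_sched_port *port)
{
	return rte_sched_pipe_config(port, 0, 1234, 1);
}
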
> Here is the current pipeline design using DPDK:
>
> * 4 RX threads, which do packet classification and load balancing
> * 10-20 worker threads, which do application subscriber management
> * 4 TX threads, which send packets to the TX NICs
> * Ring buffers used among RX threads, worker threads, and TX threads
>
> I read the DPDK programmer's guide for the QoS framework regarding the
> hierarchical scheduler (port, sub-port, pipe, TC and queues). I am looking
> for advice on how to design the QoS scheduler to support millions of
> subscribers (pipes) whose traffic is processed in tens of worker threads
> where the subscriber management processing is handled.

Having millions of pipes per port poses some challenges:

1. Does it actually make sense? Assuming the port rate is 10GbE and looking at the smallest user rate you mention above (Bronze, 10 Mbps/user), fully provisioning all users (i.e. making sure you can fully handle each user in the worst case scenario) results in a maximum of 1000 users per port. Assuming overprovisioning of 50:1, this means a maximum of 50K users per port.

2. Memory challenge. The number of pipes per port is configurable -- hey, this is SW! :) -- but each of these pipes has 16 queues. For 4K pipes per port, this is 64K queues per port; for a typical value of 64 packets per queue, this is 4M packets per port, so in the worst case scenario we need to provision 4M packets in the buffer pool for each output port that has the hierarchical scheduler enabled; for a buffer size of ~2KB each, this means ~8GB of memory for each output port. If you go from 4K pipes per port to 4M pipes per port, this means 8TB of memory per port. Do you have enough memory in your system? :)

One thing to realize is that even with millions of users in your system, not all of them are active at the same time. So maybe have a smaller number of pipes and only map the active users (those that have any packets to send now) to them (a fraction of the total set of users), with the set of active users changing over time.

You can also consider mapping several users to the same pipe.

> One design thought is as the following:
>
> 8 ports (each one associated with one physical port), 16-20 sub-ports
> (each used by one worker thread), each sub-port supporting 250K pipes for
> subscribers. Each worker thread manages one sub-port and does metering
> for the sub-port to get the color; after identifying the subscriber flow,
> it picks an unused pipe, does the scheduler enqueue/dequeue, and then puts
> the packets into TX rings to the TX threads, and the TX threads send the
> packets to the TX NICs.

In the current implementation, each port scheduler object has to be owned by a single thread, i.e. you cannot split a port across multiple threads, therefore it is not straightforward to have different sub-ports handled by different threads. The workaround is to split the physical NIC port yourself into multiple port scheduler objects: for example, create 8 port scheduler objects, set the rate of each to 1/8 of 10GbE, and have each of them feed a different NIC TX queue of the same physical NIC port.

You can probably get this scenario (or something very similar) up pretty quickly just by handcrafting a configuration file for the examples/ip_pipeline application.

> Are there functional and performance issues with the above approach?
>
> Any advice and input are appreciated.
>
> Regards,
>
> Yuyong

Regards,
Cristian