From: Alan Robertson
To: "Dumitrescu, Cristian"
CC: "dev@dpdk.org", Thomas Monjalon
Date: Thu, 8 Dec 2016 15:41:08 +0000
Subject: Re: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D8912652711302@IRSMSX108.ger.corp.intel.com>
List-Id: DPDK patches and discussions

Hi Cristian,

The way QoS works just now should be feasible for dynamic targets. That is,
functions similar to rte_sched_port_enqueue() and rte_sched_port_dequeue()
would be called: the first to enqueue the mbufs onto the queues, the second
to dequeue them. The QoS structures and scheduler don't need to be as
functionally rich though. I would have thought a simple pipe with child nodes
should suffice for most cases. That would allow each tunnel/session to be
shaped, with the queueing and drop logic inherited from what is there just now.

Thanks,
Alan.

-----Original Message-----
From: Dumitrescu, Cristian [mailto:cristian.dumitrescu@intel.com]
Sent: Wednesday, December 07, 2016 7:52 PM
To: Alan Robertson
Cc: dev@dpdk.org; Thomas Monjalon
Subject: RE: [dpdk-dev] [RFC] ethdev: abstraction layer for QoS hierarchical scheduler

Hi Alan,

Thanks for your comments!

> Hi Cristian,

> Looking at points 10 and 11 it's good to hear nodes can be dynamically added.

Yes, many implementations allow on-the-fly remapping of a node from one parent
to another one, or simply adding more nodes post-initialization, so it is
natural for the API to provide this.

> We've been trying to decide the best way to do this for support of QoS
> on tunnels for some time now, and the existing implementation doesn't
> allow this, so it effectively ruled out hierarchical queueing for tunnel
> targets on the output interface.

> Having said that, has thought been given to separating the queueing from
> being so closely tied to the Ethernet transmit process? When queueing on
> a tunnel, for example, we may be working with encryption. When running
> with an anti-replay window it is really much better to do the QoS (packet
> reordering) before the encryption. To support this, would it be possible
> to have a separate scheduler structure which can be passed into the
> scheduling API? This means the calling code can hang the structure off
> whatever entity it wishes to perform QoS on, and we get dynamic target
> support (sessions/tunnels etc).

Yes, this is one point where we need to look for a better solution. The
current proposal attaches the hierarchical scheduler function to an ethdev,
so scheduling traffic for tunnels that have a pre-defined bandwidth is not
supported nicely. This question was also raised in VPP, but there tunnels are
supported as a type of output interface, so attaching scheduling to an output
interface also covers the tunnels case.

Looks to me that nice tunnel abstractions are a gap in DPDK as well. Any
thoughts about how tunnels should be supported in DPDK? What do other people
think about this?

> Regarding the structure allocation, would it be possible to make the
> number of queues associated with a TC a compile time option which the
> scheduler would accommodate?
> We frequently only use one queue per tc, which means 75% of the space
> allocated at the queueing layer for that tc is never used. This may
> be specific to our implementation, but if other implementations do the
> same, it would help if folks could say so, so that we get a better idea
> of whether this is a common case.

> Whilst touching on the scheduler, the token replenishment works using
> a division and multiplication, obviously to cater for the fact that it
> may be run after several tc windows have passed. The most commonly
> used industrial scheduler simply does an elapsed check on the tc and then
> adds the bc. This relies on the scheduler being called within the tc
> window though. It would be nice to have this as a configurable option
> since it's much more efficient, assuming the infra code from which it's
> called can guarantee the calling frequency.

This is probably feedback for librte_sched as opposed to the current API
proposal, as the latter is intended to be generic/implementation-agnostic and
therefore its scope far exceeds the existing set of librte_sched features.

Btw, we do plan on using the librte_sched feature as the default fall-back
when the HW ethdev is not scheduler-enabled, as well as the implementation of
choice for a lot of use-cases where it fits really well, so we do have to
continue to evolve and improve librte_sched feature-wise and performance-wise.

> I hope you'll consider these points for inclusion into a future road
> map. Hopefully in the future my employer will increase the priority
> of some of the tasks and a PR may appear on the mailing list.

> Thanks,
> Alan.
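
For readers following the token replenishment point in the quoted text, the two approaches can be sketched roughly as below. This is only an illustration under stated assumptions: it reads "tc" as the replenishment interval and "bc" as the credits added per interval, and `struct tb`, `tb_refill_catchup()` and `tb_refill_simple()` are hypothetical names, not the librte_sched structures or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical token bucket; field names are illustrative only. */
struct tb {
    uint64_t last_time;          /* time of the last replenishment */
    uint64_t period;             /* "tc": replenishment interval */
    uint64_t credits_per_period; /* "bc": credits added per interval */
    uint64_t max_credits;        /* bucket size (upper bound on credits) */
    uint64_t credits;            /* credits currently available */
};

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Catch-up variant: tolerates being called after several intervals
 * have passed, at the cost of one division and two multiplications.
 */
static void tb_refill_catchup(struct tb *tb, uint64_t now)
{
    assert(tb->period > 0);
    uint64_t n = (now - tb->last_time) / tb->period; /* whole intervals elapsed */

    tb->credits = min_u64(tb->credits + n * tb->credits_per_period,
                          tb->max_credits);
    tb->last_time += n * tb->period;
}

/*
 * Simple variant: a compare and an add. It credits at most one interval
 * per call, so it is only accurate if the caller guarantees it runs at
 * least once per interval. Advancing last_time by one period (rather
 * than setting it to now) is an assumption made here to avoid drift.
 */
static void tb_refill_simple(struct tb *tb, uint64_t now)
{
    if (now - tb->last_time >= tb->period) {
        tb->credits = min_u64(tb->credits + tb->credits_per_period,
                              tb->max_credits);
        tb->last_time += tb->period;
    }
}
```

If both variants are called once after three and a half intervals have passed, the catch-up variant credits three intervals while the simple one credits a single interval, which is why the simple form depends on the calling frequency being guaranteed.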