From: "Dewar, Alan"
To: "'Dumitrescu, Cristian'", "'alangordondewar@gmail.com'"
Cc: "'dev@dpdk.org'", "'Alan Dewar'"
Date: Wed, 3 Jan 2018 13:45:52 +0000
Subject: Re: [dpdk-dev] [PATCH v2] sched: fix overflow errors in WRR weighting code
Message-ID: <3F9268EEC0E43747A5FFFC6B48EF0321FBE61D@gbcdcmbx03.intl.att.com>
References: <1512032696-30765-1-git-send-email-alan.dewar@att.com> <3EB4FA525960D640B5BDFFD6A3D891267BAF7E9D@IRSMSX108.ger.corp.intel.com>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D891267BAF7E9D@IRSMSX108.ger.corp.intel.com>

Hi Cristian,

Responses in-line below...
> -----Original Message-----
> From: Dumitrescu, Cristian [mailto:cristian.dumitrescu@intel.com]
> Sent: Tuesday, January 02, 2018 4:14 PM
> To: alangordondewar@gmail.com
> Cc: dev@dpdk.org; Alan Dewar
> Subject: RE: [PATCH v2] sched: fix overflow errors in WRR weighting code
>
> Hi Alan,
>
> NAK for now.
>
> There is a good reason for truncating the WRR cost to an 8-bit value,
> which is keeping the size of the rte_sched_pipe structure to a single
> cache line (64 bytes). This is done for performance reasons at the
> expense of some accuracy loss for the scheduling of the 4x queues per
> traffic class.
>

Ah, I wasn't aware of this restriction, but I fully understand why you
don't want to increase the size of the rte_sched_pipe structure.

> Is there a way to make the improvement while working with 8-bit WRR
> cost values in the pipe structure?

Off the top of my head, I think that we might be able to improve the
behavior of the WRR queues a little, but not as well as we could with a
32-bit wrr_cost array in the rte_sched_pipe structure.  I'll need to play
around with it and see what happens.

> > -----Original Message-----
> > From: alangordondewar@gmail.com [mailto:alangordondewar@gmail.com]
> > Sent: Thursday, November 30, 2017 9:05 AM
> > To: Dumitrescu, Cristian
> > Cc: dev@dpdk.org; Alan Dewar
> > Subject: [PATCH v2] sched: fix overflow errors in WRR weighting code
> >
> > From: Alan Dewar
> >
> > Revised patch - this version fixes an issue when a small wrr_cost is
> > shifted so far right that its value becomes zero.
> >
> > The WRR code calculates the lowest common denominator between the four
> > WRR weights as a uint32_t value, divides the LCD by each of the WRR
> > weights, and casts the results to uint8_t.  This casting can cause
> > the ratios of the computed wrr costs to be wrong.  For example, with
> > WRR weights of 3, 5, 7 and 11, the LCD is computed to be 1155.  The
> > WRR costs get computed as:

> Picking prime numbers for the weights is generally a bad idea.  If you
> would pick e.g. 4, 6, 8 and 12 rather than 3, 5, 7 and 11 you would
> avoid any issues due to the 8-bit truncation.

I selected 3, 5, 7 and 11 as a pathological case to illustrate part of
the problem; I think that it is unlikely that customers will actually use
these numbers.

However, the DPDK doesn't impose any restrictions on what values
wrr_weights can have: any value between 1 and 255 is allowed.  In the
product where we are using the DPDK as part of a software dataplane, we
currently restrict wrr_weights to values between 1 and 100, but we have
no control over the values that our customers select within that range.

Imposing any further restriction would not be backwards compatible with
previous versions of our software.  That is not acceptable to customers:
when they upgrade the software on their routers, a previously accepted
configuration must not be rejected by the new version.

> >
> > 1155/3 = 385, 1155/5 = 231, 1155/7 = 165, 1155/11 = 105.
> >
> > When the value 385 is cast into an uint8_t it ends up as 129.
> > Rather than casting straight into a uint8_t, this patch shifts the
> > computed WRR costs right so that the largest value is only eight bits
> > wide.
> >
> > In grinder_schedule, the packet length is multiplied by the WRR cost
> > and added to the grinder's wrr_tokens value.  The grinder's wrr_tokens
> > field is a uint16_t, so the combination of a packet length of 1500
> > bytes and a wrr cost of 44 will overflow this field on the first
> > packet.
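To make both failure modes concrete, here is a minimal stand-alone sketch
(not part of the patch; plain C with no DPDK dependencies) showing the
uint8_t cost truncation and the uint16_t token wrap-around:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Failure 1: the computed cost is cast to uint8_t.
	 * LCD(3, 5, 7, 11) = 1155, so the cost for weight 3 is 385. */
	uint32_t cost = 1155 / 3;
	uint8_t cost8 = (uint8_t)cost;
	printf("cost %u truncates to %u\n",
	       (unsigned)cost, (unsigned)cost8);	/* 385 -> 129 */

	/* Failure 2: wrr_tokens is a uint16_t accumulator.
	 * One 1500-byte packet at wrr cost 44 adds 66000 tokens. */
	uint16_t wrr_tokens = 0;
	wrr_tokens += 1500 * 44;
	printf("tokens after one packet: %u\n",
	       (unsigned)wrr_tokens);			/* wraps to 464 */
	return 0;
}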
> >
> > This patch increases the width of the grinder's wrr_tokens and
> > wrr_mask fields from uint16_t to uint32_t.
> >

> Increasing the size of the grinder fields is OK, but I am not sure
> whether it is really helpful, as the values saved in the pipe structure
> are 8-bit.

I will investigate this further, but I don't think that we will see much
benefit.

> > In grinder_wrr_store, the remaining tokens in the grinder's wrr_tokens
> > array are copied to the appropriate pipe's wrr_tokens array.  However,
> > the pipe's wrr_tokens array is only a uint8_t array, so unused tokens
> > were quite frequently lost, which upsets the balance of traffic across
> > the four WRR queues.
> >
> > This patch increases the width of the pipe's wrr_tokens array from
> > uint8_t to uint32_t.

> This is not allowed for performance reasons, as having 16-bit or 32-bit
> WRR cost values in the pipe structure would increase the size of the
> pipe structure from one cache line to two cache lines.

Understood; it might be interesting to try to determine what the
performance hit is.

Regards
Alan
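For reference, a stand-alone sketch of the cache-line constraint Cristian
describes.  The structs below are illustrative stand-ins with made-up
fields, not the real struct rte_sched_pipe definition; they only show how
widening a 16-entry token array from uint8_t to uint32_t pushes a roughly
64-byte structure past one cache line:

#include <stdint.h>

#define QUEUES_PER_PIPE 16	/* 4 traffic classes x 4 WRR queues */

struct pipe_narrow {		/* current layout, 8-bit tokens */
	uint64_t tb_time;
	uint32_t tb_credits;
	uint32_t tc_time;
	uint32_t tc_credits[4];
	uint32_t tc_ov_credits;
	uint8_t tc_ov_period_id;
	uint8_t wrr_tokens[QUEUES_PER_PIPE];
};

struct pipe_wide {		/* proposed layout, 32-bit tokens */
	uint64_t tb_time;
	uint32_t tb_credits;
	uint32_t tc_time;
	uint32_t tc_credits[4];
	uint32_t tc_ov_credits;
	uint8_t tc_ov_period_id;
	uint32_t wrr_tokens[QUEUES_PER_PIPE];
};

/* 56 bytes on a typical LP64 ABI: fits one 64-byte cache line */
_Static_assert(sizeof(struct pipe_narrow) <= 64, "one cache line");
/* 104 bytes: spills into a second cache line */
_Static_assert(sizeof(struct pipe_wide) > 64, "two cache lines");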
> >
> > Signed-off-by: Alan Dewar
> > ---
> > v2 - fixed bug in the wrr_cost calculation code that could result in a
> > zero wrr_cost
> >
> >  lib/librte_sched/rte_sched.c        | 59 +++++++++++++++++++++++++++++--------
> >  lib/librte_sched/rte_sched_common.h | 15 ++++++++++
> >  2 files changed, 61 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index 7252f85..324743d 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -130,7 +130,7 @@ struct rte_sched_pipe {
> >  	uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> >
> >  	/* Weighted Round Robin (WRR) */
> > -	uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
> > +	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
> >
> >  	/* TC oversubscription */
> >  	uint32_t tc_ov_credits;
> > @@ -205,8 +205,8 @@ struct rte_sched_grinder {
> >  	struct rte_mbuf *pkt;
> >
> >  	/* WRR */
> > -	uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > -	uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > +	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > +	uint32_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> >  	uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> >  };
> >
> > @@ -542,6 +542,17 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
> >  	return time;
> >  }
> >
> > +static uint32_t rte_sched_reduce_to_byte(uint32_t value)
> > +{
> > +	uint32_t shift = 0;
> > +
> > +	while (value & 0xFFFFFF00) {
> > +		value >>= 1;
> > +		shift++;
> > +	}
> > +	return shift;
> > +}
> > +
> >  static void
> >  rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port,
> >  	struct rte_sched_port_params *params)
> >  {
> > @@ -583,6 +594,8 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
> >  			uint32_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> >  			uint32_t lcd, lcd1, lcd2;
> >  			uint32_t qindex;
> > +			uint32_t low_pos;
> > +			uint32_t shift;
> >
> >  			qindex = j * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
> >
> > @@ -594,12 +607,28 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
> >  			lcd1 = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
> >  			lcd2 = rte_get_lcd(wrr_cost[2], wrr_cost[3]);
> >  			lcd = rte_get_lcd(lcd1, lcd2);
> > +			low_pos = rte_min_pos_4_u32(wrr_cost);
> >
> >  			wrr_cost[0] = lcd / wrr_cost[0];
> >  			wrr_cost[1] = lcd / wrr_cost[1];
> >  			wrr_cost[2] = lcd / wrr_cost[2];
> >  			wrr_cost[3] = lcd / wrr_cost[3];
> >
> > +			shift = rte_sched_reduce_to_byte(wrr_cost[low_pos]);
> > +			wrr_cost[0] >>= shift;
> > +			wrr_cost[1] >>= shift;
> > +			wrr_cost[2] >>= shift;
> > +			wrr_cost[3] >>= shift;
> > +
> > +			if (wrr_cost[0] == 0)
> > +				wrr_cost[0]++;
> > +			if (wrr_cost[1] == 0)
> > +				wrr_cost[1]++;
> > +			if (wrr_cost[2] == 0)
> > +				wrr_cost[2]++;
> > +			if (wrr_cost[3] == 0)
> > +				wrr_cost[3]++;
> > +
> >  			dst->wrr_cost[qindex] = (uint8_t) wrr_cost[0];
> >  			dst->wrr_cost[qindex + 1] = (uint8_t) wrr_cost[1];
> >  			dst->wrr_cost[qindex + 2] = (uint8_t) wrr_cost[2];
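Worked through by hand for the 3, 5, 7 and 11 example from the commit
message, the new logic behaves as follows.  This is a stand-alone sketch
that copies the reduce-to-byte loop from the hunk above, not a verbatim
extract from the patch:

#include <stdint.h>
#include <stdio.h>

/* Copy of the patch's rte_sched_reduce_to_byte(): count the right
 * shifts needed before the value fits in eight bits. */
static uint32_t reduce_to_byte(uint32_t value)
{
	uint32_t shift = 0;

	while (value & 0xFFFFFF00) {
		value >>= 1;
		shift++;
	}
	return shift;
}

int main(void)
{
	/* LCD(3, 5, 7, 11) = 1155 */
	uint32_t wrr_cost[4] = { 1155 / 3, 1155 / 5, 1155 / 7, 1155 / 11 };
	/* weight 3 is the smallest, so index 0 holds the largest cost
	 * (385) and determines the shift, as low_pos does in the patch */
	uint32_t shift = reduce_to_byte(wrr_cost[0]);
	unsigned i;

	for (i = 0; i < 4; i++) {
		wrr_cost[i] >>= shift;
		if (wrr_cost[i] == 0)	/* the zero-cost guard added in v2 */
			wrr_cost[i] = 1;
		printf("wrr_cost[%u] = %u\n", i, (unsigned)wrr_cost[i]);
	}
	/* Prints 192, 115, 82 and 52: every value now fits in a uint8_t
	 * and the 385:231:165:105 ratios survive almost intact, instead
	 * of 385 wrapping to 129 as with the plain cast. */
	return 0;
}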
> > @@ -1941,15 +1970,19 @@ grinder_wrr_load(struct rte_sched_port *port, uint32_t pos)
> >
> >  	qindex = tc_index * 4;
> >
> > -	grinder->wrr_tokens[0] = ((uint16_t) pipe->wrr_tokens[qindex]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[1] = ((uint16_t) pipe->wrr_tokens[qindex + 1]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[2] = ((uint16_t) pipe->wrr_tokens[qindex + 2]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[3] = ((uint16_t) pipe->wrr_tokens[qindex + 3]) << RTE_SCHED_WRR_SHIFT;
> > +	grinder->wrr_tokens[0] = pipe->wrr_tokens[qindex] <<
> > +		RTE_SCHED_WRR_SHIFT;
> > +	grinder->wrr_tokens[1] = pipe->wrr_tokens[qindex + 1] <<
> > +		RTE_SCHED_WRR_SHIFT;
> > +	grinder->wrr_tokens[2] = pipe->wrr_tokens[qindex + 2] <<
> > +		RTE_SCHED_WRR_SHIFT;
> > +	grinder->wrr_tokens[3] = pipe->wrr_tokens[qindex + 3] <<
> > +		RTE_SCHED_WRR_SHIFT;
> >
> > -	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFF;
> > +	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFFFFFF;
> > +	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFFFFFF;
> > +	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFFFFFF;
> > +	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFFFFFF;
> >
> >  	grinder->wrr_cost[0] = pipe_params->wrr_cost[qindex];
> >  	grinder->wrr_cost[1] = pipe_params->wrr_cost[qindex + 1];
> > @@ -1981,14 +2014,14 @@ static inline void
> >  grinder_wrr(struct rte_sched_port *port, uint32_t pos)
> >  {
> >  	struct rte_sched_grinder *grinder = port->grinder + pos;
> > -	uint16_t wrr_tokens_min;
> > +	uint32_t wrr_tokens_min;
> >
> >  	grinder->wrr_tokens[0] |= ~grinder->wrr_mask[0];
> >  	grinder->wrr_tokens[1] |= ~grinder->wrr_mask[1];
> >  	grinder->wrr_tokens[2] |= ~grinder->wrr_mask[2];
> >  	grinder->wrr_tokens[3] |= ~grinder->wrr_mask[3];
> >
> > -	grinder->qpos = rte_min_pos_4_u16(grinder->wrr_tokens);
> > +	grinder->qpos = rte_min_pos_4_u32(grinder->wrr_tokens);
> >  	wrr_tokens_min = grinder->wrr_tokens[grinder->qpos];
> >
> >  	grinder->wrr_tokens[0] -= wrr_tokens_min;
> >
> > diff --git a/lib/librte_sched/rte_sched_common.h b/lib/librte_sched/rte_sched_common.h
> > index aed144b..a3c6bc2 100644
> > --- a/lib/librte_sched/rte_sched_common.h
> > +++ b/lib/librte_sched/rte_sched_common.h
> > @@ -77,6 +77,21 @@ rte_min_pos_4_u16(uint16_t *x)
> >  	return pos0;
> >  }
> >
> > +static inline uint32_t
> > +rte_min_pos_4_u32(uint32_t *x)
> > +{
> > +	uint32_t pos0 = 0;
> > +	uint32_t pos1 = 2;
> > +
> > +	if (x[1] <= x[0])
> > +		pos0 = 1;
> > +	if (x[3] <= x[2])
> > +		pos1 = 3;
> > +	if (x[pos1] <= x[pos0])
> > +		pos0 = pos1;
> > +
> > +	return pos0;
> > +}
> >  #endif
> >
> >  /*
> > --
> > 2.1.4
>
> Regards,
> Cristian
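As a footnote, a quick stand-alone check of the rte_min_pos_4_u32() helper
added by the patch.  The test values are made up; the second case shows
that ties resolve to the higher index:

#include <stdint.h>
#include <stdio.h>

/* Same branch structure as the rte_min_pos_4_u32() quoted above. */
static uint32_t min_pos_4_u32(uint32_t *x)
{
	uint32_t pos0 = 0;
	uint32_t pos1 = 2;

	if (x[1] <= x[0])
		pos0 = 1;
	if (x[3] <= x[2])
		pos1 = 3;
	if (x[pos1] <= x[pos0])
		pos0 = pos1;

	return pos0;
}

int main(void)
{
	uint32_t weights[4] = { 3, 5, 7, 11 };
	uint32_t tokens[4] = { 500, 120, 900, 120 };

	printf("%u\n", (unsigned)min_pos_4_u32(weights));	/* 0 */
	printf("%u\n", (unsigned)min_pos_4_u32(tokens));	/* 3, not 1 */
	return 0;
}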