From: "Dumitrescu, Cristian"
To: "alangordondewar@gmail.com"
CC: "dev@dpdk.org", Alan Dewar
Date: Tue, 2 Jan 2018 16:13:34 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D891267BAF7E9D@IRSMSX108.ger.corp.intel.com>
References: <1512032696-30765-1-git-send-email-alan.dewar@att.com>
In-Reply-To: <1512032696-30765-1-git-send-email-alan.dewar@att.com>
Subject: Re: [dpdk-dev] [PATCH v2] sched: fix overflow errors in WRR weighting code

Hi Alan,

NAK for now.

There is a good reason for truncating the WRR cost to an 8-bit value, which is keeping the size of the rte_sched_pipe structure to a single cache line (64 bytes). This is done for performance reasons, at the expense of some accuracy loss for the scheduling of the 4x queues per traffic class.

Is there a way to make the improvement while working with 8-bit WRR cost values in the pipe structure?

> -----Original Message-----
> From: alangordondewar@gmail.com [mailto:alangordondewar@gmail.com]
> Sent: Thursday, November 30, 2017 9:05 AM
> To: Dumitrescu, Cristian
> Cc: dev@dpdk.org; Alan Dewar
> Subject: [PATCH v2] sched: fix overflow errors in WRR weighting code
>
> From: Alan Dewar
>
> Revised patch - this version fixes an issue when a small wrr_cost is
> shifted so far right that its value becomes zero.
>
> The WRR code calculates the lowest common denominator between the four
> WRR weights as a uint32_t value, divides the LCD by each of the WRR
> weights and casts the results to a uint8_t. This casting can cause
> the ratios of the computed wrr costs to be wrong. For example, with
> WRR weights of 3, 5, 7 and 11, the LCD is computed to be
> 1155. The WRR costs get computed as:

Picking prime numbers for the weights is generally a bad idea. If you picked e.g. 4, 6, 8 and 12 rather than 3, 5, 7 and 11, you would avoid any issues due to the 8-bit truncation.
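To make the truncation concrete, here is a small standalone sketch (illustration only, not librte_sched code; the gcd()/lcm() helpers below simply stand in for rte_get_lcd()):

#include <stdint.h>
#include <stdio.h>

static uint32_t gcd(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = b;

		b = a % b;
		a = t;
	}
	return a;
}

static uint32_t lcm(uint32_t a, uint32_t b)
{
	return a / gcd(a, b) * b;
}

static void show_costs(const uint32_t w[4])
{
	uint32_t lcd = lcm(lcm(w[0], w[1]), lcm(w[2], w[3]));
	uint32_t i;

	printf("weights %u/%u/%u/%u, lcd %u:", w[0], w[1], w[2], w[3], lcd);
	for (i = 0; i < 4; i++) {
		uint32_t cost = lcd / w[i];

		/* The uint8_t cast is what distorts the cost ratios. */
		printf("  %u -> %u", cost, (unsigned)(uint8_t)cost);
	}
	printf("\n");
}

int main(void)
{
	const uint32_t primes[4] = {3, 5, 7, 11};
	const uint32_t composite[4] = {4, 6, 8, 12};

	show_costs(primes);
	show_costs(composite);
	return 0;
}

With 3/5/7/11 the costs come out as 385/231/165/105 and the cast wraps 385 to 129, which is exactly the distortion described below; with 4/6/8/12 the LCD is 24, the costs are 6/4/3/2, and they survive the cast unchanged.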
>
> 1155/3 = 385, 1155/5 = 231, 1155/7 = 165, 1155/11 = 105.
>
> When the value 385 is cast into a uint8_t it ends up as 129.
> Rather than casting straight into a uint8_t, this patch shifts the
> computed WRR costs right so that the largest value is only eight bits
> wide.
>
> In grinder_schedule, the packet length is multiplied by the WRR cost
> and added to the grinder's wrr_tokens value. The grinder's wrr_tokens
> field is a uint16_t, so the combination of a packet length of 1500 bytes
> and a wrr cost of 44 will overflow this field on the first packet.
>
> This patch increases the width of the grinder's wrr_tokens and
> wrr_mask fields from uint16_t to uint32_t.
>

Increasing the size of the grinder fields is OK, but I am not sure whether it is really helpful, as the values saved in the pipe structure are 8-bit.

> In grinder_wrr_store, the remaining tokens in the grinder's wrr_tokens
> array are copied to the appropriate pipe's wrr_tokens array. However,
> the pipe's wrr_tokens array is only a uint8_t array, so unused tokens
> were quite frequently lost, which upsets the balance of traffic across
> the four WRR queues.
>
> This patch increases the width of the pipe's wrr_tokens array from
> a uint8_t to a uint32_t.
>

This is not allowed for performance reasons, as having 16-bit or 32-bit WRR cost values in the pipe structure would increase the size of the pipe structure from one cache line to two cache lines.
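To put numbers on the cache line argument (standalone illustration only, assuming the usual 16 queues per pipe, i.e. 4 traffic classes x 4 WRR queues; this is not the real rte_sched_pipe layout):

#include <stdint.h>
#include <stdio.h>

#define QUEUES_PER_PIPE 16	/* 4 traffic classes x 4 WRR queues */

int main(void)
{
	/* Today: 16 x 1 byte of WRR tokens, leaving room for the rest of
	 * rte_sched_pipe inside a single 64-byte cache line. */
	printf("uint8_t tokens:  %zu bytes\n",
		sizeof(uint8_t[QUEUES_PER_PIPE]));	/* 16 */

	/* With the patch: the token array alone is already 64 bytes, so
	 * the pipe structure necessarily spills into a second cache line. */
	printf("uint32_t tokens: %zu bytes\n",
		sizeof(uint32_t[QUEUES_PER_PIPE]));	/* 64 */
	return 0;
}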
> Signed-off-by: Alan Dewar
> ---
> v2 - fixed bug in the wrr_cost calculation code that could result
>      in a zero wrr_cost
>
>  lib/librte_sched/rte_sched.c        | 59 +++++++++++++++++++++++++++++-------
>  lib/librte_sched/rte_sched_common.h | 15 ++++++++++
>  2 files changed, 61 insertions(+), 13 deletions(-)
>
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index 7252f85..324743d 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -130,7 +130,7 @@ struct rte_sched_pipe {
>  	uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
>
>  	/* Weighted Round Robin (WRR) */
> -	uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
> +	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
>
>  	/* TC oversubscription */
>  	uint32_t tc_ov_credits;
> @@ -205,8 +205,8 @@ struct rte_sched_grinder {
>  	struct rte_mbuf *pkt;
>
>  	/* WRR */
> -	uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> -	uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> +	uint32_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> +	uint32_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
>  	uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
>  };
>
> @@ -542,6 +542,17 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
>  	return time;
>  }
>
> +static uint32_t rte_sched_reduce_to_byte(uint32_t value)
> +{
> +	uint32_t shift = 0;
> +
> +	while (value & 0xFFFFFF00) {
> +		value >>= 1;
> +		shift++;
> +	}
> +	return shift;
> +}
> +
>  static void
>  rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte_sched_port_params *params)
>  {
> @@ -583,6 +594,8 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
>  			uint32_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
>  			uint32_t lcd, lcd1, lcd2;
>  			uint32_t qindex;
> +			uint32_t low_pos;
> +			uint32_t shift;
>
>  			qindex = j * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
>
> @@ -594,12 +607,28 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port, struct rte
>  			lcd1 = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
>  			lcd2 = rte_get_lcd(wrr_cost[2], wrr_cost[3]);
>  			lcd = rte_get_lcd(lcd1, lcd2);
> +			low_pos = rte_min_pos_4_u32(wrr_cost);
>
>  			wrr_cost[0] = lcd / wrr_cost[0];
>  			wrr_cost[1] = lcd / wrr_cost[1];
>  			wrr_cost[2] = lcd / wrr_cost[2];
>  			wrr_cost[3] = lcd / wrr_cost[3];
>
> +			shift = rte_sched_reduce_to_byte(wrr_cost[low_pos]);
> +			wrr_cost[0] >>= shift;
> +			wrr_cost[1] >>= shift;
> +			wrr_cost[2] >>= shift;
> +			wrr_cost[3] >>= shift;
> +
> +			if (wrr_cost[0] == 0)
> +				wrr_cost[0]++;
> +			if (wrr_cost[1] == 0)
> +				wrr_cost[1]++;
> +			if (wrr_cost[2] == 0)
> +				wrr_cost[2]++;
> +			if (wrr_cost[3] == 0)
> +				wrr_cost[3]++;
> +
>  			dst->wrr_cost[qindex] = (uint8_t) wrr_cost[0];
>  			dst->wrr_cost[qindex + 1] = (uint8_t) wrr_cost[1];
>  			dst->wrr_cost[qindex + 2] = (uint8_t) wrr_cost[2];
> @@ -1941,15 +1970,19 @@ grinder_wrr_load(struct rte_sched_port *port, uint32_t pos)
>
>  	qindex = tc_index * 4;
>
> -	grinder->wrr_tokens[0] = ((uint16_t) pipe->wrr_tokens[qindex]) << RTE_SCHED_WRR_SHIFT;
> -	grinder->wrr_tokens[1] = ((uint16_t) pipe->wrr_tokens[qindex + 1]) << RTE_SCHED_WRR_SHIFT;
> -	grinder->wrr_tokens[2] = ((uint16_t) pipe->wrr_tokens[qindex + 2]) << RTE_SCHED_WRR_SHIFT;
> -	grinder->wrr_tokens[3] = ((uint16_t) pipe->wrr_tokens[qindex + 3]) << RTE_SCHED_WRR_SHIFT;
> +	grinder->wrr_tokens[0] = pipe->wrr_tokens[qindex] <<
> +		RTE_SCHED_WRR_SHIFT;
> +	grinder->wrr_tokens[1] = pipe->wrr_tokens[qindex + 1] <<
> +		RTE_SCHED_WRR_SHIFT;
> +	grinder->wrr_tokens[2] = pipe->wrr_tokens[qindex + 2] <<
> +		RTE_SCHED_WRR_SHIFT;
> +	grinder->wrr_tokens[3] = pipe->wrr_tokens[qindex + 3] <<
> +		RTE_SCHED_WRR_SHIFT;
>
> -	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFF;
> -	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFF;
> -	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFF;
> -	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFF;
> +	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFFFFFF;
> +	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFFFFFF;
> +	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFFFFFF;
> +	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFFFFFF;
>
>  	grinder->wrr_cost[0] = pipe_params->wrr_cost[qindex];
>  	grinder->wrr_cost[1] = pipe_params->wrr_cost[qindex + 1];
> @@ -1981,14 +2014,14 @@ static inline void
>  grinder_wrr(struct rte_sched_port *port, uint32_t pos)
>  {
>  	struct rte_sched_grinder *grinder = port->grinder + pos;
> -	uint16_t wrr_tokens_min;
> +	uint32_t wrr_tokens_min;
>
>  	grinder->wrr_tokens[0] |= ~grinder->wrr_mask[0];
>  	grinder->wrr_tokens[1] |= ~grinder->wrr_mask[1];
>  	grinder->wrr_tokens[2] |= ~grinder->wrr_mask[2];
>  	grinder->wrr_tokens[3] |= ~grinder->wrr_mask[3];
>
> -	grinder->qpos = rte_min_pos_4_u16(grinder->wrr_tokens);
> +	grinder->qpos = rte_min_pos_4_u32(grinder->wrr_tokens);
>  	wrr_tokens_min = grinder->wrr_tokens[grinder->qpos];
>
>  	grinder->wrr_tokens[0] -= wrr_tokens_min;
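A note for anyone reading the grinder hunks above: the multiply by 0xFFFFFFFF parks inactive queues at the maximum token count, so the minimum-position scan only ever picks an active queue (as long as at least one queue in the group is active). For illustration, a standalone sketch of that selection step (simplified, not the real grinder code):

#include <stdint.h>
#include <stdio.h>

/* Same selection logic as rte_min_pos_4_u32 in the patch. */
static uint32_t min_pos_4_u32(const uint32_t *x)
{
	uint32_t pos0 = 0;
	uint32_t pos1 = 2;

	if (x[1] <= x[0])
		pos0 = 1;
	if (x[3] <= x[2])
		pos1 = 3;
	if (x[pos1] <= x[pos0])
		pos0 = pos1;

	return pos0;
}

int main(void)
{
	uint32_t tokens[4] = {300, 120, 500, 80};
	uint32_t mask[4];
	uint32_t qmask = 0x5;	/* only queues 0 and 2 are active */
	uint32_t i;

	for (i = 0; i < 4; i++) {
		/* bit * 0xFFFFFFFF: all-ones for active, zero for inactive */
		mask[i] = ((qmask >> i) & 0x1) * 0xFFFFFFFF;
		/* inactive queues saturate to UINT32_MAX and never win */
		tokens[i] |= ~mask[i];
	}

	/* Picks queue 0 (300 tokens), the minimum among the active ones. */
	printf("selected queue %u\n", min_pos_4_u32(tokens));
	return 0;
}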
> diff --git a/lib/librte_sched/rte_sched_common.h b/lib/librte_sched/rte_sched_common.h
> index aed144b..a3c6bc2 100644
> --- a/lib/librte_sched/rte_sched_common.h
> +++ b/lib/librte_sched/rte_sched_common.h
> @@ -77,6 +77,21 @@ rte_min_pos_4_u16(uint16_t *x)
>  	return pos0;
>  }
>
> +static inline uint32_t
> +rte_min_pos_4_u32(uint32_t *x)
> +{
> +	uint32_t pos0 = 0;
> +	uint32_t pos1 = 2;
> +
> +	if (x[1] <= x[0])
> +		pos0 = 1;
> +	if (x[3] <= x[2])
> +		pos1 = 3;
> +	if (x[pos1] <= x[pos0])
> +		pos0 = pos1;
> +
> +	return pos0;
> +}
>  #endif
>
>  /*
> --
> 2.1.4

Regards,
Cristian