From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Dumitrescu, Cristian"
To: "Singh, Jasvinder", "dev@dpdk.org"
CC: "Tovar, AbrahamX", "Krakowiak, LukaszX"
Date: Mon, 15 Jul 2019 23:50:02 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D891268E8EEDA2@IRSMSX108.ger.corp.intel.com>
References: <20190711102659.59001-2-jasvinder.singh@intel.com>
 <20190712095729.159767-1-jasvinder.singh@intel.com>
 <20190712095729.159767-2-jasvinder.singh@intel.com>
In-Reply-To: <20190712095729.159767-2-jasvinder.singh@intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 01/11] sched: remove wrr from strict priority tc queues
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Friday, July 12, 2019 11:57 AM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian; Tovar, AbrahamX; Krakowiak, LukaszX
> Subject: [PATCH v4 01/11] sched: remove wrr from strict priority tc queues
>
> All higher priority traffic classes contain only one queue, thus
> remove wrr function for them. The lowest priority best-effort
> traffic class conitnue to have multiple queues and packet are
> scheduled from its queues using wrr function.
>
> Signed-off-by: Jasvinder Singh
> Signed-off-by: Abraham Tovar
> Signed-off-by: Lukasz Krakowiak
> ---
>  app/test/test_sched.c        |   2 +-
>  examples/qos_sched/init.c    |   2 +-
>  lib/librte_sched/Makefile    |   2 +-
>  lib/librte_sched/meson.build |   2 +-
>  lib/librte_sched/rte_sched.c | 182 ++++++++++++++++++++---------------
>  lib/librte_sched/rte_sched.h |  23 +++--
>  6 files changed, 124 insertions(+), 89 deletions(-)
>
> diff --git a/app/test/test_sched.c b/app/test/test_sched.c
> index 49bb9ea6f..36fa2d425 100644
> --- a/app/test/test_sched.c
> +++ b/app/test/test_sched.c
> @@ -40,7 +40,7 @@ static struct rte_sched_pipe_params pipe_profile[] = {
>          .tc_rate = {305175, 305175, 305175, 305175},
>          .tc_period = 40,
>
> -        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
> +        .wrr_weights = {1, 1, 1, 1},
>      },
>  };
>
> diff --git a/examples/qos_sched/init.c b/examples/qos_sched/init.c
> index 1209bd7ce..6b63d4e0e 100644
> --- a/examples/qos_sched/init.c
> +++ b/examples/qos_sched/init.c
> @@ -186,7 +186,7 @@ static struct rte_sched_pipe_params pipe_profiles[RTE_SCHED_PIPE_PROFILES_PER_PO
>          .tc_ov_weight = 1,
>  #endif
>
> -        .wrr_weights = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
> +        .wrr_weights = {1, 1, 1, 1},
>      },
>  };
>
> diff --git a/lib/librte_sched/Makefile b/lib/librte_sched/Makefile
> index 644fd9d15..3d7f410e1 100644
> --- a/lib/librte_sched/Makefile
> +++ b/lib/librte_sched/Makefile
> @@ -18,7 +18,7 @@ LDLIBS += -lrte_timer
>
>  EXPORT_MAP := rte_sched_version.map
>
> -LIBABIVER := 2
> +LIBABIVER := 3
>
>  #
>  # all source are stored in SRCS-y
> diff --git a/lib/librte_sched/meson.build b/lib/librte_sched/meson.build
> index 8e989e5f6..59d43c6d8 100644
> --- a/lib/librte_sched/meson.build
> +++ b/lib/librte_sched/meson.build
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2017 Intel Corporation
>
> -version = 2
> +version = 3
>  sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c')
>  headers = files('rte_sched.h', 'rte_sched_common.h',
>          'rte_red.h', 'rte_approx.h')
> diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> index bc06bc3f4..b1f521794 100644
> --- a/lib/librte_sched/rte_sched.c
> +++ b/lib/librte_sched/rte_sched.c
> @@ -37,6 +37,8 @@
>
>  #define RTE_SCHED_TB_RATE_CONFIG_ERR          (1e-7)
>  #define RTE_SCHED_WRR_SHIFT                   3
> +#define RTE_SCHED_TRAFFIC_CLASS_BE            (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1)
> +#define RTE_SCHED_MAX_QUEUES_PER_TC           RTE_SCHED_BE_QUEUES_PER_PIPE
>  #define RTE_SCHED_GRINDER_PCACHE_SIZE         (64 / RTE_SCHED_QUEUES_PER_PIPE)
>  #define RTE_SCHED_PIPE_INVALID                UINT32_MAX
>  #define RTE_SCHED_BMP_POS_INVALID             UINT32_MAX
> @@ -84,8 +86,9 @@ struct rte_sched_pipe_profile {
>      uint32_t tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
>      uint8_t tc_ov_weight;
>
> -    /* Pipe queues */
> -    uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_PIPE];
> +    /* Pipe best-effort traffic class queues */
> +    uint8_t n_be_queues;

n_be_queues is the same for all pipes within the same port, so it does
not make sense to save this per-port value in each pipe profile. At the
very least, let's move it to the port data structure, please.

In fact, a better solution (one that also simplifies the implementation)
is to enforce the same queue size for all BE queues, as it does not make
sense for queues within the same traffic class to have different sizes
(see my comment in the other patch where you update the API).
n_be_queues would then always be 4, so there is no need for this
variable.

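To illustrate the simplification, here is a minimal sketch of the
weight-to-cost conversion when the number of BE queues is fixed at 4.
It is only an illustration under stated assumptions, not the library
code: BE_QUEUES, gcd_u32(), lcm_u32() and wrr_cost_from_weights() are
hypothetical names, and the local LCM helper stands in for the
rte_get_lcd() call used in the patch. Weights are assumed non-zero, as
pipe_profile_check() already rejects zero weights.

```c
#include <stdint.h>

/* Assumed fixed queue count; mirrors RTE_SCHED_BE_QUEUES_PER_PIPE. */
#define BE_QUEUES 4

/* Hypothetical helper: greatest common divisor (Euclid). */
static uint32_t
gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Hypothetical helper: least common multiple of two non-zero values. */
static uint32_t
lcm_u32(uint32_t a, uint32_t b)
{
	return (a / gcd_u32(a, b)) * b;
}

/*
 * Single code path for exactly four BE queues: the cost of a queue is
 * the common multiple of all weights divided by that queue's weight,
 * so a larger weight yields a smaller cost. The result is truncated to
 * uint8_t, as in the patch.
 */
static void
wrr_cost_from_weights(const uint8_t weights[BE_QUEUES],
	uint8_t cost[BE_QUEUES])
{
	uint32_t lcm01 = lcm_u32(weights[0], weights[1]);
	uint32_t lcm23 = lcm_u32(weights[2], weights[3]);
	uint32_t lcm = lcm_u32(lcm01, lcm23);
	uint32_t i;

	for (i = 0; i < BE_QUEUES; i++)
		cost[i] = (uint8_t)(lcm / weights[i]);
}
```

With weights {1, 2, 4, 8} the common multiple is 8, so the costs come
out as {8, 4, 2, 1}. No per-profile queue count is needed, and the
1-queue and 2-queue branches go away.
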
> +    uint8_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
>  };
>
>  struct rte_sched_pipe {
> @@ -100,8 +103,10 @@ struct rte_sched_pipe {
>      uint64_t tc_time; /* time of next update */
>      uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
>
> +    uint8_t n_be_queues; /* Best effort traffic class queues */

Same comment here, and it is even more important, as we need to strive
to reduce the size of this struct for performance reasons.

> +
>      /* Weighted Round Robin (WRR) */
> -    uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
> +    uint8_t wrr_tokens[RTE_SCHED_BE_QUEUES_PER_PIPE];
>
>      /* TC oversubscription */
>      uint32_t tc_ov_credits;
> @@ -153,16 +158,16 @@ struct rte_sched_grinder {
>      uint32_t tc_index;
>      struct rte_sched_queue *queue[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
>      struct rte_mbuf **qbase[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> -    uint32_t qindex[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> -    uint16_t qsize;
> +    uint32_t qindex[RTE_SCHED_MAX_QUEUES_PER_TC];
> +    uint16_t qsize[RTE_SCHED_MAX_QUEUES_PER_TC];
>      uint32_t qmask;
>      uint32_t qpos;
>      struct rte_mbuf *pkt;
>
>      /* WRR */
> -    uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> -    uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> -    uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> +    uint16_t wrr_tokens[RTE_SCHED_BE_QUEUES_PER_PIPE];
> +    uint16_t wrr_mask[RTE_SCHED_BE_QUEUES_PER_PIPE];
> +    uint8_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
>  };
>
>  struct rte_sched_port {
> @@ -301,7 +306,6 @@ pipe_profile_check(struct rte_sched_pipe_params *params,
>          if (params->wrr_weights[i] == 0)
>              return -16;
>      }
> -
>      return 0;
>  }
>
> @@ -483,7 +487,7 @@ rte_sched_port_log_pipe_profile(struct rte_sched_port *port, uint32_t i)
>          "    Token bucket: period = %u, credits per period = %u, size = %u\n"
>          "    Traffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
>          "    Traffic class 3 oversubscription: weight = %hhu\n"
> -        "    WRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n",
> +        "    WRR cost: [%hhu, %hhu, %hhu, %hhu]\n",
>          i,
>
>          /* Token bucket */
> @@ -502,10 +506,7 @@ rte_sched_port_log_pipe_profile(struct rte_sched_port *port, uint32_t i)
>          p->tc_ov_weight,
>
>          /* WRR */
> -        p->wrr_cost[ 0], p->wrr_cost[ 1], p->wrr_cost[ 2], p->wrr_cost[ 3],
> -        p->wrr_cost[ 4], p->wrr_cost[ 5], p->wrr_cost[ 6], p->wrr_cost[ 7],
> -        p->wrr_cost[ 8], p->wrr_cost[ 9], p->wrr_cost[10], p->wrr_cost[11],
> -        p->wrr_cost[12], p->wrr_cost[13], p->wrr_cost[14], p->wrr_cost[15]);
> +        p->wrr_cost[0], p->wrr_cost[1], p->wrr_cost[2], p->wrr_cost[3]);
>  }
>
>  static inline uint64_t
> @@ -519,10 +520,12 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
>  }
>
>  static void
> -rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
> +rte_sched_pipe_profile_convert(struct rte_sched_port *port,
> +    struct rte_sched_pipe_params *src,
>      struct rte_sched_pipe_profile *dst,
>      uint32_t rate)
>  {
> +    uint32_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
>      uint32_t i;
>
>      /* Token Bucket */
> @@ -553,18 +556,36 @@ rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
>      dst->tc_ov_weight = src->tc_ov_weight;
>  #endif
>
> -    /* WRR */
> -    for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
> -        uint32_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> -        uint32_t lcd, lcd1, lcd2;
> -        uint32_t qindex;
> +    /* WRR queues */
> +    for (i = 0; i < RTE_SCHED_BE_QUEUES_PER_PIPE; i++)
> +        if (port->qsize[i])
> +            dst->n_be_queues++;
> +
> +    if (dst->n_be_queues == 1)
> +        dst->wrr_cost[0] = src->wrr_weights[0];
> +
> +    if (dst->n_be_queues == 2) {
> +        uint32_t lcd;
> +
> +        wrr_cost[0] = src->wrr_weights[0];
> +        wrr_cost[1] = src->wrr_weights[1];
> +
> +        lcd = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
> +
> +        wrr_cost[0] = lcd / wrr_cost[0];
> +        wrr_cost[1] = lcd / wrr_cost[1];
>
> -        qindex = i * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
> +        dst->wrr_cost[0] = (uint8_t) wrr_cost[0];
> +        dst->wrr_cost[1] = (uint8_t) wrr_cost[1];
> +    }
>
> -        wrr_cost[0] = src->wrr_weights[qindex];
> -        wrr_cost[1] = src->wrr_weights[qindex + 1];
> -        wrr_cost[2] = src->wrr_weights[qindex + 2];
> -        wrr_cost[3] = src->wrr_weights[qindex + 3];
> +    if (dst->n_be_queues == 4) {

See the above comment: it is better and simpler to enforce
n_be_queues == 4, which simplifies this code a lot, as it keeps only
this branch and removes the need for the two above.

> +        uint32_t lcd1, lcd2, lcd;
> +
> +        wrr_cost[0] = src->wrr_weights[0];
> +        wrr_cost[1] = src->wrr_weights[1];
> +        wrr_cost[2] = src->wrr_weights[2];
> +        wrr_cost[3] = src->wrr_weights[3];
>
>          lcd1 = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
>          lcd2 = rte_get_lcd(wrr_cost[2], wrr_cost[3]);
> @@ -575,10 +596,10 @@ rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
>          wrr_cost[2] = lcd / wrr_cost[2];
>          wrr_cost[3] = lcd / wrr_cost[3];
>
> -        dst->wrr_cost[qindex] = (uint8_t) wrr_cost[0];
> -        dst->wrr_cost[qindex + 1] = (uint8_t) wrr_cost[1];
> -        dst->wrr_cost[qindex + 2] = (uint8_t) wrr_cost[2];
> -        dst->wrr_cost[qindex + 3] = (uint8_t) wrr_cost[3];
> +        dst->wrr_cost[0] = (uint8_t) wrr_cost[0];
> +        dst->wrr_cost[1] = (uint8_t) wrr_cost[1];
> +        dst->wrr_cost[2] = (uint8_t) wrr_cost[2];
> +        dst->wrr_cost[3] = (uint8_t) wrr_cost[3];
>      }
>  }
>
> @@ -592,7 +613,7 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port,
>          struct rte_sched_pipe_params *src = params->pipe_profiles + i;
>          struct rte_sched_pipe_profile *dst = port->pipe_profiles + i;
>
> -        rte_sched_pipe_profile_convert(src, dst, params->rate);
> +        rte_sched_pipe_profile_convert(port, src, dst, params->rate);
>          rte_sched_port_log_pipe_profile(port, i);
>      }
>
> @@ -976,7 +997,7 @@ rte_sched_port_pipe_profile_add(struct rte_sched_port *port,
>          return status;
>
>      pp = &port->pipe_profiles[port->n_pipe_profiles];
> -    rte_sched_pipe_profile_convert(params, pp, port->rate);
> +    rte_sched_pipe_profile_convert(port, params, pp, port->rate);
>
>      /* Pipe profile not exists */
>      for (i = 0; i < port->n_pipe_profiles; i++)
> @@ -1715,6 +1736,7 @@ grinder_schedule(struct rte_sched_port *port, uint32_t pos)
>      struct rte_sched_queue *queue = grinder->queue[grinder->qpos];
>      struct rte_mbuf *pkt = grinder->pkt;
>      uint32_t pkt_len = pkt->pkt_len + port->frame_overhead;
> +    int be_tc_active;
>
>      if (!grinder_credits_check(port, pos))
>          return 0;
> @@ -1725,13 +1747,18 @@ grinder_schedule(struct rte_sched_port *port, uint32_t pos)
>      /* Send packet */
>      port->pkts_out[port->n_pkts_out++] = pkt;
>      queue->qr++;
> -    grinder->wrr_tokens[grinder->qpos] += pkt_len * grinder->wrr_cost[grinder->qpos];
> +
> +    be_tc_active = (grinder->tc_index == RTE_SCHED_TRAFFIC_CLASS_BE);
> +    grinder->wrr_tokens[grinder->qpos] +=
> +        pkt_len * grinder->wrr_cost[grinder->qpos] * be_tc_active;
> +

Integer multiplication is expensive in this per-packet code; you can
easily avoid the extra multiply by doing a bitwise AND with a mask
whose value is either 0 or all-ones.

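A minimal sketch of what I mean, assuming the flag is always 0 or 1.
The helper names mask_from_flag() and wrr_tokens_inc() are made up for
this example and do not exist in librte_sched; the point is only that
the multiply by be_tc_active can be replaced by a subtract and an AND.

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a boolean flag into a mask.
 * flag != 0 gives 0xFFFFFFFF, flag == 0 gives 0. Unsigned arithmetic
 * wraps around, so 0 - 1 is all-ones.
 */
static inline uint32_t
mask_from_flag(uint32_t flag)
{
	return (uint32_t)0 - (uint32_t)(flag != 0);
}

/*
 * Hypothetical helper: WRR token increment for one packet. The
 * increment is kept when the mask is all-ones (best-effort TC) and
 * dropped to zero when the mask is 0 (strict priority TC).
 */
static inline uint32_t
wrr_tokens_inc(uint32_t pkt_len, uint32_t wrr_cost, uint32_t be_tc_mask)
{
	return (pkt_len * wrr_cost) & be_tc_mask;
}
```

In grinder_schedule() this would amount to computing the mask once from
the tc_index comparison and AND-ing it into the increment, instead of
multiplying by be_tc_active.
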
>      if (queue->qr == queue->qw) {
>          uint32_t qindex = grinder->qindex[grinder->qpos];
>
>          rte_bitmap_clear(port->bmp, qindex);
>          grinder->qmask &= ~(1 << grinder->qpos);
> -        grinder->wrr_mask[grinder->qpos] = 0;
> +        if (be_tc_active)
> +            grinder->wrr_mask[grinder->qpos] = 0;
>          rte_sched_port_set_queue_empty_timestamp(port, qindex);
>      }
>
> @@ -1877,7 +1904,7 @@ grinder_next_tc(struct rte_sched_port *port, uint32_t pos)
>
>      grinder->tc_index = (qindex >> 2) & 0x3;
>      grinder->qmask = grinder->tccache_qmask[grinder->tccache_r];
> -    grinder->qsize = qsize;
> +    grinder->qsize[grinder->tc_index] = qsize;
>
>      grinder->qindex[0] = qindex;
>      grinder->qindex[1] = qindex + 1;
> @@ -1962,26 +1989,15 @@ grinder_wrr_load(struct rte_sched_port *port, uint32_t pos)
>      struct rte_sched_grinder *grinder = port->grinder + pos;
>      struct rte_sched_pipe *pipe = grinder->pipe;
>      struct rte_sched_pipe_profile *pipe_params = grinder->pipe_params;
> -    uint32_t tc_index = grinder->tc_index;
>      uint32_t qmask = grinder->qmask;
> -    uint32_t qindex;
> -
> -    qindex = tc_index * 4;
> -
> -    grinder->wrr_tokens[0] = ((uint16_t) pipe->wrr_tokens[qindex]) << RTE_SCHED_WRR_SHIFT;
> -    grinder->wrr_tokens[1] = ((uint16_t) pipe->wrr_tokens[qindex + 1]) << RTE_SCHED_WRR_SHIFT;
> -    grinder->wrr_tokens[2] = ((uint16_t) pipe->wrr_tokens[qindex + 2]) << RTE_SCHED_WRR_SHIFT;
> -    grinder->wrr_tokens[3] = ((uint16_t) pipe->wrr_tokens[qindex + 3]) << RTE_SCHED_WRR_SHIFT;
> -
> -    grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFF;
> -    grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFF;
> -    grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFF;
> -    grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFF;
> +    uint32_t i;
>
> -    grinder->wrr_cost[0] = pipe_params->wrr_cost[qindex];
> -    grinder->wrr_cost[1] = pipe_params->wrr_cost[qindex + 1];
> -    grinder->wrr_cost[2] = pipe_params->wrr_cost[qindex + 2];
> -    grinder->wrr_cost[3] = pipe_params->wrr_cost[qindex + 3];
> +    for (i = 0; i < pipe->n_be_queues; i++) {
> +        grinder->wrr_tokens[i] =
> +            ((uint16_t) pipe->wrr_tokens[i]) << RTE_SCHED_WRR_SHIFT;
> +        grinder->wrr_mask[i] = ((qmask >> i) & 0x1) * 0xFFFF;
> +        grinder->wrr_cost[i] = pipe_params->wrr_cost[i];
> +    }
>  }
>
>  static inline void
> @@ -1989,19 +2005,12 @@ grinder_wrr_store(struct rte_sched_port *port, uint32_t pos)
>  {
>      struct rte_sched_grinder *grinder = port->grinder + pos;
>      struct rte_sched_pipe *pipe = grinder->pipe;
> -    uint32_t tc_index = grinder->tc_index;
> -    uint32_t qindex;
> -
> -    qindex = tc_index * 4;
> +    uint32_t i;
>
> -    pipe->wrr_tokens[qindex] = (grinder->wrr_tokens[0] & grinder->wrr_mask[0])
> -        >> RTE_SCHED_WRR_SHIFT;
> -    pipe->wrr_tokens[qindex + 1] = (grinder->wrr_tokens[1] & grinder->wrr_mask[1])
> -        >> RTE_SCHED_WRR_SHIFT;
> -    pipe->wrr_tokens[qindex + 2] = (grinder->wrr_tokens[2] & grinder->wrr_mask[2])
> -        >> RTE_SCHED_WRR_SHIFT;
> -    pipe->wrr_tokens[qindex + 3] = (grinder->wrr_tokens[3] & grinder->wrr_mask[3])
> -        >> RTE_SCHED_WRR_SHIFT;
> +    for (i = 0; i < pipe->n_be_queues; i++)
> +        pipe->wrr_tokens[i] =
> +            (grinder->wrr_tokens[i] & grinder->wrr_mask[i]) >>
> +                RTE_SCHED_WRR_SHIFT;
>  }
>
>  static inline void
> @@ -2040,22 +2049,31 @@ static inline void
>  grinder_prefetch_tc_queue_arrays(struct rte_sched_port *port, uint32_t pos)
>  {
>      struct rte_sched_grinder *grinder = port->grinder + pos;
> -    uint16_t qsize, qr[4];
> +    struct rte_sched_pipe *pipe = grinder->pipe;
> +    struct rte_sched_queue *queue;
> +    uint32_t i;
> +    uint16_t qsize, qr[RTE_SCHED_MAX_QUEUES_PER_TC];
>
> -    qsize = grinder->qsize;
> -    qr[0] = grinder->queue[0]->qr & (qsize - 1);
> -    qr[1] = grinder->queue[1]->qr & (qsize - 1);
> -    qr[2] = grinder->queue[2]->qr & (qsize - 1);
> -    qr[3] = grinder->queue[3]->qr & (qsize - 1);
> +    grinder->qpos = 0;
> +    if (grinder->tc_index < RTE_SCHED_TRAFFIC_CLASS_BE) {
> +        queue = grinder->queue[0];
> +        qsize = grinder->qsize[0];
> +        qr[0] = queue->qr & (qsize - 1);
>
> -    rte_prefetch0(grinder->qbase[0] + qr[0]);
> -    rte_prefetch0(grinder->qbase[1] + qr[1]);
> +        rte_prefetch0(grinder->qbase[0] + qr[0]);
> +        return;
> +    }
> +
> +    for (i = 0; i < pipe->n_be_queues; i++) {
> +        queue = grinder->queue[i];
> +        qsize = grinder->qsize[i];
> +        qr[i] = queue->qr & (qsize - 1);
> +
> +        rte_prefetch0(grinder->qbase[i] + qr[i]);
> +    }
>
>      grinder_wrr_load(port, pos);
>      grinder_wrr(port, pos);
> -
> -    rte_prefetch0(grinder->qbase[2] + qr[2]);
> -    rte_prefetch0(grinder->qbase[3] + qr[3]);
>  }
>
>  static inline void
> @@ -2064,7 +2082,7 @@ grinder_prefetch_mbuf(struct rte_sched_port *port, uint32_t pos)
>      struct rte_sched_grinder *grinder = port->grinder + pos;
>      uint32_t qpos = grinder->qpos;
>      struct rte_mbuf **qbase = grinder->qbase[qpos];
> -    uint16_t qsize = grinder->qsize;
> +    uint16_t qsize = grinder->qsize[qpos];
>      uint16_t qr = grinder->queue[qpos]->qr & (qsize - 1);
>
>      grinder->pkt = qbase[qr];
> @@ -2118,18 +2136,24 @@ grinder_handle(struct rte_sched_port *port, uint32_t pos)
>
>      case e_GRINDER_READ_MBUF:
>      {
> -        uint32_t result = 0;
> +        uint32_t wrr_active, result = 0;
>
>          result = grinder_schedule(port, pos);
>
> +        wrr_active = (grinder->tc_index == RTE_SCHED_TRAFFIC_CLASS_BE);
> +
>          /* Look for next packet within the same TC */
>          if (result && grinder->qmask) {
> -            grinder_wrr(port, pos);
> +            if (wrr_active)
> +                grinder_wrr(port, pos);
> +
>              grinder_prefetch_mbuf(port, pos);
>
>              return 1;
>          }
> -        grinder_wrr_store(port, pos);
> +
> +        if (wrr_active)
> +            grinder_wrr_store(port, pos);
>
>          /* Look for another active TC within same pipe */
>          if (grinder_next_tc(port, pos)) {
> diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> index d61dda9f5..2a935998a 100644
> --- a/lib/librte_sched/rte_sched.h
> +++ b/lib/librte_sched/rte_sched.h
> @@ -66,6 +66,22 @@ extern "C" {
>  #include "rte_red.h"
>  #endif
>
> +/** Maximum number of queues per pipe.
> + * Note that the multiple queues (power of 2) can only be assigned to
> + * lowest priority (best-effort) traffic class. Other higher priority traffic
> + * classes can only have one queue.
> + * Can not change.
> + *
> + * @see struct rte_sched_port_params
> + */
> +#define RTE_SCHED_QUEUES_PER_PIPE    16
> +
> +/** Number of WRR queues for best-effort traffic class per pipe.
> + *
> + * @see struct rte_sched_pipe_params
> + */
> +#define RTE_SCHED_BE_QUEUES_PER_PIPE    4
> +
>  /** Number of traffic classes per pipe (as well as subport).
>   * Cannot be changed.
>   */
> @@ -74,11 +90,6 @@ extern "C" {
>  /** Number of queues per pipe traffic class. Cannot be changed. */
>  #define RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS    4
>
> -/** Number of queues per pipe. */
> -#define RTE_SCHED_QUEUES_PER_PIPE \
> -    (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * \
> -    RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS)
> -
>  /** Maximum number of pipe profiles that can be defined per port.
>   * Compile-time configurable.
>   */
> @@ -165,7 +176,7 @@ struct rte_sched_pipe_params {
>  #endif
>
>      /* Pipe queues */
> -    uint8_t wrr_weights[RTE_SCHED_QUEUES_PER_PIPE]; /**< WRR weights */
> +    uint8_t wrr_weights[RTE_SCHED_BE_QUEUES_PER_PIPE]; /**< WRR weights */
>  };
>
>  /** Queue statistics */
> --
> 2.21.0