From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Singh, Jasvinder"
To: "Dumitrescu, Cristian", "dev@dpdk.org"
CC: "Tovar, AbrahamX", "Krakowiak, LukaszX"
Thread-Topic: [PATCH v4 01/11] sched: remove wrr from strict priority tc queues
Date: Wed, 17 Jul 2019 14:49:17 +0000
Message-ID: <54CBAA185211B4429112C315DA58FF6D3FD947AF@IRSMSX103.ger.corp.intel.com>
References: <20190711102659.59001-2-jasvinder.singh@intel.com> <20190712095729.159767-1-jasvinder.singh@intel.com> <20190712095729.159767-2-jasvinder.singh@intel.com> <3EB4FA525960D640B5BDFFD6A3D891268E8EEDA2@IRSMSX108.ger.corp.intel.com>
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D891268E8EEDA2@IRSMSX108.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 01/11] sched: remove wrr from strict priority tc queues

> > +version = 3
> > sources = files('rte_sched.c', 'rte_red.c', 'rte_approx.c')
> > headers = files('rte_sched.h', 'rte_sched_common.h',
> > 		'rte_red.h', 'rte_approx.h')
> > diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
> > index bc06bc3f4..b1f521794 100644
> > --- a/lib/librte_sched/rte_sched.c
> > +++ b/lib/librte_sched/rte_sched.c
> > @@ -37,6 +37,8 @@
> >
> >  #define RTE_SCHED_TB_RATE_CONFIG_ERR (1e-7)
> >  #define RTE_SCHED_WRR_SHIFT 3
> > +#define RTE_SCHED_TRAFFIC_CLASS_BE (RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE - 1)
> > +#define RTE_SCHED_MAX_QUEUES_PER_TC RTE_SCHED_BE_QUEUES_PER_PIPE
> >  #define RTE_SCHED_GRINDER_PCACHE_SIZE (64 / RTE_SCHED_QUEUES_PER_PIPE)
> >  #define RTE_SCHED_PIPE_INVALID UINT32_MAX
> >  #define RTE_SCHED_BMP_POS_INVALID UINT32_MAX
> > @@ -84,8 +86,9 @@ struct rte_sched_pipe_profile {
> >  	uint32_t tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> >  	uint8_t tc_ov_weight;
> >
> > -	/* Pipe queues */
> > -	uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_PIPE];
> > +	/* Pipe best-effort traffic class queues */
> > +	uint8_t n_be_queues;
>
> The n_be_queues is the same for all pipes within the same port, so it does
> not make sense to save this per-port value in each pipe profile. At the
> very least, let's move it to the port data structure, please.
>
> In fact, a better solution (that also simplifies the implementation) is to
> enforce the same queue size for all BE queues, as it does not make sense to
> have queues within the same traffic class of different size (see my comment
> in the other patch where you update the API). So n_be_queues should always
> be 4, therefore no need for this variable.
>

Thanks for your time and comments. I have removed n_be_queues in v5.

> > +	uint8_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
> >  };
> >
> >  struct rte_sched_pipe {
> > @@ -100,8 +103,10 @@ struct rte_sched_pipe {
> >  	uint64_t tc_time; /* time of next update */
> >  	uint32_t tc_credits[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> >
> > +	uint8_t n_be_queues; /* Best effort traffic class queues */
>
> Same comment here, even more important, as we need to strive to reduce the
> size of this struct for performance reasons.
>
> > +
> >  	/* Weighted Round Robin (WRR) */
> > -	uint8_t wrr_tokens[RTE_SCHED_QUEUES_PER_PIPE];
> > +	uint8_t wrr_tokens[RTE_SCHED_BE_QUEUES_PER_PIPE];
> >
> >  	/* TC oversubscription */
> >  	uint32_t tc_ov_credits;
> > @@ -153,16 +158,16 @@ struct rte_sched_grinder {
> >  	uint32_t tc_index;
> >  	struct rte_sched_queue *queue[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> >  	struct rte_mbuf **qbase[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> > -	uint32_t qindex[RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE];
> > -	uint16_t qsize;
> > +	uint32_t qindex[RTE_SCHED_MAX_QUEUES_PER_TC];
> > +	uint16_t qsize[RTE_SCHED_MAX_QUEUES_PER_TC];
> >  	uint32_t qmask;
> >  	uint32_t qpos;
> >  	struct rte_mbuf *pkt;
> >
> >  	/* WRR */
> > -	uint16_t wrr_tokens[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > -	uint16_t wrr_mask[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > -	uint8_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > +	uint16_t wrr_tokens[RTE_SCHED_BE_QUEUES_PER_PIPE];
> > +	uint16_t wrr_mask[RTE_SCHED_BE_QUEUES_PER_PIPE];
> > +	uint8_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
> >  };
> >
> >  struct rte_sched_port {
> > @@ -301,7 +306,6 @@ pipe_profile_check(struct rte_sched_pipe_params *params,
> >  		if (params->wrr_weights[i] == 0)
> >  			return -16;
> >  	}
> > -
> >  	return 0;
> >  }
> >
> > @@ -483,7 +487,7 @@ rte_sched_port_log_pipe_profile(struct rte_sched_port *port, uint32_t i)
> >  		"    Token bucket: period = %u, credits per period = %u, size = %u\n"
> >  		"    Traffic classes: period = %u, credits per period = [%u, %u, %u, %u]\n"
> >  		"    Traffic class 3 oversubscription: weight = %hhu\n"
> > -		"    WRR cost: [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu], [%hhu, %hhu, %hhu, %hhu]\n",
> > +		"    WRR cost: [%hhu, %hhu, %hhu, %hhu]\n",
> >  		i,
> >
> >  		/* Token bucket */
> > @@ -502,10 +506,7 @@ rte_sched_port_log_pipe_profile(struct rte_sched_port *port, uint32_t i)
> >  		p->tc_ov_weight,
> >
> >  		/* WRR */
> > -		p->wrr_cost[ 0], p->wrr_cost[ 1], p->wrr_cost[ 2], p->wrr_cost[ 3],
> > -		p->wrr_cost[ 4], p->wrr_cost[ 5], p->wrr_cost[ 6], p->wrr_cost[ 7],
> > -		p->wrr_cost[ 8], p->wrr_cost[ 9], p->wrr_cost[10], p->wrr_cost[11],
> > -		p->wrr_cost[12], p->wrr_cost[13], p->wrr_cost[14], p->wrr_cost[15]);
> > +		p->wrr_cost[0], p->wrr_cost[1], p->wrr_cost[2], p->wrr_cost[3]);
> >  }
> >
> >  static inline uint64_t
> > @@ -519,10 +520,12 @@ rte_sched_time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
> >  }
> >
> >  static void
> > -rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
> > +rte_sched_pipe_profile_convert(struct rte_sched_port *port,
> > +	struct rte_sched_pipe_params *src,
> >  	struct rte_sched_pipe_profile *dst,
> >  	uint32_t rate)
> >  {
> > +	uint32_t wrr_cost[RTE_SCHED_BE_QUEUES_PER_PIPE];
> >  	uint32_t i;
> >
> >  	/* Token Bucket */
> > @@ -553,18 +556,36 @@ rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
> >  	dst->tc_ov_weight = src->tc_ov_weight;
> >  #endif
> >
> > -	/* WRR */
> > -	for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++) {
> > -		uint32_t wrr_cost[RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS];
> > -		uint32_t lcd, lcd1, lcd2;
> > -		uint32_t qindex;
> > +	/* WRR queues */
> > +	for (i = 0; i < RTE_SCHED_BE_QUEUES_PER_PIPE; i++)
> > +		if (port->qsize[i])
> > +			dst->n_be_queues++;
> > +
> > +	if (dst->n_be_queues == 1)
> > +		dst->wrr_cost[0] = src->wrr_weights[0];
> > +
> > +	if (dst->n_be_queues == 2) {
> > +		uint32_t lcd;
> > +
> > +		wrr_cost[0] = src->wrr_weights[0];
> > +		wrr_cost[1] = src->wrr_weights[1];
> > +
> > +		lcd = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
> > +
> > +		wrr_cost[0] = lcd / wrr_cost[0];
> > +		wrr_cost[1] = lcd / wrr_cost[1];
> >
> > -		qindex = i * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS;
> > +		dst->wrr_cost[0] = (uint8_t) wrr_cost[0];
> > +		dst->wrr_cost[1] = (uint8_t) wrr_cost[1];
> > +	}
> >
> > -		wrr_cost[0] = src->wrr_weights[qindex];
> > -		wrr_cost[1] = src->wrr_weights[qindex + 1];
> > -		wrr_cost[2] = src->wrr_weights[qindex + 2];
> > -		wrr_cost[3] = src->wrr_weights[qindex + 3];
> > +	if (dst->n_be_queues == 4) {
>
> See the above comment, it is better and simpler to enforce n_be_queues == 4,
> which simplifies this code a lot, as it keeps only this branch and removes
> the need for the above two.
>

Fixed in v5.

> > +		uint32_t lcd1, lcd2, lcd;
> > +
> > +		wrr_cost[0] = src->wrr_weights[0];
> > +		wrr_cost[1] = src->wrr_weights[1];
> > +		wrr_cost[2] = src->wrr_weights[2];
> > +		wrr_cost[3] = src->wrr_weights[3];
> >
> >  		lcd1 = rte_get_lcd(wrr_cost[0], wrr_cost[1]);
> >  		lcd2 = rte_get_lcd(wrr_cost[2], wrr_cost[3]);
> > @@ -575,10 +596,10 @@ rte_sched_pipe_profile_convert(struct rte_sched_pipe_params *src,
> >  		wrr_cost[2] = lcd / wrr_cost[2];
> >  		wrr_cost[3] = lcd / wrr_cost[3];
> >
> > -		dst->wrr_cost[qindex] = (uint8_t) wrr_cost[0];
> > -		dst->wrr_cost[qindex + 1] = (uint8_t) wrr_cost[1];
> > -		dst->wrr_cost[qindex + 2] = (uint8_t) wrr_cost[2];
> > -		dst->wrr_cost[qindex + 3] = (uint8_t) wrr_cost[3];
> > +		dst->wrr_cost[0] = (uint8_t) wrr_cost[0];
> > +		dst->wrr_cost[1] = (uint8_t) wrr_cost[1];
> > +		dst->wrr_cost[2] = (uint8_t) wrr_cost[2];
> > +		dst->wrr_cost[3] = (uint8_t) wrr_cost[3];
> >  	}
> >  }
> >
> > @@ -592,7 +613,7 @@ rte_sched_port_config_pipe_profile_table(struct rte_sched_port *port,
> >  	struct rte_sched_pipe_params *src = params->pipe_profiles + i;
> >  	struct rte_sched_pipe_profile *dst = port->pipe_profiles + i;
> >
> > -	rte_sched_pipe_profile_convert(src, dst, params->rate);
> > +	rte_sched_pipe_profile_convert(port, src, dst, params->rate);
> >  	rte_sched_port_log_pipe_profile(port, i);
> >  }
> >
> > @@ -976,7 +997,7 @@ rte_sched_port_pipe_profile_add(struct rte_sched_port *port,
> >  		return status;
> >
> >  	pp = &port->pipe_profiles[port->n_pipe_profiles];
> > -	rte_sched_pipe_profile_convert(params, pp, port->rate);
> > +	rte_sched_pipe_profile_convert(port, params, pp, port->rate);
> >
> >  	/* Pipe profile not exists */
> >  	for (i = 0; i < port->n_pipe_profiles; i++)
> > @@ -1715,6 +1736,7 @@ grinder_schedule(struct rte_sched_port *port, uint32_t pos)
> >  	struct rte_sched_queue *queue = grinder->queue[grinder->qpos];
> >  	struct rte_mbuf *pkt = grinder->pkt;
> >  	uint32_t pkt_len = pkt->pkt_len + port->frame_overhead;
> > +	int be_tc_active;
> >
> >  	if (!grinder_credits_check(port, pos))
> >  		return 0;
> > @@ -1725,13 +1747,18 @@ grinder_schedule(struct rte_sched_port *port, uint32_t pos)
> >  	/* Send packet */
> >  	port->pkts_out[port->n_pkts_out++] = pkt;
> >  	queue->qr++;
> > -	grinder->wrr_tokens[grinder->qpos] += pkt_len * grinder->wrr_cost[grinder->qpos];
> > +
> > +	be_tc_active = (grinder->tc_index == RTE_SCHED_TRAFFIC_CLASS_BE);
> > +	grinder->wrr_tokens[grinder->qpos] +=
> > +		pkt_len * grinder->wrr_cost[grinder->qpos] * be_tc_active;
> > +
>
> Integer multiplication is very expensive, you can easily avoid it by doing
> bitwise-and with a mask whose values are either 0 or all-ones.
>

Replaced multiplication with a bitwise & operation in v5.
> >  	if (queue->qr == queue->qw) {
> >  		uint32_t qindex = grinder->qindex[grinder->qpos];
> >
> >  		rte_bitmap_clear(port->bmp, qindex);
> >  		grinder->qmask &= ~(1 << grinder->qpos);
> > -		grinder->wrr_mask[grinder->qpos] = 0;
> > +		if (be_tc_active)
> > +			grinder->wrr_mask[grinder->qpos] = 0;
> >  		rte_sched_port_set_queue_empty_timestamp(port, qindex);
> >  	}
> >
> > @@ -1877,7 +1904,7 @@ grinder_next_tc(struct rte_sched_port *port, uint32_t pos)
> >
> >  	grinder->tc_index = (qindex >> 2) & 0x3;
> >  	grinder->qmask = grinder->tccache_qmask[grinder->tccache_r];
> > -	grinder->qsize = qsize;
> > +	grinder->qsize[grinder->tc_index] = qsize;
> >
> >  	grinder->qindex[0] = qindex;
> >  	grinder->qindex[1] = qindex + 1;
> > @@ -1962,26 +1989,15 @@ grinder_wrr_load(struct rte_sched_port *port, uint32_t pos)
> >  	struct rte_sched_grinder *grinder = port->grinder + pos;
> >  	struct rte_sched_pipe *pipe = grinder->pipe;
> >  	struct rte_sched_pipe_profile *pipe_params = grinder->pipe_params;
> > -	uint32_t tc_index = grinder->tc_index;
> >  	uint32_t qmask = grinder->qmask;
> > -	uint32_t qindex;
> > -
> > -	qindex = tc_index * 4;
> > -
> > -	grinder->wrr_tokens[0] = ((uint16_t) pipe->wrr_tokens[qindex]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[1] = ((uint16_t) pipe->wrr_tokens[qindex + 1]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[2] = ((uint16_t) pipe->wrr_tokens[qindex + 2]) << RTE_SCHED_WRR_SHIFT;
> > -	grinder->wrr_tokens[3] = ((uint16_t) pipe->wrr_tokens[qindex + 3]) << RTE_SCHED_WRR_SHIFT;
> > -
> > -	grinder->wrr_mask[0] = (qmask & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[1] = ((qmask >> 1) & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[2] = ((qmask >> 2) & 0x1) * 0xFFFF;
> > -	grinder->wrr_mask[3] = ((qmask >> 3) & 0x1) * 0xFFFF;
> > +	uint32_t i;
> >
> > -	grinder->wrr_cost[0] = pipe_params->wrr_cost[qindex];
> > -	grinder->wrr_cost[1] = pipe_params->wrr_cost[qindex + 1];
> > -	grinder->wrr_cost[2] = pipe_params->wrr_cost[qindex + 2];
> > -	grinder->wrr_cost[3] = pipe_params->wrr_cost[qindex + 3];
> > +	for (i = 0; i < pipe->n_be_queues; i++) {
> > +		grinder->wrr_tokens[i] =
> > +			((uint16_t) pipe->wrr_tokens[i]) << RTE_SCHED_WRR_SHIFT;
> > +		grinder->wrr_mask[i] = ((qmask >> i) & 0x1) * 0xFFFF;
> > +		grinder->wrr_cost[i] = pipe_params->wrr_cost[i];
> > +	}
> >  }
> >
> >  static inline void
> > @@ -1989,19 +2005,12 @@ grinder_wrr_store(struct rte_sched_port *port, uint32_t pos)
> >  {
> >  	struct rte_sched_grinder *grinder = port->grinder + pos;
> >  	struct rte_sched_pipe *pipe = grinder->pipe;
> > -	uint32_t tc_index = grinder->tc_index;
> > -	uint32_t qindex;
> > -
> > -	qindex = tc_index * 4;
> > +	uint32_t i;
> >
> > -	pipe->wrr_tokens[qindex] = (grinder->wrr_tokens[0] & grinder->wrr_mask[0])
> > -		>> RTE_SCHED_WRR_SHIFT;
> > -	pipe->wrr_tokens[qindex + 1] = (grinder->wrr_tokens[1] & grinder->wrr_mask[1])
> > -		>> RTE_SCHED_WRR_SHIFT;
> > -	pipe->wrr_tokens[qindex + 2] = (grinder->wrr_tokens[2] & grinder->wrr_mask[2])
> > -		>> RTE_SCHED_WRR_SHIFT;
> > -	pipe->wrr_tokens[qindex + 3] = (grinder->wrr_tokens[3] & grinder->wrr_mask[3])
> > -		>> RTE_SCHED_WRR_SHIFT;
> > +	for (i = 0; i < pipe->n_be_queues; i++)
> > +		pipe->wrr_tokens[i] =
> > +			(grinder->wrr_tokens[i] & grinder->wrr_mask[i]) >>
> > +				RTE_SCHED_WRR_SHIFT;
> >  }
> >
> >  static inline void
> > @@ -2040,22 +2049,31 @@ static inline void
> >  grinder_prefetch_tc_queue_arrays(struct rte_sched_port *port, uint32_t pos)
> >  {
> >  	struct rte_sched_grinder *grinder = port->grinder + pos;
> > -	uint16_t qsize, qr[4];
> > +	struct rte_sched_pipe *pipe = grinder->pipe;
> > +	struct rte_sched_queue *queue;
> > +	uint32_t i;
> > +	uint16_t qsize, qr[RTE_SCHED_MAX_QUEUES_PER_TC];
> >
> > -	qsize = grinder->qsize;
> > -	qr[0] = grinder->queue[0]->qr & (qsize - 1);
> > -	qr[1] = grinder->queue[1]->qr & (qsize - 1);
> > -	qr[2] = grinder->queue[2]->qr & (qsize - 1);
> > -	qr[3] = grinder->queue[3]->qr & (qsize - 1);
> > +	grinder->qpos = 0;
> > +	if (grinder->tc_index < RTE_SCHED_TRAFFIC_CLASS_BE) {
> > +		queue = grinder->queue[0];
> > +		qsize = grinder->qsize[0];
> > +		qr[0] = queue->qr & (qsize - 1);
> >
> > -	rte_prefetch0(grinder->qbase[0] + qr[0]);
> > -	rte_prefetch0(grinder->qbase[1] + qr[1]);
> > +		rte_prefetch0(grinder->qbase[0] + qr[0]);
> > +		return;
> > +	}
> > +
> > +	for (i = 0; i < pipe->n_be_queues; i++) {
> > +		queue = grinder->queue[i];
> > +		qsize = grinder->qsize[i];
> > +		qr[i] = queue->qr & (qsize - 1);
> > +
> > +		rte_prefetch0(grinder->qbase[i] + qr[i]);
> > +	}
> >
> >  	grinder_wrr_load(port, pos);
> >  	grinder_wrr(port, pos);
> > -
> > -	rte_prefetch0(grinder->qbase[2] + qr[2]);
> > -	rte_prefetch0(grinder->qbase[3] + qr[3]);
> >  }
> >
> >  static inline void
> > @@ -2064,7 +2082,7 @@ grinder_prefetch_mbuf(struct rte_sched_port *port, uint32_t pos)
> >  	struct rte_sched_grinder *grinder = port->grinder + pos;
> >  	uint32_t qpos = grinder->qpos;
> >  	struct rte_mbuf **qbase = grinder->qbase[qpos];
> > -	uint16_t qsize = grinder->qsize;
> > +	uint16_t qsize = grinder->qsize[qpos];
> >  	uint16_t qr = grinder->queue[qpos]->qr & (qsize - 1);
> >
> >  	grinder->pkt = qbase[qr];
> > @@ -2118,18 +2136,24 @@ grinder_handle(struct rte_sched_port *port, uint32_t pos)
> >
> >  	case e_GRINDER_READ_MBUF:
> >  	{
> > -		uint32_t result = 0;
> > +		uint32_t wrr_active, result = 0;
> >
> >  		result = grinder_schedule(port, pos);
> >
> > +		wrr_active = (grinder->tc_index == RTE_SCHED_TRAFFIC_CLASS_BE);
> > +
> >  		/* Look for next packet within the same TC */
> >  		if (result && grinder->qmask) {
> > -			grinder_wrr(port, pos);
> > +			if (wrr_active)
> > +				grinder_wrr(port, pos);
> > +
> >  			grinder_prefetch_mbuf(port, pos);
> >
> >  			return 1;
> >  		}
> > -		grinder_wrr_store(port, pos);
> > +
> > +		if (wrr_active)
> > +			grinder_wrr_store(port, pos);
> >
> >  		/* Look for another active TC within same pipe */
> >  		if (grinder_next_tc(port, pos)) {
> > diff --git a/lib/librte_sched/rte_sched.h b/lib/librte_sched/rte_sched.h
> > index d61dda9f5..2a935998a 100644
> > --- a/lib/librte_sched/rte_sched.h
> > +++ b/lib/librte_sched/rte_sched.h
> > @@ -66,6 +66,22 @@ extern "C" {
> >  #include "rte_red.h"
> >  #endif
> >
> > +/** Maximum number of queues per pipe.
> > + * Note that the multiple queues (power of 2) can only be assigned to
> > + * lowest priority (best-effort) traffic class. Other higher priority traffic
> > + * classes can only have one queue.
> > + * Can not change.
> > + *
> > + * @see struct rte_sched_port_params
> > + */
> > +#define RTE_SCHED_QUEUES_PER_PIPE    16
> > +
> > +/** Number of WRR queues for best-effort traffic class per pipe.
> > + *
> > + * @see struct rte_sched_pipe_params
> > + */
> > +#define RTE_SCHED_BE_QUEUES_PER_PIPE    4
> > +
> >  /** Number of traffic classes per pipe (as well as subport).
> >   * Cannot be changed.
> >   */
> > @@ -74,11 +90,6 @@ extern "C" {
> >  /** Number of queues per pipe traffic class. Cannot be changed. */
> >  #define RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS    4
> >
> > -/** Number of queues per pipe. */
> > -#define RTE_SCHED_QUEUES_PER_PIPE \
> > -	(RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * \
> > -	RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS)
> > -
> >  /** Maximum number of pipe profiles that can be defined per port.
> >   * Compile-time configurable.
> >   */
> > @@ -165,7 +176,7 @@ struct rte_sched_pipe_params {
> >  #endif
> >
> >  	/* Pipe queues */
> > -	uint8_t wrr_weights[RTE_SCHED_QUEUES_PER_PIPE]; /**< WRR weights */
> > +	uint8_t wrr_weights[RTE_SCHED_BE_QUEUES_PER_PIPE]; /**< WRR weights */
> >  };
> >
> >  /** Queue statistics */
> > --
> > 2.21.0