DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: "Danilewicz, MarcinX" <marcinx.danilewicz@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Singh, Jasvinder" <jasvinder.singh@intel.com>
Cc: "Ajmera, Megha" <megha.ajmera@intel.com>
Subject: RE: [PATCH v3] sched: enable/disable TC OV at runtime
Date: Mon, 9 May 2022 20:05:31 +0000	[thread overview]
Message-ID: <DM8PR11MB5670595C757CB671A40D1510EBC69@DM8PR11MB5670.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20220427092357.491720-1-marcinx.danilewicz@intel.com>

Hi Marcin,

> -----Original Message-----
> From: Danilewicz, MarcinX <marcinx.danilewicz@intel.com>
> Sent: Wednesday, April 27, 2022 10:24 AM
> To: dev@dpdk.org; Singh, Jasvinder <jasvinder.singh@intel.com>; Dumitrescu,
> Cristian <cristian.dumitrescu@intel.com>
> Cc: Ajmera, Megha <megha.ajmera@intel.com>
> Subject: [PATCH v3] sched: enable/disable TC OV at runtime

We are not trying to enable/disable the traffic class oversubscription feature at run-time, but at initialization. If cat, we should prohibit changing this post-initialization.

Also the name of the feature should not be abbreviated in the patch title.

I suggest you rework the title to:
[PATCH] sched: enable traffic class oversubscription conditionally

> 
> Added new API to enable or disable TC over subscription for best
> effort traffic class at subport level.
> Added changes after review and increased throughput.
> 
> By default TC OV is disabled.

It should be the other way around, the TC_OV should be enabled by default. The TC oversubscription is a more natural way to use this library, we usually want to disable this feature just for better performance in case this functionality is not needed. Please initialize the tc_ov flag accordingly.

> 
> Signed-off-by: Marcin Danilewicz <marcinx.danilewicz@intel.com>
> ---
>  lib/sched/rte_sched.c | 189 +++++++++++++++++++++++++++++++++++-------
>  lib/sched/rte_sched.h |  18 ++++
>  lib/sched/version.map |   3 +
>  3 files changed, 178 insertions(+), 32 deletions(-)
> 
> diff --git a/lib/sched/rte_sched.c b/lib/sched/rte_sched.c
> index ec74bee939..6e7d81df46 100644
> --- a/lib/sched/rte_sched.c
> +++ b/lib/sched/rte_sched.c
> @@ -213,6 +213,9 @@ struct rte_sched_subport {
>  	uint8_t *bmp_array;
>  	struct rte_mbuf **queue_array;
>  	uint8_t memory[0] __rte_cache_aligned;
> +
> +	/* TC oversubscription activation */
> +	int is_tc_ov_enabled;

How about we simplify the name of this variable to: tc_ov_enabled ?

>  } __rte_cache_aligned;
> 
>  struct rte_sched_port {
> @@ -1165,6 +1168,45 @@ rte_sched_cman_config(struct rte_sched_port
> *port,
>  }
>  #endif
> 
> +int
> +rte_sched_subport_tc_ov_config(struct rte_sched_port *port,
> +	uint32_t subport_id,
> +	bool tc_ov_enable)
> +{
> +	struct rte_sched_subport *s;
> +	struct rte_sched_subport_profile *profile;
> +
> +	if (port == NULL) {
> +		RTE_LOG(ERR, SCHED,
> +			"%s: Incorrect value for parameter port\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	if (subport_id >= port->n_subports_per_port) {
> +		RTE_LOG(ERR, SCHED,
> +			"%s: Incorrect value for parameter subport id\n",
> __func__);
> +		return  -EINVAL;
> +	}
> +
> +	s = port->subports[subport_id];
> +	s->is_tc_ov_enabled = tc_ov_enable ? 1 : 0;
> +
> +	if (s->is_tc_ov_enabled) {
> +		/* TC oversubscription */
> +		s->tc_ov_wm_min = port->mtu;
> +		s->tc_ov_period_id = 0;
> +		s->tc_ov = 0;
> +		s->tc_ov_n = 0;
> +		s->tc_ov_rate = 0;
> +
> +		profile = port->subport_profiles + s->profile;
> +		s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile-
> >tc_period,
> +				s->pipe_tc_be_rate_max);
> +		s->tc_ov_wm = s->tc_ov_wm_max;
> +	}
> +	return 0;
> +}

This function should not exist, please remove it and keep the initial code that computes the tc_ov related variable regardless of whether tc_ov is enabled or not.

All the tc_ov related variables have the tc_ov particle in their name, so there is no clash. This is initialization code, so no performance overhead. Let's keep the code unmodified and compute both the tc_ov and the non-tc_ov varables at initialization, regardless of whether the feature is enabled or not.

This comment is applicable to all the initialization code, please adjust all the init code accordingly. There should be no diff showing in the patch for any of the init code!

For this file "rte_sched.c", your patch should contain just two additional run-time functions, i.e. the non-tc-ov version of functions grinder_credits_update() and grindler_credits_check(), and the small code required to test when to use the tc-ov vs. the non-tc_ov version, makes sense?

> +
>  int
>  is_tc_ov_enabled (struct rte_sched_port *port,
>  	uint32_t subport_id,
> @@ -1254,6 +1296,9 @@ rte_sched_subport_config(struct rte_sched_port
> *port,
>  		s->n_pipe_profiles = params->n_pipe_profiles;
>  		s->n_max_pipe_profiles = params->n_max_pipe_profiles;
> 
> +		/* TC over-subscription is disabled by default */
> +		s->is_tc_ov_enabled = 0;
> +

By default, this feature should be enabled:
s->is_tc_ov_enabled = 1;

>  #ifdef RTE_SCHED_CMAN
>  		if (params->cman_params != NULL) {
>  			s->cman_enabled = true;
> @@ -1316,13 +1361,6 @@ rte_sched_subport_config(struct rte_sched_port
> *port,
> 
>  		for (i = 0; i < RTE_SCHED_PORT_N_GRINDERS; i++)
>  			s->grinder_base_bmp_pos[i] =
> RTE_SCHED_PIPE_INVALID;
> -
> -		/* TC oversubscription */
> -		s->tc_ov_wm_min = port->mtu;
> -		s->tc_ov_period_id = 0;
> -		s->tc_ov = 0;
> -		s->tc_ov_n = 0;
> -		s->tc_ov_rate = 0;
>  	}
> 
>  	{
> @@ -1342,9 +1380,6 @@ rte_sched_subport_config(struct rte_sched_port
> *port,
>  			else
>  				profile->tc_credits_per_period[i] = 0;
> 
> -		s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(profile-
> >tc_period,
> -							s-
> >pipe_tc_be_rate_max);
> -		s->tc_ov_wm = s->tc_ov_wm_max;
>  		s->profile = subport_profile_id;
> 
>  	}
> @@ -1417,17 +1452,20 @@ rte_sched_pipe_config(struct rte_sched_port
> *port,
>  		double pipe_tc_be_rate =
>  			(double) params-
> >tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASS_BE]
>  			/ (double) params->tc_period;
> -		uint32_t tc_be_ov = s->tc_ov;
> 
> -		/* Unplug pipe from its subport */
> -		s->tc_ov_n -= params->tc_ov_weight;
> -		s->tc_ov_rate -= pipe_tc_be_rate;
> -		s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> +		if (s->is_tc_ov_enabled) {
> +			uint32_t tc_be_ov = s->tc_ov;
> 
> -		if (s->tc_ov != tc_be_ov) {
> -			RTE_LOG(DEBUG, SCHED,
> -				"Subport %u Best-effort TC oversubscription is
> OFF (%.4lf >= %.4lf)\n",
> -				subport_id, subport_tc_be_rate, s-
> >tc_ov_rate);
> +			/* Unplug pipe from its subport */
> +			s->tc_ov_n -= params->tc_ov_weight;
> +			s->tc_ov_rate -= pipe_tc_be_rate;
> +			s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> +
> +			if (s->tc_ov != tc_be_ov) {
> +				RTE_LOG(DEBUG, SCHED,
> +					"Subport %u Best-effort TC
> oversubscription is OFF (%.4lf >= %.4lf)\n",
> +					subport_id, subport_tc_be_rate, s-
> >tc_ov_rate);
> +			}
>  		}
> 
>  		/* Reset the pipe */
> @@ -1460,19 +1498,22 @@ rte_sched_pipe_config(struct rte_sched_port
> *port,
>  		double pipe_tc_be_rate =
>  			(double) params-
> >tc_credits_per_period[RTE_SCHED_TRAFFIC_CLASS_BE]
>  			/ (double) params->tc_period;
> -		uint32_t tc_be_ov = s->tc_ov;
> 
> -		s->tc_ov_n += params->tc_ov_weight;
> -		s->tc_ov_rate += pipe_tc_be_rate;
> -		s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> +		if (s->is_tc_ov_enabled) {
> +			uint32_t tc_be_ov = s->tc_ov;
> +
> +			s->tc_ov_n += params->tc_ov_weight;
> +			s->tc_ov_rate += pipe_tc_be_rate;
> +			s->tc_ov = s->tc_ov_rate > subport_tc_be_rate;
> 
> -		if (s->tc_ov != tc_be_ov) {
> -			RTE_LOG(DEBUG, SCHED,
> -				"Subport %u Best effort TC oversubscription is
> ON (%.4lf < %.4lf)\n",
> -				subport_id, subport_tc_be_rate, s-
> >tc_ov_rate);
> +			if (s->tc_ov != tc_be_ov) {
> +				RTE_LOG(DEBUG, SCHED,
> +					"Subport %u Best effort TC
> oversubscription is ON (%.4lf < %.4lf)\n",
> +					subport_id, subport_tc_be_rate, s-
> >tc_ov_rate);
> +			}
> +			p->tc_ov_period_id = s->tc_ov_period_id;
> +			p->tc_ov_credits = s->tc_ov_wm;
>  		}
> -		p->tc_ov_period_id = s->tc_ov_period_id;
> -		p->tc_ov_credits = s->tc_ov_wm;
>  	}
> 
>  	return 0;
> @@ -2318,6 +2359,45 @@ grinder_credits_update(struct rte_sched_port
> *port,
>  	pipe->tb_credits = RTE_MIN(pipe->tb_credits, params->tb_size);
>  	pipe->tb_time += n_periods * params->tb_period;
> 
> +	/* Subport TCs */
> +	if (unlikely(port->time >= subport->tc_time)) {
> +		for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++)
> +			subport->tc_credits[i] = sp->tc_credits_per_period[i];
> +
> +		subport->tc_time = port->time + sp->tc_period;
> +	}
> +
> +	/* Pipe TCs */
> +	if (unlikely(port->time >= pipe->tc_time)) {
> +		for (i = 0; i < RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE; i++)
> +			pipe->tc_credits[i] = params->tc_credits_per_period[i];
> +		pipe->tc_time = port->time + params->tc_period;
> +	}
> +}
> +
> +static inline void
> +grinder_credits_update_with_tc_ov(struct rte_sched_port *port,
> +	struct rte_sched_subport *subport, uint32_t pos)
> +{
> +	struct rte_sched_grinder *grinder = subport->grinder + pos;
> +	struct rte_sched_pipe *pipe = grinder->pipe;
> +	struct rte_sched_pipe_profile *params = grinder->pipe_params;
> +	struct rte_sched_subport_profile *sp = grinder->subport_params;
> +	uint64_t n_periods;
> +	uint32_t i;
> +
> +	/* Subport TB */
> +	n_periods = (port->time - subport->tb_time) / sp->tb_period;
> +	subport->tb_credits += n_periods * sp->tb_credits_per_period;
> +	subport->tb_credits = RTE_MIN(subport->tb_credits, sp->tb_size);
> +	subport->tb_time += n_periods * sp->tb_period;
> +
> +	/* Pipe TB */
> +	n_periods = (port->time - pipe->tb_time) / params->tb_period;
> +	pipe->tb_credits += n_periods * params->tb_credits_per_period;
> +	pipe->tb_credits = RTE_MIN(pipe->tb_credits, params->tb_size);
> +	pipe->tb_time += n_periods * params->tb_period;
> +
>  	/* Subport TCs */
>  	if (unlikely(port->time >= subport->tc_time)) {
>  		subport->tc_ov_wm =
> @@ -2348,6 +2428,39 @@ grinder_credits_update(struct rte_sched_port
> *port,
>  static inline int
>  grinder_credits_check(struct rte_sched_port *port,
>  	struct rte_sched_subport *subport, uint32_t pos)
> +{
> +	struct rte_sched_grinder *grinder = subport->grinder + pos;
> +	struct rte_sched_pipe *pipe = grinder->pipe;
> +	struct rte_mbuf *pkt = grinder->pkt;
> +	uint32_t tc_index = grinder->tc_index;
> +	uint64_t pkt_len = pkt->pkt_len + port->frame_overhead;
> +	uint64_t subport_tb_credits = subport->tb_credits;
> +	uint64_t subport_tc_credits = subport->tc_credits[tc_index];
> +	uint64_t pipe_tb_credits = pipe->tb_credits;
> +	uint64_t pipe_tc_credits = pipe->tc_credits[tc_index];
> +	int enough_credits;
> +
> +	/* Check pipe and subport credits */
> +	enough_credits = (pkt_len <= subport_tb_credits) &&
> +		(pkt_len <= subport_tc_credits) &&
> +		(pkt_len <= pipe_tb_credits) &&
> +		(pkt_len <= pipe_tc_credits);
> +
> +	if (!enough_credits)
> +		return 0;
> +
> +	/* Update pipe and subport credits */
> +	subport->tb_credits -= pkt_len;
> +	subport->tc_credits[tc_index] -= pkt_len;
> +	pipe->tb_credits -= pkt_len;
> +	pipe->tc_credits[tc_index] -= pkt_len;
> +
> +	return 1;
> +}
> +
> +static inline int
> +grinder_credits_check_with_tc_ov(struct rte_sched_port *port,
> +	struct rte_sched_subport *subport, uint32_t pos)
>  {
>  	struct rte_sched_grinder *grinder = subport->grinder + pos;
>  	struct rte_sched_pipe *pipe = grinder->pipe;
> @@ -2403,8 +2516,16 @@ grinder_schedule(struct rte_sched_port *port,
>  	uint32_t pkt_len = pkt->pkt_len + port->frame_overhead;
>  	uint32_t be_tc_active;
> 
> -	if (!grinder_credits_check(port, subport, pos))
> -		return 0;
> +	switch (subport->is_tc_ov_enabled) {
> +	case 1:
> +		if (!grinder_credits_check_with_tc_ov(port, subport, pos))
> +			return 0;
> +		break;
> +	case 0:
> +		if (!grinder_credits_check(port, subport, pos))
> +			return 0;
> +		break;
> +	}

There should be no switch statement here, please replace with an if statement. I suggest the following:

int status;

status = subport->tc_ov_enabled ? grinder_credits_check_with_tc_ov(port, subport, pos) : grinder_credits_check(port, subport, pos);
if (!status)
	return 0;

> 
>  	/* Advance port time */
>  	port->time += pkt_len;
> @@ -2770,7 +2891,11 @@ grinder_handle(struct rte_sched_port *port,
>  						subport->profile;
> 
>  		grinder_prefetch_tc_queue_arrays(subport, pos);
> -		grinder_credits_update(port, subport, pos);
> +
> +		if (unlikely(subport->is_tc_ov_enabled))

Please remove the "unlikely" from here, don't put any likely/unlikely here at all.

> +			grinder_credits_update_with_tc_ov(port, subport, pos);
> +		else
> +			grinder_credits_update(port, subport, pos);
> 
>  		grinder->state = e_GRINDER_PREFETCH_MBUF;
>  		return 0;
> diff --git a/lib/sched/rte_sched.h b/lib/sched/rte_sched.h
> index 5ece64e527..94febe1d94 100644
> --- a/lib/sched/rte_sched.h
> +++ b/lib/sched/rte_sched.h
> @@ -579,6 +579,24 @@ rte_sched_port_enqueue(struct rte_sched_port *port,
> struct rte_mbuf **pkts, uint
>  int
>  rte_sched_port_dequeue(struct rte_sched_port *port, struct rte_mbuf **pkts,
> uint32_t n_pkts);
> 
> +/**
> + * Hierarchical scheduler subport TC OV enable/disable config.

The name of the feature should be fully stated here: traffic class oversubscription, not the abbreviation, please change.


> + * Note that this function is safe to use at runtime
> + * to enable/disable TC OV for subport.

We should actually forbit this rather than encourage it. Calling this function several times does not make sense, and it can create limitations that can come back and byte us in the future, whenever we might need to extend this code, for no reason.

Please actually replace with: "This function should be called at the time of subport initialization."

> + *
> + * @param port
> + *   Handle to port scheduler instance
> + * @param subport_id
> + *   Subport ID
> + * @param tc_ov_enable
> + *  Boolean flag to enable/disable TC OV
> + * @return
> + *   0 upon success, error code otherwise
> + */
> +__rte_experimental
> +int
> +rte_sched_subport_tc_ov_config(struct rte_sched_port *port, uint32_t
> subport_id, bool tc_ov_enable);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/sched/version.map b/lib/sched/version.map
> index d22c07fc9f..c6e994d8df 100644
> --- a/lib/sched/version.map
> +++ b/lib/sched/version.map
> @@ -34,4 +34,7 @@ EXPERIMENTAL {
>  	# added in 21.11
>  	rte_pie_rt_data_init;
>  	rte_pie_config_init;
> +
> +	# added in 22.03

This is not in 22.03, it will hopefully be in 22.07.

> +	rte_sched_subport_tc_ov_config;
>  };
> --
> 2.25.1

Regards,
Cristian

  parent reply	other threads:[~2022-05-09 20:05 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-07 14:51 [dpdk][PATCH 1/2] " Marcin Danilewicz
2022-04-07 14:51 ` [dpdk][PATCH 2/2] sched: fix to manage " Marcin Danilewicz
2022-04-14  9:53   ` Thomas Monjalon
2022-04-27  9:23   ` [PATCH v3] sched: enable/disable " Marcin Danilewicz
2022-04-27 15:48     ` Singh, Jasvinder
2022-05-09 20:05     ` Dumitrescu, Cristian [this message]
2022-05-10  6:40       ` Ajmera, Megha
2022-05-10  9:09         ` Dumitrescu, Cristian
2022-05-24 13:33           ` Marcin Danilewicz
2022-05-24 14:52             ` Stephen Hemminger
2022-05-26 23:12               ` Danilewicz, MarcinX
2022-05-24 13:38     ` [PATCH v4] sched: enable traffic class oversubscription conditionally Marcin Danilewicz
2022-05-24 13:43     ` Marcin Danilewicz
2022-05-24 14:30       ` Dumitrescu, Cristian
2022-05-25 14:18         ` Danilewicz, MarcinX
2022-05-27  0:09           ` Danilewicz, MarcinX
2022-05-27  0:09       ` [PATCH v5] " Marcin Danilewicz
2022-05-30  8:45         ` [PATCH v6] " Marcin Danilewicz
2022-05-30 10:35           ` Dumitrescu, Cristian
2022-05-30 11:59             ` Danilewicz, MarcinX
2022-05-30 10:54           ` Dumitrescu, Cristian
2022-05-30 12:02             ` Danilewicz, MarcinX
2022-05-30 10:58           ` Dumitrescu, Cristian
2022-05-30 12:04             ` Danilewicz, MarcinX
2022-05-30 11:55           ` [PATCH v7] " Marcin Danilewicz
2022-05-30 12:14             ` Dumitrescu, Cristian
2022-05-30 13:34               ` Danilewicz, MarcinX
2022-05-30 13:55                 ` Dumitrescu, Cristian
2022-05-30 14:05                   ` Danilewicz, MarcinX
2022-05-30 13:38             ` [PATCH v8] " Marcin Danilewicz
2022-05-30 18:47               ` [PATCH v9] " Marcin Danilewicz
2022-05-30 21:13                 ` Dumitrescu, Cristian
2022-05-31  9:49                 ` [PATCH v10] " Marcin Danilewicz
2022-05-31 13:09                   ` Dumitrescu, Cristian
2022-05-31 16:42                     ` Thomas Monjalon
2022-05-31 18:45                       ` Danilewicz, MarcinX
2022-04-08 12:50 ` [dpdk][PATCH 1/2] sched: enable/disable TC OV at runtime Singh, Jasvinder
2022-04-27  8:58 [PATCH v3] " Marcin Danilewicz
2022-04-27  9:36 ` Dumitrescu, Cristian
2022-04-27  9:37   ` Dumitrescu, Cristian
2022-04-27 15:53 ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM8PR11MB5670595C757CB671A40D1510EBC69@DM8PR11MB5670.namprd11.prod.outlook.com \
    --to=cristian.dumitrescu@intel.com \
    --cc=dev@dpdk.org \
    --cc=jasvinder.singh@intel.com \
    --cc=marcinx.danilewicz@intel.com \
    --cc=megha.ajmera@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).