DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Hunt, David" <david.hunt@intel.com>
To: Liang Ma <liang.j.ma@intel.com>
Cc: dev@dpdk.org, lei.a.yao@intel.com, ktraynor@redhat.com,
	john.geary@intel.com
Subject: Re: [dpdk-dev] [PATCH v8 1/4] lib/librte_power: traffic pattern aware power control
Date: Fri, 28 Sep 2018 11:47:17 +0100	[thread overview]
Message-ID: <fa8d85a2-a79e-d677-98a9-854ee3434440@intel.com> (raw)
In-Reply-To: <1537191016-26330-1-git-send-email-liang.j.ma@intel.com>

Hi Liang,


On 17/9/2018 2:30 PM, Liang Ma wrote:
> 1. Abstract
>
> For packet processing workloads such as DPDK polling is continuous.
> This means CPU cores always show 100% busy independent of how much work
> those cores are doing. It is critical to accurately determine how busy
> a core is hugely important for the following reasons:
>
>     * No indication of overload conditions
>
>     * User do not know how much real load is on a system meaning resulted in
>       wasted energy as no power management is utilized
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system become busy, the empty poll mechanism can also increase the
> core frequency (including turbo) to do best effort for intensive traffic.
> This gives us more flexible and balanced traffic awareness over the
> standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> The less the number of empty polls, means current core is busy with
> processing workload, therefore, the higher frequency is needed. The high
> empty poll number indicates the current core not doing any real work
> therefore, we can lower the frequency to safe power.
>
> In the current implementation, each core has 1 empty-poll counter which
> assume 1 core is dedicated to 1 queue. This will need to be expanded in the
> future to support multiple queues per core.
>
> 2.1 Power state definition:
>
> 	LOW:  Not currently used, reserved for future use.
>
> 	MED:  the frequency is used to process modest traffic workload.
>
> 	HIGH: the frequency is used to process busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> 	a.Initialization/Training phase. The training phase is necessary
> 	  in order to figure out the system polling baseline numbers from
> 	  idle to busy. The highest poll count will be during idle, where
> 	  all polls are empty. These poll counts will be different between
> 	  systems due to the many possible processor micro-arch, cache
> 	  and device configurations, hence the training phase.
>    	  In the training phase, traffic is blocked so the training
>    	  algorithm can average the empty-poll numbers for the LOW, MED and
>   	  HIGH  power states in order to create a baseline.
>    	  The core's counter are collected every 10ms, and the Training
>   	  phase will take 2 seconds.
>   	  Training is disabled as default configuration. the default
>   	  parameter is applied. Simple App still can trigger training

Typo: "Simple" should be "Sample"

Suggest adding: Once the training phase has been executed once on a 
system, the application
can then be started with the relevant thresholds provided on the command 
line, allowing the
application to start passing start traffic immediately.

>   	  if that's needed.
>
> 	b.Normal phase. When the training phase is complete, traffic is
>    	  started. The run-time poll counts are compared with the
> 	  baseline and the decision will be taken to move to MED power
>    	  state or HIGH power state. The counters are calculated every 10ms.

Propose changing the first sentence:  Traffic starts immediately based 
on the default
thresholds, or based on the user supplied thresholds via the command 
line parameters.




> 3. Proposed  API
>
> 1.  rte_power_empty_poll_stat_init(struct ep_params **eptr,
> 		uint8_t *freq_tlb, struct ep_policy *policy);
> which is used to initialize the power management system.
>   
> 2.  rte_power_empty_poll_stat_free(void);
> which is used to free the resource hold by power management system.
>   
> 3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update specific core empty poll counter, not thread safe
>   
> 4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update specific core valid poll counter, not thread safe
>   
> 5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get specific core empty poll counter.
>   
> 6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get specific core valid poll counter.
>
> 7.  rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> which is used to detect empty poll state changes then take action.
>
> ChangeLog:
> v2: fix some coding style issues.
> v3: rename the filename, API name.
> v4: no change.
> v5: no change.
> v6: re-work the code layout, update API.
> v7: fix minor typo and lift node num limit.
> v8: disable training as default option.
>
> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
>
> Reviewed-by: Lei Yao <lei.a.yao@intel.com>
> ---
>   lib/librte_power/Makefile               |   6 +-
>   lib/librte_power/meson.build            |   5 +-
>   lib/librte_power/rte_power_empty_poll.c | 539 ++++++++++++++++++++++++++++++++
>   lib/librte_power/rte_power_empty_poll.h | 219 +++++++++++++
>   lib/librte_power/rte_power_version.map  |  13 +
>   5 files changed, 778 insertions(+), 4 deletions(-)
>   create mode 100644 lib/librte_power/rte_power_empty_poll.c
>   create mode 100644 lib/librte_power/rte_power_empty_poll.h
>
> diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile
> index 6f85e88..a8f1301 100644
> --- a/lib/librte_power/Makefile
> +++ b/lib/librte_power/Makefile
> @@ -6,8 +6,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
>   # library name
>   LIB = librte_power.a
>   
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>   CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing
> -LDLIBS += -lrte_eal
> +LDLIBS += -lrte_eal -lrte_timer
>   
>   EXPORT_MAP := rte_power_version.map
>   
> @@ -16,8 +17,9 @@ LIBABIVER := 1
>   # all source are stored in SRCS-y
>   SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c power_acpi_cpufreq.c
>   SRCS-$(CONFIG_RTE_LIBRTE_POWER) += power_kvm_vm.c guest_channel.c
> +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_power_empty_poll.c
>   
>   # install this header file
> -SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h  rte_power_empty_poll.h
>   
>   include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build
> index 253173f..63957eb 100644
> --- a/lib/librte_power/meson.build
> +++ b/lib/librte_power/meson.build
> @@ -5,5 +5,6 @@ if host_machine.system() != 'linux'
>   	build = false
>   endif
>   sources = files('rte_power.c', 'power_acpi_cpufreq.c',
> -		'power_kvm_vm.c', 'guest_channel.c')
> -headers = files('rte_power.h')
> +		'power_kvm_vm.c', 'guest_channel.c',
> +		'rte_power_empty_poll.c')
> +headers = files('rte_power.h','rte_power_empty_poll.h')
> diff --git a/lib/librte_power/rte_power_empty_poll.c b/lib/librte_power/rte_power_empty_poll.c
> new file mode 100644
> index 0000000..587ce78
> --- /dev/null
> +++ b/lib/librte_power/rte_power_empty_poll.c
> @@ -0,0 +1,539 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_lcore.h>
> +#include <rte_cycles.h>
> +#include <rte_atomic.h>
> +#include <rte_malloc.h>
> +
> +#include "rte_power.h"
> +#include "rte_power_empty_poll.h"
> +
> +#define INTERVALS_PER_SECOND 100     /* (10ms) */
> +#define SECONDS_TO_TRAIN_FOR 2
> +#define DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD 70
> +#define DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD 30
> +#define DEFAULT_CYCLES_PER_PACKET 800
> +
> +static struct ep_params *ep_params;
> +static uint32_t med_to_high_threshold = DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD;
> +static uint32_t high_to_med_threshold = DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD;
> +
> +static uint32_t avail_freqs[RTE_MAX_LCORE][NUM_FREQS];
> +
> +static uint32_t total_avail_freqs[RTE_MAX_LCORE];
> +
> +static uint32_t freq_index[NUM_FREQ];
> +
> +static uint32_t
> +get_freq_index(enum freq_val index)
> +{
> +	return freq_index[index];
> +}
> +
> +
> +static int
> +set_power_freq(int lcore_id, enum freq_val freq, bool specific_freq)
> +{
> +	int err = 0;
> +	uint32_t power_freq_index;
> +	if (!specific_freq)
> +		power_freq_index = get_freq_index(freq);
> +	else
> +		power_freq_index = freq;
> +
> +	err = rte_power_set_freq(lcore_id, power_freq_index);
> +
> +	return err;
> +}
> +
> +
> +static inline void __attribute__((always_inline))
> +exit_training_state(struct priority_worker *poll_stats)
> +{
> +	RTE_SET_USED(poll_stats);
> +}
> +

Is this really needed? It does nothing, and is just local to this file.


> +static inline void __attribute__((always_inline))
> +enter_training_state(struct priority_worker *poll_stats)
> +{
> +	poll_stats->iter_counter = 0;
> +	poll_stats->cur_freq = LOW;
> +	poll_stats->queue_state = TRAINING;
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_normal_state(struct priority_worker *poll_stats)
> +{
> +	/* Clear the averages arrays and strs */
> +	memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> +	poll_stats->ec = 0;
> +	memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> +	poll_stats->pc = 0;
> +
> +	poll_stats->cur_freq = MED;
> +	poll_stats->iter_counter = 0;
> +	poll_stats->threshold_ctr = 0;
> +	poll_stats->queue_state = MED_NORMAL;
> +	RTE_LOG(INFO, POWER, "set the poewr freq to MED\n");

Typo, "poewr" should be "power", also suggest "Set" rather than "set"


> +	set_power_freq(poll_stats->lcore_id, MED, false);
> +
> +	/* Try here */

Not sure about this comment?

> +	poll_stats->thresh[MED].threshold_percent = med_to_high_threshold;
> +	poll_stats->thresh[HGH].threshold_percent = high_to_med_threshold;
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_busy_state(struct priority_worker *poll_stats)
> +{
> +	memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> +	poll_stats->ec = 0;
> +	memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> +	poll_stats->pc = 0;
> +
> +	poll_stats->cur_freq = HGH;
> +	poll_stats->iter_counter = 0;
> +	poll_stats->threshold_ctr = 0;
> +	poll_stats->queue_state = HGH_BUSY;
> +	set_power_freq(poll_stats->lcore_id, HGH, false);
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_purge_state(struct priority_worker *poll_stats)
> +{
> +	poll_stats->iter_counter = 0;
> +	poll_stats->queue_state = LOW_PURGE;
> +}
> +
> +static inline void __attribute__((always_inline))
> +set_state(struct priority_worker *poll_stats,
> +		enum queue_state new_state)
> +{
> +	enum queue_state old_state = poll_stats->queue_state;
> +	if (old_state != new_state) {
> +
> +		/* Call any old state exit functions */
> +		if (old_state == TRAINING)
> +			exit_training_state(poll_stats);

Is this needed? exit_training_state() does nothing.

> +		/* Call any new state entry functions */
> +		if (new_state == TRAINING)
> +			enter_training_state(poll_stats);
> +		if (new_state == MED_NORMAL)
> +			enter_normal_state(poll_stats);
> +		if (new_state == HGH_BUSY)
> +			enter_busy_state(poll_stats);
> +		if (new_state == LOW_PURGE)
> +			enter_purge_state(poll_stats);
> +	}
> +}
> +
> +static inline void __attribute__((always_inline))
> +set_policy(struct priority_worker *poll_stats,
> +		struct ep_policy *policy)
> +{
> +	set_state(poll_stats, policy->state);
> +
> +	if (policy->state == TRAINING)
> +		return;
> +
> +	poll_stats->thresh[MED_NORMAL].base_edpi = policy->med_base_edpi;
> +	poll_stats->thresh[HGH_BUSY].base_edpi = policy->hgh_base_edpi;
> +
> +	poll_stats->thresh[MED_NORMAL].trained = true;
> +	poll_stats->thresh[HGH_BUSY].trained = true;
> +
> +}
> +
> +static void
> +update_training_stats(struct priority_worker *poll_stats,
> +		uint32_t freq,
> +		bool specific_freq,
> +		uint32_t max_train_iter)
> +{
> +	RTE_SET_USED(specific_freq);
> +
> +	char pfi_str[32];
> +	uint64_t p0_empty_deq;
> +
> +	sprintf(pfi_str, "%02d", freq);
> +
> +	if (poll_stats->cur_freq == freq &&
> +			poll_stats->thresh[freq].trained == false) {
> +		if (poll_stats->thresh[freq].cur_train_iter == 0) {
> +
> +			set_power_freq(poll_stats->lcore_id,
> +					freq, specific_freq);
> +
> +			poll_stats->empty_dequeues_prev =
> +				poll_stats->empty_dequeues;
> +
> +			poll_stats->thresh[freq].cur_train_iter++;
> +
> +			return;
> +		} else if (poll_stats->thresh[freq].cur_train_iter
> +				<= max_train_iter) {
> +
> +			p0_empty_deq = poll_stats->empty_dequeues -
> +				poll_stats->empty_dequeues_prev;
> +
> +			poll_stats->empty_dequeues_prev =
> +				poll_stats->empty_dequeues;
> +
> +			poll_stats->thresh[freq].base_edpi += p0_empty_deq;
> +			poll_stats->thresh[freq].cur_train_iter++;
> +
> +		} else {
> +			if (poll_stats->thresh[freq].trained == false) {
> +				poll_stats->thresh[freq].base_edpi =
> +					poll_stats->thresh[freq].base_edpi /
> +					max_train_iter;
> +
> +				/* Add on a factor of 0.05%*/
> +				/* this should remove any */
> +				/* false negatives when the system is 0% busy */

Multi line comment should follow the usual standard /* \n * text \n text 
\n */

> +				poll_stats->thresh[freq].base_edpi +=
> +					poll_stats->thresh[freq].base_edpi / 2000;
> +
> +				poll_stats->thresh[freq].trained = true;
> +				poll_stats->cur_freq++;
> +
> +			}
> +		}
> +	}
> +}
> +
> +static inline uint32_t __attribute__((always_inline))
> +update_stats(struct priority_worker *poll_stats)
> +{
> +	uint64_t tot_edpi = 0, tot_ppi = 0;
> +	uint32_t j, percent;
> +
> +	struct priority_worker *s = poll_stats;
> +
> +	uint64_t cur_edpi = s->empty_dequeues - s->empty_dequeues_prev;
> +
> +	s->empty_dequeues_prev = s->empty_dequeues;
> +
> +	uint64_t ppi = s->num_dequeue_pkts - s->num_dequeue_pkts_prev;
> +
> +	s->num_dequeue_pkts_prev = s->num_dequeue_pkts;
> +
> +	if (s->thresh[s->cur_freq].base_edpi < cur_edpi) {
> +		RTE_LOG(DEBUG, POWER, "cur_edpi is too large "
> +				"cur edpi %ld "
> +				"base epdi %ld\n",
> +				cur_edpi,
> +				s->thresh[s->cur_freq].base_edpi);

Suggest making this log message more meaningful to the user. I suspect 
that "cur_edpi" will not mean much to the user.
What does edpi mean?

> +		/* Value to make us fail need debug log*/
> +		return 1000UL;
> +	}
> +
> +	s->edpi_av[s->ec++ % BINS_AV] = cur_edpi;
> +	s->ppi_av[s->pc++ % BINS_AV] = ppi;
> +
> +	for (j = 0; j < BINS_AV; j++) {
> +		tot_edpi += s->edpi_av[j];
> +		tot_ppi += s->ppi_av[j];
> +	}
> +
> +	tot_edpi = tot_edpi / BINS_AV;
> +
> +	percent = 100 - (uint32_t)(((float)tot_edpi /
> +			(float)s->thresh[s->cur_freq].base_edpi) * 100);
> +
> +	return (uint32_t)percent;
> +}
> +
> +
> +static inline void  __attribute__((always_inline))
> +update_stats_normal(struct priority_worker *poll_stats)
> +{
> +	uint32_t percent;
> +
> +	if (poll_stats->thresh[poll_stats->cur_freq].base_edpi == 0) {
> +
> +		RTE_LOG(DEBUG, POWER, "cure freq is %d, edpi is %lu\n",
> +				poll_stats->cur_freq,
> +				poll_stats->thresh[poll_stats->cur_freq].base_edpi);

Again, a more meaningful explanation of edpi is needed here.

> +		return;
> +	}
> +
> +	percent = update_stats(poll_stats);
> +
> +	if (percent > 100) {
> +		RTE_LOG(DEBUG, POWER, "Big than 100 abnormal\n");

Please change to something meaningful to the user. What is the 
percentage returned from update_stats()?

> +		return;
> +	}
> +
> +	if (poll_stats->cur_freq == LOW)
> +		RTE_LOG(INFO, POWER, "Purge Mode is not supported\n");

Suggest adding "currently" - "Purge Mode is not currently supported\n"

> +	else if (poll_stats->cur_freq == MED) {
> +
> +		if (percent >
> +			poll_stats->thresh[MED].threshold_percent) {
> +
> +			if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> +				poll_stats->threshold_ctr++;
> +			else {
> +				set_state(poll_stats, HGH_BUSY);
> +				RTE_LOG(INFO, POWER, "MOVE to HGH\n");
> +			}
> +
> +		} else {
> +			/* reset */
> +			poll_stats->threshold_ctr = 0;
> +		}
> +
> +	} else if (poll_stats->cur_freq == HGH) {
> +
> +		if (percent <
> +				poll_stats->thresh[HGH].threshold_percent) {
> +
> +			if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> +				poll_stats->threshold_ctr++;
> +			else {
> +				set_state(poll_stats, MED_NORMAL);
> +				RTE_LOG(INFO, POWER, "MOVE to MED\n");
> +			}
> +		} else {
> +			/* reset */
> +			poll_stats->threshold_ctr = 0;
> +		}
> +
> +	}
> +}
> +
> +static int
> +empty_poll_training(struct priority_worker *poll_stats,
> +		uint32_t max_train_iter)
> +{
> +
> +	if (poll_stats->iter_counter < INTERVALS_PER_SECOND) {
> +		poll_stats->iter_counter++;
> +		return 0;
> +	}
> +
> +
> +	update_training_stats(poll_stats,
> +			LOW,
> +			false,
> +			max_train_iter);
> +
> +	update_training_stats(poll_stats,
> +			MED,
> +			false,
> +			max_train_iter);
> +
> +	update_training_stats(poll_stats,
> +			HGH,
> +			false,
> +			max_train_iter);
> +
> +
> +	if (poll_stats->thresh[LOW].trained == true
> +			&& poll_stats->thresh[MED].trained == true
> +			&& poll_stats->thresh[HGH].trained == true) {
> +
> +		set_state(poll_stats, MED_NORMAL);
> +
> +		RTE_LOG(INFO, POWER, "Low threshold is %lu\n",
> +				poll_stats->thresh[LOW].base_edpi);

Suggest "Low" change to "LOW" for consistency with other log messages below.

> +
> +		RTE_LOG(INFO, POWER, "MED threshold is %lu\n",
> +				poll_stats->thresh[MED].base_edpi);
> +
> +
> +		RTE_LOG(INFO, POWER, "HIGH threshold is %lu\n",
> +				poll_stats->thresh[HGH].base_edpi);
> +
> +		RTE_LOG(INFO, POWER, "Training is Complete for %d\n",
> +				poll_stats->lcore_id);
> +	}
> +
> +	return 0;
> +}
> +
> +void
> +rte_empty_poll_detection(struct rte_timer *tim, void *arg)
> +{
> +
> +	uint32_t i;
> +
> +	struct priority_worker *poll_stats;
> +
> +	RTE_SET_USED(tim);
> +
> +	RTE_SET_USED(arg);
> +
> +	for (i = 0; i < NUM_NODES; i++) {
> +
> +		poll_stats = &(ep_params->wrk_data.wrk_stats[i]);
> +
> +		if (rte_lcore_is_enabled(poll_stats->lcore_id) == 0)
> +			continue;
> +
> +		switch (poll_stats->queue_state) {
> +		case(TRAINING):
> +			empty_poll_training(poll_stats,
> +					ep_params->max_train_iter);
> +			break;
> +
> +		case(HGH_BUSY):
> +		case(MED_NORMAL):
> +			update_stats_normal(poll_stats);
> +			break;
> +
> +		case(LOW_PURGE):
> +			break;
> +		default:
> +			break;
> +
> +		}
> +
> +	}
> +
> +}
> +
> +int __rte_experimental
> +rte_power_empty_poll_stat_init(struct ep_params **eptr, uint8_t *freq_tlb,
> +		struct ep_policy *policy)
> +{
> +	uint32_t i;
> +	/* Allocate the ep_params structure */
> +	ep_params = rte_zmalloc_socket(NULL,
> +			sizeof(struct ep_params),
> +			0,
> +			rte_socket_id());
> +
> +	if (!ep_params)
> +		rte_panic("Cannot allocate heap memory for ep_params "
> +				"for socket %d\n", rte_socket_id());
> +
> +	if (freq_tlb == NULL) {
> +		freq_index[LOW] = 14;
> +		freq_index[MED] = 9;
> +		freq_index[HGH] = 1;
> +	} else {
> +		freq_index[LOW] = freq_tlb[LOW];
> +		freq_index[MED] = freq_tlb[MED];
> +		freq_index[HGH] = freq_tlb[HGH];
> +	}
> +
> +	RTE_LOG(INFO, POWER, "Initialize the Empty Poll\n");
> +
> +	/* 5 seconds worth of training */

This now looks to be 2 seconds from the #define above. Maybe: /* Train 
for pre-defined period */

> +	ep_params->max_train_iter = INTERVALS_PER_SECOND * SECONDS_TO_TRAIN_FOR;
> +
> +	struct stats_data *w = &ep_params->wrk_data;
> +
> +	*eptr = ep_params;
> +
> +	/* initialize all wrk_stats state */
> +	for (i = 0; i < NUM_NODES; i++) {
> +
> +		if (rte_lcore_is_enabled(i) == 0)
> +			continue;
> +		/*init the freqs table */
> +		total_avail_freqs[i] = rte_power_freqs(i,
> +				avail_freqs[i],
> +				NUM_FREQS);
> +
> +		RTE_LOG(INFO, POWER, "total avail freq is %d , lcoreid %d\n",
> +				total_avail_freqs[i],
> +				i);
> +
> +		if (get_freq_index(LOW) > total_avail_freqs[i])
> +			return -1;
> +
> +		if (rte_get_master_lcore() != i) {
> +			w->wrk_stats[i].lcore_id = i;
> +			set_policy(&w->wrk_stats[i], policy);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +void __rte_experimental
> +rte_power_empty_poll_stat_free(void)
> +{
> +
> +	RTE_LOG(INFO, POWER, "Close the Empty Poll\n");
> +
> +	if (ep_params != NULL)
> +		rte_free(ep_params);
> +}
> +
> +int __rte_experimental
> +rte_power_empty_poll_stat_update(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	poll_stats->empty_dequeues++;
> +
> +	return 0;
> +}
> +
> +int __rte_experimental
> +rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt)
> +{
> +
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	poll_stats->num_dequeue_pkts += nb_pkt;
> +
> +	return 0;
> +}
> +
> +
> +uint64_t __rte_experimental
> +rte_power_empty_poll_stat_fetch(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	return poll_stats->empty_dequeues;
> +}
> +
> +uint64_t __rte_experimental
> +rte_power_poll_stat_fetch(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	return poll_stats->num_dequeue_pkts;
> +}
> diff --git a/lib/librte_power/rte_power_empty_poll.h b/lib/librte_power/rte_power_empty_poll.h
> new file mode 100644
> index 0000000..ae27f7d
> --- /dev/null
> +++ b/lib/librte_power/rte_power_empty_poll.h
> @@ -0,0 +1,219 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#ifndef _RTE_EMPTY_POLL_H
> +#define _RTE_EMPTY_POLL_H
> +
> +/**
> + * @file
> + * RTE Power Management
> + */
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +#include <rte_byteorder.h>
> +#include <rte_log.h>
> +#include <rte_string_fns.h>
> +#include <rte_power.h>
> +#include <rte_timer.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define NUM_FREQS 20

I don't think this is enough. Suggest using RTE_MAX_LCORE_FREQS

> +
> +#define BINS_AV 4 /* Has to be ^2 */
> +
> +#define DROP (NUM_DIRECTIONS * NUM_DEVICES)
> +
> +#define NUM_PRIORITIES          2
> +
> +#define NUM_NODES         256  /* Max core number*/
> +
> +/* Processor Power State */
> +enum freq_val {
> +	LOW,
> +	MED,
> +	HGH,
> +	NUM_FREQ = NUM_FREQS
> +};
> +

Why is NUM_FREQ in this enum? 0,1,2,20 (or RTE_MAX_LCORE_FREQS) does not 
seem right.
If you are using NUM_FREQ in the code then why not just use NUM_FREQS.

> +
> +/* Queue Polling State */
> +enum queue_state {
> +	TRAINING, /* NO TRAFFIC */
> +	MED_NORMAL,   /* MED */
> +	HGH_BUSY,     /* HIGH */
> +	LOW_PURGE,    /* LOW */
> +};
> +
> +/* Queue Stats */
> +struct freq_threshold {
> +
> +	uint64_t base_edpi;
> +	bool trained;
> +	uint32_t threshold_percent;
> +	uint32_t cur_train_iter;
> +};
> +
> +/* Each Worder Thread Empty Poll Stats */
> +struct priority_worker {
> +
> +	/* Current dequeue and throughput counts */
> +	/* These 2 are written to by the worker threads */
> +	/* So keep them on their own cache line */
> +	uint64_t empty_dequeues;
> +	uint64_t num_dequeue_pkts;
> +
> +	enum queue_state queue_state;
> +
> +	uint64_t empty_dequeues_prev;
> +	uint64_t num_dequeue_pkts_prev;
> +
> +	/* Used for training only */
> +	struct freq_threshold thresh[NUM_FREQ];
> +	enum freq_val cur_freq;
> +
> +	/* bucket arrays to calculate the averages */
> +	uint64_t edpi_av[BINS_AV];
> +	uint32_t  ec;
> +	uint64_t ppi_av[BINS_AV];
> +	uint32_t  pc;
> +
> +	uint32_t lcore_id;
> +	uint32_t iter_counter;
> +	uint32_t threshold_ctr;
> +	uint32_t display_ctr;
> +	uint8_t  dev_id;
> +
> +} __rte_cache_aligned;
> +

Suggest adding a comment on each of the variables above explaining what 
the acronym means.
E.g. edpi, ec, pc, ppi.


> +
> +struct stats_data {
> +
> +	struct priority_worker wrk_stats[NUM_NODES];
> +
> +	/* flag to stop rx threads processing packets until training over */
> +	bool start_rx;
> +
> +};
> +
> +/* Empty Poll Parameters */
> +struct ep_params {
> +
> +	/* Timer related stuff */
> +	uint64_t interval_ticks;
> +	uint32_t max_train_iter;
> +
> +	struct rte_timer timer0;
> +	struct stats_data wrk_data;
> +};
> +
> +
> +/* Sample App Init information */
> +struct ep_policy {
> +
> +	uint64_t med_base_edpi;
> +	uint64_t hgh_base_edpi;
> +
> +	enum queue_state state;
> +};
> +
> +
> +
> +/**
> + * Initialize the power management system.
> + *
> + * @param eptr
> + *   the structure of empty poll configuration
> + * @freq_tlb
> + *   the power state/frequency  mapping table
> + * @policy
> + *   the initialization policy from sample app
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_empty_poll_stat_init(struct ep_params **eptr, uint8_t *freq_tlb,
> +		struct ep_policy *policy);
> +
> +/**
> + * Free the resource hold by power management system.
> + */
> +void __rte_experimental
> +rte_power_empty_poll_stat_free(void);
> +
> +/**
> + * Update specific core empty poll counter
> + * It's not thread safe.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_empty_poll_stat_update(unsigned int lcore_id);
> +
> +/**
> + * Update specific core valid poll counter, not thread safe.
> + *
> + * @param lcore_id
> + *  lcore id.
> + * @param nb_pkt
> + *  The packet number of one valid poll.
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> +
> +/**
> + * Fetch specific core empty poll counter.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  Current lcore empty poll counter value.
> + */
> +uint64_t __rte_experimental
> +rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> +
> +/**
> + * Fetch specific core valid poll counter.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  Current lcore valid poll counter value.
> + */
> +uint64_t __rte_experimental
> +rte_power_poll_stat_fetch(unsigned int lcore_id);
> +
> +/**
> + * Empty poll  state change detection function
> + *
> + * @param  tim
> + *  The timer structure
> + * @param  arg
> + *  The customized parameter
> + */
> +void  __rte_experimental
> +rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
> index dd587df..11ffdfb 100644
> --- a/lib/librte_power/rte_power_version.map
> +++ b/lib/librte_power/rte_power_version.map
> @@ -33,3 +33,16 @@ DPDK_18.08 {
>   	rte_power_get_capabilities;
>   
>   } DPDK_17.11;
> +
> +EXPERIMENTAL {
> +        global:
> +
> +        rte_power_empty_poll_stat_init;
> +        rte_power_empty_poll_stat_free;
> +        rte_power_empty_poll_stat_update;
> +        rte_power_empty_poll_stat_fetch;
> +        rte_power_poll_stat_fetch;
> +        rte_power_poll_stat_update;
> +        rte_empty_poll_detection;
> +
> +};

checkpatch has several warnings:



### lib/librte_power: traffic pattern aware power control

WARNING:LONG_LINE: line over 80 characters
#355: FILE: lib/librte_power/rte_power_empty_poll.c:199:
+ poll_stats->thresh[freq].base_edpi / 2000;

WARNING:LONG_LINE: line over 80 characters
#417: FILE: lib/librte_power/rte_power_empty_poll.c:261:
+ poll_stats->thresh[poll_stats->cur_freq].base_edpi);

total: 0 errors, 2 warnings, 802 lines checked
ERROR: symbol rte_empty_poll_detection is added in a section other than 
the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_empty_poll_stat_fetch is added in a section 
other than the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_empty_poll_stat_free is added in a section other 
than the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_empty_poll_stat_init is added in a section other 
than the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_empty_poll_stat_update is added in a section 
other than the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_poll_stat_fetch is added in a section other than 
the EXPERIMENTAL section of the version map
ERROR: symbol rte_power_poll_stat_update is added in a section other 
than the EXPERIMENTAL section of the version map
Warning in /lib/librte_power/rte_power_empty_poll.c:
are you sure you want to add the following:
rte_panic\(


Rgds,
Dave.

  parent reply	other threads:[~2018-09-28 10:47 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-08  9:57 [dpdk-dev] [PATCH v1 1/2] " Liang Ma
2018-06-08  9:57 ` [dpdk-dev] [PATCH v1 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-19 10:31   ` Hunt, David
2018-06-08 15:26 ` [dpdk-dev] [PATCH v2 1/2] lib/librte_power: traffic pattern aware power control Liang Ma
2018-06-08 15:26   ` [dpdk-dev] [PATCH v2 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-20 14:44     ` [dpdk-dev] [PATCH v3 1/2] lib/librte_power: traffic pattern aware power control Liang Ma
2018-06-20 14:44       ` [dpdk-dev] [PATCH v3 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-26 11:40         ` [dpdk-dev] [PATCH v4 1/2] lib/librte_power: traffic pattern aware power control Radu Nicolau
2018-06-26 11:40           ` [dpdk-dev] [PATCH v4 2/2] examples/l3fwd-power: simple app update to support new API Radu Nicolau
2018-06-26 13:03             ` Hunt, David
2018-06-26 13:03           ` [dpdk-dev] [PATCH v4 1/2] lib/librte_power: traffic pattern aware power control Hunt, David
2018-06-27 17:33           ` Kevin Traynor
2018-07-05 14:45             ` Liang, Ma
2018-07-12 17:30               ` Thomas Monjalon
2018-09-11  9:19             ` Hunt, David
2018-09-13  9:46               ` Kevin Traynor
2018-09-13 13:30                 ` Liang, Ma
2018-07-10 16:04           ` [dpdk-dev] [PATCH v5 " Radu Nicolau
2018-07-10 16:04             ` [dpdk-dev] [PATCH v5 2/2] examples/l3fwd-power: simple app update to support new API Radu Nicolau
2018-08-31 15:04             ` [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 3/4] doc/guides/proguides/power-man: update the power API Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-04  1:11               ` [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control Yao, Lei A
2018-09-04  2:09               ` Yao, Lei A
2018-09-04 14:10               ` [dpdk-dev] [PATCH v7 " Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 3/4] doc/guides/proguides/power-man: update the power API Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-13 10:54                 ` [dpdk-dev] [PATCH v7 1/4] lib/librte_power: traffic pattern aware power control Kevin Traynor
2018-09-13 13:37                   ` Liang, Ma
2018-09-13 14:05                     ` Hunt, David
2018-09-17 13:30                 ` [dpdk-dev] [PATCH v8 " Liang Ma
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-28 11:19                     ` Hunt, David
2018-10-02 10:18                       ` Liang, Ma
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 3/4] doc/guides/proguide/power-man: update the power API Liang Ma
2018-09-25 12:31                     ` Kovacevic, Marko
2018-09-25 12:44                     ` Kovacevic, Marko
2018-09-28 12:30                     ` Hunt, David
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-25 13:20                     ` Kovacevic, Marko
2018-09-28 12:43                       ` Hunt, David
2018-09-28 12:52                         ` Liang, Ma
2018-09-28 10:47                   ` Hunt, David [this message]
2018-10-02 10:13                     ` [dpdk-dev] [PATCH v8 1/4] lib/librte_power: traffic pattern aware power control Liang, Ma
2018-09-28 14:58                   ` [dpdk-dev] [PATCH v9 " Liang Ma
2018-09-28 14:58                     ` [dpdk-dev] [PATCH v9 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-28 14:58                     ` [dpdk-dev] [PATCH v9 3/4] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-02 13:48                     ` [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control Liang Ma
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-02 14:23                         ` Hunt, David
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 3/4] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-02 14:24                         ` Hunt, David
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: update Liang Ma
2018-10-02 14:25                         ` Hunt, David
2018-10-02 14:22                       ` [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control Hunt, David
2018-10-12  1:59                       ` Yao, Lei A
2018-10-12 10:02                         ` Liang, Ma
2018-10-12 13:22                           ` Yao, Lei A
2018-10-19 10:23                       ` [dpdk-dev] [PATCH v11 1/5] " Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 2/5] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 3/5] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 5/5] doc: update release notes for empty poll library Liang Ma
2018-10-19 11:07                         ` [dpdk-dev] [PATCH v12 1/5] lib/librte_power: traffic pattern aware power control Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 2/5] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 3/5] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 4/5] doc/guides/sample_app_ug/l3_forward_power_man.rst: update Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 5/5] doc: update release notes for empty poll library Liang Ma
2018-10-22 12:41                             ` Kovacevic, Marko
2018-10-25 23:39                             ` Thomas Monjalon
2018-10-25 23:22                           ` [dpdk-dev] [PATCH v12 1/5] lib/librte_power: traffic pattern aware power control Thomas Monjalon
2018-10-25 23:32                             ` Thomas Monjalon
2018-10-25 23:54                           ` Thomas Monjalon
2018-10-25 23:55                           ` Thomas Monjalon
2018-10-26 13:34                             ` Liang, Ma
2018-10-01 10:06                   ` [dpdk-dev] [PATCH v9 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-06-14 10:59 ` [dpdk-dev] [PATCH v1 1/2] lib/librte_power: traffic pattern aware power control Hunt, David
2018-06-18 16:11   ` Liang, Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa8d85a2-a79e-d677-98a9-854ee3434440@intel.com \
    --to=david.hunt@intel.com \
    --cc=dev@dpdk.org \
    --cc=john.geary@intel.com \
    --cc=ktraynor@redhat.com \
    --cc=lei.a.yao@intel.com \
    --cc=liang.j.ma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).