DPDK patches and discussions
 help / color / mirror / Atom feed
From: Kevin Traynor <ktraynor@redhat.com>
To: Liang Ma <liang.j.ma@intel.com>, david.hunt@intel.com
Cc: dev@dpdk.org, lei.a.yao@intel.com, radu.nicolau@intel.com,
	anatoly.burakov@intel.com, john.geary@intel.com
Subject: Re: [dpdk-dev] [PATCH v7 1/4] lib/librte_power: traffic pattern aware power control
Date: Thu, 13 Sep 2018 11:54:50 +0100	[thread overview]
Message-ID: <9cd52e23-f1e9-0067-015c-9bd8ed1eaeab@redhat.com> (raw)
In-Reply-To: <1536070228-6545-1-git-send-email-liang.j.ma@intel.com>

On 09/04/2018 03:10 PM, Liang Ma wrote:
> 1. Abstract
> 

Hi Liang,

I didn't review the code, but some comments on API below,

> For packet processing workloads such as DPDK polling is continuous.
> This means CPU cores always show 100% busy independent of how much work
> those cores are doing. It is critical to accurately determine how busy
> a core is hugely important for the following reasons:
> 
>    * No indication of overload conditions
> 
>    * User do not know how much real load is on a system meaning resulted in
>      wasted energy as no power management is utilized
> 
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
> 
> When the system become busy, the empty poll mechanism can also increase the
> core frequency (including turbo) to do best effort for intensive traffic.
> This gives us more flexible and balanced traffic awareness over the
> standard l3fwd-power application.
> 
> 2. Proposed solution
> 
> The proposed solution focuses on how many times empty polls are executed.
> The less the number of empty polls, means current core is busy with
> processing workload, therefore, the higher frequency is needed. The high
> empty poll number indicates the current core not doing any real work
> therefore, we can lower the frequency to safe power.
> 
> In the current implementation, each core has 1 empty-poll counter which
> assume 1 core is dedicated to 1 queue. This will need to be expanded in the
> future to support multiple queues per core.
> 
> 2.1 Power state definition:
> 
> 	LOW:  Not currently used, reserved for future use.
> 
> 	MED:  the frequency is used to process modest traffic workload.
> 
> 	HIGH: the frequency is used to process busy traffic workload.
> 
> 2.2 There are two phases to establish the power management system:
> 
> 	a.Initialization/Training phase. The training phase is necessary
> 	  in order to figure out the system polling baseline numbers from
> 	  idle to busy. The highest poll count will be during idle, where
> 	  all polls are empty. These poll counts will be different between
> 	  systems due to the many possible processor micro-arch, cache
> 	  and device configurations, hence the training phase.
>   	  In the training phase, traffic is blocked so the training

When you say 'traffic is blocked' is this something that the application
can do through DPDK API, or you mean no external packets are sent into
that port?

>   	  algorithm can average the empty-poll numbers for the LOW, MED and
>  	  HIGH  power states in order to create a baseline.
>   	  The core's counter are collected every 10ms, and the Training
>  	   phase will take 2 seconds.
> 
> 	b.Normal phase. When the training phase is complete, traffic is
>   	  started. The run-time poll counts are compared with the
> 	  baseline and the decision will be taken to move to MED power
>   	  state or HIGH power state. The counters are calculated every 10ms.
> 
> 3. Proposed  API
> 
> 1.  rte_power_empty_poll_stat_init(void);
> which is used to initialize the power management system.
>  
> 2.  rte_power_empty_poll_stat_free(void);
> which is used to free the resource hold by power management system.
>  
> 3.  rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update specific core empty poll counter, not thread safe
>  
> 4.  rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update specific core valid poll counter, not thread safe
>  

is uint8_t enough to cover the max burst size for an rx poll? I didn't check

> 5.  rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get specific core empty poll counter.
>  
> 6.  rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get specific core valid poll counter.
> 

How about replacing 1-6 with something like below..(not sure what would
be best prefix)

rte_power_poll_stat_init(void);
rte_power_poll_stat_free(void);
rte_power_poll_stat_update(unsigned int lcore, uint8_t nb_pkts)
rte_power_poll_stat_fetch(unsigned int lcore, uint8_t stat)

This would mean combining 3./4. as per previous suggestion so the
application could just call a single function with nb_pkt (which could
be 0). It also makes it more extensible if you want to add new stats in
the future, rather than having to add more functions.

It doesn't do anything different, so perhaps it's just a matter of
personal tastes.

> 7.  rte_empty_poll_detection(void);
> which is used to detect empty poll state changes.
> 

s/rte/rte_power/ ?

I think this needs some better docs/doxygen and maybe name. It seems to
not just detect state change but take some actions so that should be
documented, even in a general way.

thanks,
Kevin.

> ChangeLog:
> v2: fix some coding style issues.
> v3: rename the filename, API name.
> v4: no change.
> v5: no change.
> v6: re-work the code layout, update API.
> v7: fix minor typo and lift node num limit.
> 
> Signed-off-by: Liang Ma <liang.j.ma@intel.com>
> 
> Reviewed-by: Lei Yao <lei.a.yao@intel.com>
> ---
>  lib/librte_power/Makefile               |   6 +-
>  lib/librte_power/meson.build            |   5 +-
>  lib/librte_power/rte_power_empty_poll.c | 500 ++++++++++++++++++++++++++++++++
>  lib/librte_power/rte_power_empty_poll.h | 205 +++++++++++++
>  lib/librte_power/rte_power_version.map  |  13 +
>  5 files changed, 725 insertions(+), 4 deletions(-)
>  create mode 100644 lib/librte_power/rte_power_empty_poll.c
>  create mode 100644 lib/librte_power/rte_power_empty_poll.h
> 
> diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile
> index 6f85e88..a8f1301 100644
> --- a/lib/librte_power/Makefile
> +++ b/lib/librte_power/Makefile
> @@ -6,8 +6,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_power.a
>  
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing
> -LDLIBS += -lrte_eal
> +LDLIBS += -lrte_eal -lrte_timer
>  
>  EXPORT_MAP := rte_power_version.map
>  
> @@ -16,8 +17,9 @@ LIBABIVER := 1
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c power_acpi_cpufreq.c
>  SRCS-$(CONFIG_RTE_LIBRTE_POWER) += power_kvm_vm.c guest_channel.c
> +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_power_empty_poll.c
>  
>  # install this header file
> -SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h  rte_power_empty_poll.h
>  
>  include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build
> index 253173f..63957eb 100644
> --- a/lib/librte_power/meson.build
> +++ b/lib/librte_power/meson.build
> @@ -5,5 +5,6 @@ if host_machine.system() != 'linux'
>  	build = false
>  endif
>  sources = files('rte_power.c', 'power_acpi_cpufreq.c',
> -		'power_kvm_vm.c', 'guest_channel.c')
> -headers = files('rte_power.h')
> +		'power_kvm_vm.c', 'guest_channel.c',
> +		'rte_power_empty_poll.c')
> +headers = files('rte_power.h','rte_power_empty_poll.h')
> diff --git a/lib/librte_power/rte_power_empty_poll.c b/lib/librte_power/rte_power_empty_poll.c
> new file mode 100644
> index 0000000..3dac654
> --- /dev/null
> +++ b/lib/librte_power/rte_power_empty_poll.c
> @@ -0,0 +1,500 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_lcore.h>
> +#include <rte_cycles.h>
> +#include <rte_atomic.h>
> +#include <rte_malloc.h>
> +
> +#include "rte_power.h"
> +#include "rte_power_empty_poll.h"
> +
> +#define INTERVALS_PER_SECOND 100     /* (10ms) */
> +#define SECONDS_TO_TRAIN_FOR 2
> +#define DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD 70
> +#define DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD 30
> +#define DEFAULT_CYCLES_PER_PACKET 800
> +
> +static struct ep_params *ep_params;
> +static uint32_t med_to_high_threshold = DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD;
> +static uint32_t high_to_med_threshold = DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD;
> +
> +static uint32_t avail_freqs[RTE_MAX_LCORE][NUM_FREQS];
> +
> +static uint32_t total_avail_freqs[RTE_MAX_LCORE];
> +
> +static uint32_t freq_index[NUM_FREQ];
> +
> +static uint32_t
> +get_freq_index(enum freq_val index)
> +{
> +	return freq_index[index];
> +}
> +
> +
> +static int
> +set_power_freq(int lcore_id, enum freq_val freq, bool specific_freq)
> +{
> +	int err = 0;
> +	uint32_t power_freq_index;
> +	if (!specific_freq)
> +		power_freq_index = get_freq_index(freq);
> +	else
> +		power_freq_index = freq;
> +
> +	err = rte_power_set_freq(lcore_id, power_freq_index);
> +
> +	return err;
> +}
> +
> +
> +static inline void __attribute__((always_inline))
> +exit_training_state(struct priority_worker *poll_stats)
> +{
> +	RTE_SET_USED(poll_stats);
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_training_state(struct priority_worker *poll_stats)
> +{
> +	poll_stats->iter_counter = 0;
> +	poll_stats->cur_freq = LOW;
> +	poll_stats->queue_state = TRAINING;
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_normal_state(struct priority_worker *poll_stats)
> +{
> +	/* Clear the averages arrays and strs */
> +	memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> +	poll_stats->ec = 0;
> +	memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> +	poll_stats->pc = 0;
> +
> +	poll_stats->cur_freq = MED;
> +	poll_stats->iter_counter = 0;
> +	poll_stats->threshold_ctr = 0;
> +	poll_stats->queue_state = MED_NORMAL;
> +	set_power_freq(poll_stats->lcore_id, MED, false);
> +
> +	/* Try here */
> +	poll_stats->thresh[MED].threshold_percent = med_to_high_threshold;
> +	poll_stats->thresh[HGH].threshold_percent = high_to_med_threshold;
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_busy_state(struct priority_worker *poll_stats)
> +{
> +	memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av));
> +	poll_stats->ec = 0;
> +	memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av));
> +	poll_stats->pc = 0;
> +
> +	poll_stats->cur_freq = HGH;
> +	poll_stats->iter_counter = 0;
> +	poll_stats->threshold_ctr = 0;
> +	poll_stats->queue_state = HGH_BUSY;
> +	set_power_freq(poll_stats->lcore_id, HGH, false);
> +}
> +
> +static inline void __attribute__((always_inline))
> +enter_purge_state(struct priority_worker *poll_stats)
> +{
> +	poll_stats->iter_counter = 0;
> +	poll_stats->queue_state = LOW_PURGE;
> +}
> +
> +static inline void __attribute__((always_inline))
> +set_state(struct priority_worker *poll_stats,
> +		enum queue_state new_state)
> +{
> +	enum queue_state old_state = poll_stats->queue_state;
> +	if (old_state != new_state) {
> +
> +		/* Call any old state exit functions */
> +		if (old_state == TRAINING)
> +			exit_training_state(poll_stats);
> +
> +		/* Call any new state entry functions */
> +		if (new_state == TRAINING)
> +			enter_training_state(poll_stats);
> +		if (new_state == MED_NORMAL)
> +			enter_normal_state(poll_stats);
> +		if (new_state == HGH_BUSY)
> +			enter_busy_state(poll_stats);
> +		if (new_state == LOW_PURGE)
> +			enter_purge_state(poll_stats);
> +	}
> +}
> +
> +
> +static void
> +update_training_stats(struct priority_worker *poll_stats,
> +		uint32_t freq,
> +		bool specific_freq,
> +		uint32_t max_train_iter)
> +{
> +	RTE_SET_USED(specific_freq);
> +
> +	char pfi_str[32];
> +	uint64_t p0_empty_deq;
> +
> +	sprintf(pfi_str, "%02d", freq);
> +
> +	if (poll_stats->cur_freq == freq &&
> +			poll_stats->thresh[freq].trained == false) {
> +		if (poll_stats->thresh[freq].cur_train_iter == 0) {
> +
> +			set_power_freq(poll_stats->lcore_id,
> +					freq, specific_freq);
> +
> +			poll_stats->empty_dequeues_prev =
> +				poll_stats->empty_dequeues;
> +
> +			poll_stats->thresh[freq].cur_train_iter++;
> +
> +			return;
> +		} else if (poll_stats->thresh[freq].cur_train_iter
> +				<= max_train_iter) {
> +
> +			p0_empty_deq = poll_stats->empty_dequeues -
> +				poll_stats->empty_dequeues_prev;
> +
> +			poll_stats->empty_dequeues_prev =
> +				poll_stats->empty_dequeues;
> +
> +			poll_stats->thresh[freq].base_edpi += p0_empty_deq;
> +			poll_stats->thresh[freq].cur_train_iter++;
> +
> +		} else {
> +			if (poll_stats->thresh[freq].trained == false) {
> +				poll_stats->thresh[freq].base_edpi =
> +					poll_stats->thresh[freq].base_edpi /
> +					max_train_iter;
> +
> +				/* Add on a factor of 0.05%, this should remove any */
> +				/* false negatives when the system is 0% busy */
> +				poll_stats->thresh[freq].base_edpi +=
> +					poll_stats->thresh[freq].base_edpi / 2000;
> +
> +				poll_stats->thresh[freq].trained = true;
> +				poll_stats->cur_freq++;
> +
> +			}
> +		}
> +	}
> +}
> +
> +static inline uint32_t __attribute__((always_inline))
> +update_stats(struct priority_worker *poll_stats)
> +{
> +	uint64_t tot_edpi = 0, tot_ppi = 0;
> +	uint32_t j, percent;
> +
> +	struct priority_worker *s = poll_stats;
> +
> +	uint64_t cur_edpi = s->empty_dequeues - s->empty_dequeues_prev;
> +
> +	s->empty_dequeues_prev = s->empty_dequeues;
> +
> +	uint64_t ppi = s->num_dequeue_pkts - s->num_dequeue_pkts_prev;
> +
> +	s->num_dequeue_pkts_prev = s->num_dequeue_pkts;
> +
> +	if (s->thresh[s->cur_freq].base_edpi < cur_edpi) {
> +		RTE_LOG(DEBUG, POWER, "cur_edpi is too large "
> +				"cur edpi %ld "
> +				"base epdi %ld\n",
> +				cur_edpi,
> +				s->thresh[s->cur_freq].base_edpi);
> +		/* Value to make us fail need debug log*/
> +		return 1000UL;
> +	}
> +
> +	s->edpi_av[s->ec++ % BINS_AV] = cur_edpi;
> +	s->ppi_av[s->pc++ % BINS_AV] = ppi;
> +
> +	for (j = 0; j < BINS_AV; j++) {
> +		tot_edpi += s->edpi_av[j];
> +		tot_ppi += s->ppi_av[j];
> +	}
> +
> +	tot_edpi = tot_edpi / BINS_AV;
> +
> +	percent = 100 - (uint32_t)(((float)tot_edpi /
> +			(float)s->thresh[s->cur_freq].base_edpi) * 100);
> +
> +	return (uint32_t)percent;
> +}
> +
> +
> +static inline void  __attribute__((always_inline))
> +update_stats_normal(struct priority_worker *poll_stats)
> +{
> +	uint32_t percent;
> +
> +	if (poll_stats->thresh[poll_stats->cur_freq].base_edpi == 0)
> +		return;
> +
> +	percent = update_stats(poll_stats);
> +
> +	if (percent > 100)
> +		return;
> +
> +	if (poll_stats->cur_freq == LOW)
> +		RTE_LOG(INFO, POWER, "Purge Mode is not supported\n");
> +	else if (poll_stats->cur_freq == MED) {
> +
> +		if (percent >
> +			poll_stats->thresh[MED].threshold_percent) {
> +
> +			if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> +				poll_stats->threshold_ctr++;
> +			else {
> +				set_state(poll_stats, HGH_BUSY);
> +				RTE_LOG(INFO, POWER, "MOVE to HGH\n");
> +			}
> +
> +		} else {
> +			/* reset */
> +			/* need debug log */
> +			poll_stats->threshold_ctr = 0;
> +		}
> +
> +	} else if (poll_stats->cur_freq == HGH) {
> +
> +		if (percent <
> +				poll_stats->thresh[HGH].threshold_percent) {
> +
> +			if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND)
> +				poll_stats->threshold_ctr++;
> +			else
> +				set_state(poll_stats, MED_NORMAL);
> +		} else
> +			/* reset */
> +			/* need debug log */
> +			poll_stats->threshold_ctr = 0;
> +
> +
> +	}
> +}
> +
> +static int
> +empty_poll_training(struct priority_worker *poll_stats,
> +		uint32_t max_train_iter)
> +{
> +
> +	if (poll_stats->iter_counter < INTERVALS_PER_SECOND) {
> +		poll_stats->iter_counter++;
> +		return 0;
> +	}
> +
> +
> +	update_training_stats(poll_stats,
> +			LOW,
> +			false,
> +			max_train_iter);
> +
> +	update_training_stats(poll_stats,
> +			MED,
> +			false,
> +			max_train_iter);
> +
> +	update_training_stats(poll_stats,
> +			HGH,
> +			false,
> +			max_train_iter);
> +
> +
> +	if (poll_stats->thresh[LOW].trained == true
> +			&& poll_stats->thresh[MED].trained == true
> +			&& poll_stats->thresh[HGH].trained == true) {
> +
> +		set_state(poll_stats, MED_NORMAL);
> +
> +		RTE_LOG(INFO, POWER, "Training is Complete for %d\n",
> +				poll_stats->lcore_id);
> +	}
> +
> +	return 0;
> +}
> +
> +void
> +rte_empty_poll_detection(struct rte_timer *tim,
> +		void *arg)
> +{
> +
> +	uint32_t i;
> +
> +	struct priority_worker *poll_stats;
> +
> +	RTE_SET_USED(tim);
> +
> +	RTE_SET_USED(arg);
> +
> +	for (i = 0; i < NUM_NODES; i++) {
> +
> +		poll_stats = &(ep_params->wrk_data.wrk_stats[i]);
> +
> +		if (rte_lcore_is_enabled(poll_stats->lcore_id) == 0)
> +			continue;
> +
> +		switch (poll_stats->queue_state) {
> +		case(TRAINING):
> +			empty_poll_training(poll_stats,
> +					ep_params->max_train_iter);
> +			break;
> +
> +		case(HGH_BUSY):
> +		case(MED_NORMAL):
> +			update_stats_normal(poll_stats);
> +
> +			break;
> +
> +		case(LOW_PURGE):
> +			break;
> +		default:
> +			break;
> +
> +		}
> +
> +	}
> +
> +}
> +
> +int __rte_experimental
> +rte_power_empty_poll_stat_init(struct ep_params **eptr, uint8_t *freq_tlb)
> +{
> +	uint32_t i;
> +	/* Allocate the ep_params structure */
> +	ep_params = rte_zmalloc_socket(NULL,
> +			sizeof(struct ep_params),
> +			0,
> +			rte_socket_id());
> +
> +	if (!ep_params)
> +		rte_panic("Cannot allocate heap memory for ep_params "
> +				"for socket %d\n", rte_socket_id());
> +
> +	if (freq_tlb == NULL) {
> +		freq_index[LOW] = 14;
> +		freq_index[MED] = 9;
> +		freq_index[HGH] = 1;
> +	} else {
> +		freq_index[LOW] = freq_tlb[LOW];
> +		freq_index[MED] = freq_tlb[MED];
> +		freq_index[HGH] = freq_tlb[HGH];
> +	}
> +
> +	RTE_LOG(INFO, POWER, "Initialize the Empty Poll\n");
> +
> +	/* 5 seconds worth of training */
> +	ep_params->max_train_iter = INTERVALS_PER_SECOND * SECONDS_TO_TRAIN_FOR;
> +
> +	struct stats_data *w = &ep_params->wrk_data;
> +
> +	*eptr = ep_params;
> +
> +	/* initialize all wrk_stats state */
> +	for (i = 0; i < NUM_NODES; i++) {
> +
> +		if (rte_lcore_is_enabled(i) == 0)
> +			continue;
> +
> +		set_state(&w->wrk_stats[i], TRAINING);
> +		/*init the freqs table */
> +		total_avail_freqs[i] = rte_power_freqs(i,
> +				avail_freqs[i],
> +				NUM_FREQS);
> +
> +		if (get_freq_index(LOW) > total_avail_freqs[i])
> +			return -1;
> +
> +	}
> +
> +
> +	return 0;
> +}
> +
> +void __rte_experimental
> +rte_power_empty_poll_stat_free(void)
> +{
> +
> +	RTE_LOG(INFO, POWER, "Close the Empty Poll\n");
> +
> +	if (ep_params != NULL)
> +		rte_free(ep_params);
> +}
> +
> +int __rte_experimental
> +rte_power_empty_poll_stat_update(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	poll_stats->empty_dequeues++;
> +
> +	return 0;
> +}
> +
> +int __rte_experimental
> +rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt)
> +{
> +
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	poll_stats->num_dequeue_pkts += nb_pkt;
> +
> +	return 0;
> +}
> +
> +
> +uint64_t __rte_experimental
> +rte_power_empty_poll_stat_fetch(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	return poll_stats->empty_dequeues;
> +}
> +
> +uint64_t __rte_experimental
> +rte_power_poll_stat_fetch(unsigned int lcore_id)
> +{
> +	struct priority_worker *poll_stats;
> +
> +	if (lcore_id >= NUM_NODES)
> +		return -1;
> +
> +	poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]);
> +
> +	if (poll_stats->lcore_id == 0)
> +		poll_stats->lcore_id = lcore_id;
> +
> +	return poll_stats->num_dequeue_pkts;
> +}
> diff --git a/lib/librte_power/rte_power_empty_poll.h b/lib/librte_power/rte_power_empty_poll.h
> new file mode 100644
> index 0000000..0aca1f0
> --- /dev/null
> +++ b/lib/librte_power/rte_power_empty_poll.h
> @@ -0,0 +1,205 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#ifndef _RTE_EMPTY_POLL_H
> +#define _RTE_EMPTY_POLL_H
> +
> +/**
> + * @file
> + * RTE Power Management
> + */
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +#include <rte_byteorder.h>
> +#include <rte_log.h>
> +#include <rte_string_fns.h>
> +#include <rte_power.h>
> +#include <rte_timer.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define NUM_FREQS 20
> +
> +#define BINS_AV 4 /* Has to be ^2 */
> +
> +#define DROP (NUM_DIRECTIONS * NUM_DEVICES)
> +
> +#define NUM_PRIORITIES          2
> +
> +#define NUM_NODES         256  /* Max core number*/
> +
> +/* Processor Power State */
> +enum freq_val {
> +	LOW,
> +	MED,
> +	HGH,
> +	NUM_FREQ = NUM_FREQS
> +};
> +
> +
> +/* Queue Polling State */
> +enum queue_state {
> +	TRAINING, /* NO TRAFFIC */
> +	MED_NORMAL,   /* MED */
> +	HGH_BUSY,     /* HIGH */
> +	LOW_PURGE,    /* LOW */
> +};
> +
> +/* Queue Stats */
> +struct freq_threshold {
> +
> +	uint64_t base_edpi;
> +	bool trained;
> +	uint32_t threshold_percent;
> +	uint32_t cur_train_iter;
> +};
> +
> +/* Each Worder Thread Empty Poll Stats */
> +struct priority_worker {
> +
> +	/* Current dequeue and throughput counts */
> +	/* These 2 are written to by the worker threads */
> +	/* So keep them on their own cache line */
> +	uint64_t empty_dequeues;
> +	uint64_t num_dequeue_pkts;
> +
> +	enum queue_state queue_state;
> +
> +	uint64_t empty_dequeues_prev;
> +	uint64_t num_dequeue_pkts_prev;
> +
> +	/* Used for training only */
> +	struct freq_threshold thresh[NUM_FREQ];
> +	enum freq_val cur_freq;
> +
> +	/* bucket arrays to calculate the averages */
> +	uint64_t edpi_av[BINS_AV];
> +	uint32_t  ec;
> +	uint64_t ppi_av[BINS_AV];
> +	uint32_t  pc;
> +
> +	uint32_t lcore_id;
> +	uint32_t iter_counter;
> +	uint32_t threshold_ctr;
> +	uint32_t display_ctr;
> +	uint8_t  dev_id;
> +
> +} __rte_cache_aligned;
> +
> +
> +struct stats_data {
> +
> +	struct priority_worker wrk_stats[NUM_NODES];
> +
> +	/* flag to stop rx threads processing packets until training over */
> +	bool start_rx;
> +
> +};
> +
> +/* Empty Poll Parameters */
> +struct ep_params {
> +
> +	/* Timer related stuff */
> +	uint64_t interval_ticks;
> +	uint32_t max_train_iter;
> +
> +	struct rte_timer timer0;
> +	struct stats_data wrk_data;
> +};
> +
> +
> +/**
> + * Initialize the power management system.
> + *
> + * @param eptr
> + *   the structure of empty poll configuration
> + * @freq_tlb
> + *   the power state/frequency  mapping table
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_empty_poll_stat_init(struct ep_params **eptr, uint8_t *freq_tlb);
> +
> +/**
> + * Free the resource hold by power management system.
> + */
> +void __rte_experimental
> +rte_power_empty_poll_stat_free(void);
> +
> +/**
> + * Update specific core empty poll counter
> + * It's not thread safe.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_empty_poll_stat_update(unsigned int lcore_id);
> +
> +/**
> + * Update specific core valid poll counter, not thread safe.
> + *
> + * @param lcore_id
> + *  lcore id.
> + * @param nb_pkt
> + *  The packet number of one valid poll.
> + *
> + * @return
> + *  - 0 on success.
> + *  - Negative on error.
> + */
> +int __rte_experimental
> +rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> +
> +/**
> + * Fetch specific core empty poll counter.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  Current lcore empty poll counter value.
> + */
> +uint64_t __rte_experimental
> +rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> +
> +/**
> + * Fetch specific core valid poll counter.
> + *
> + * @param lcore_id
> + *  lcore id
> + *
> + * @return
> + *  Current lcore valid poll counter value.
> + */
> +uint64_t __rte_experimental
> +rte_power_poll_stat_fetch(unsigned int lcore_id);
> +
> +/**
> + * Empty poll  state change detection function
> + *
> + * @param  tim
> + *  The timer structure
> + * @param  arg
> + *  The customized parameter
> + */
> +void  __rte_experimental
> +rte_empty_poll_detection(struct rte_timer *tim, void *arg);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
> index dd587df..11ffdfb 100644
> --- a/lib/librte_power/rte_power_version.map
> +++ b/lib/librte_power/rte_power_version.map
> @@ -33,3 +33,16 @@ DPDK_18.08 {
>  	rte_power_get_capabilities;
>  
>  } DPDK_17.11;
> +
> +EXPERIMENTAL {
> +        global:
> +
> +        rte_power_empty_poll_stat_init;
> +        rte_power_empty_poll_stat_free;
> +        rte_power_empty_poll_stat_update;
> +        rte_power_empty_poll_stat_fetch;
> +        rte_power_poll_stat_fetch;
> +        rte_power_poll_stat_update;
> +        rte_empty_poll_detection;
> +
> +};
> 

  parent reply	other threads:[~2018-09-13 10:54 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-08  9:57 [dpdk-dev] [PATCH v1 1/2] " Liang Ma
2018-06-08  9:57 ` [dpdk-dev] [PATCH v1 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-19 10:31   ` Hunt, David
2018-06-08 15:26 ` [dpdk-dev] [PATCH v2 1/2] lib/librte_power: traffic pattern aware power control Liang Ma
2018-06-08 15:26   ` [dpdk-dev] [PATCH v2 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-20 14:44     ` [dpdk-dev] [PATCH v3 1/2] lib/librte_power: traffic pattern aware power control Liang Ma
2018-06-20 14:44       ` [dpdk-dev] [PATCH v3 2/2] examples/l3fwd-power: simple app update to support new API Liang Ma
2018-06-26 11:40         ` [dpdk-dev] [PATCH v4 1/2] lib/librte_power: traffic pattern aware power control Radu Nicolau
2018-06-26 11:40           ` [dpdk-dev] [PATCH v4 2/2] examples/l3fwd-power: simple app update to support new API Radu Nicolau
2018-06-26 13:03             ` Hunt, David
2018-06-26 13:03           ` [dpdk-dev] [PATCH v4 1/2] lib/librte_power: traffic pattern aware power control Hunt, David
2018-06-27 17:33           ` Kevin Traynor
2018-07-05 14:45             ` Liang, Ma
2018-07-12 17:30               ` Thomas Monjalon
2018-09-11  9:19             ` Hunt, David
2018-09-13  9:46               ` Kevin Traynor
2018-09-13 13:30                 ` Liang, Ma
2018-07-10 16:04           ` [dpdk-dev] [PATCH v5 " Radu Nicolau
2018-07-10 16:04             ` [dpdk-dev] [PATCH v5 2/2] examples/l3fwd-power: simple app update to support new API Radu Nicolau
2018-08-31 15:04             ` [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 3/4] doc/guides/proguides/power-man: update the power API Liang Ma
2018-08-31 15:04               ` [dpdk-dev] [PATCH v6 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-04  1:11               ` [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control Yao, Lei A
2018-09-04  2:09               ` Yao, Lei A
2018-09-04 14:10               ` [dpdk-dev] [PATCH v7 " Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 3/4] doc/guides/proguides/power-man: update the power API Liang Ma
2018-09-04 14:10                 ` [dpdk-dev] [PATCH v7 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-13 10:54                 ` Kevin Traynor [this message]
2018-09-13 13:37                   ` [dpdk-dev] [PATCH v7 1/4] lib/librte_power: traffic pattern aware power control Liang, Ma
2018-09-13 14:05                     ` Hunt, David
2018-09-17 13:30                 ` [dpdk-dev] [PATCH v8 " Liang Ma
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-28 11:19                     ` Hunt, David
2018-10-02 10:18                       ` Liang, Ma
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 3/4] doc/guides/proguide/power-man: update the power API Liang Ma
2018-09-25 12:31                     ` Kovacevic, Marko
2018-09-25 12:44                     ` Kovacevic, Marko
2018-09-28 12:30                     ` Hunt, David
2018-09-17 13:30                   ` [dpdk-dev] [PATCH v8 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-09-25 13:20                     ` Kovacevic, Marko
2018-09-28 12:43                       ` Hunt, David
2018-09-28 12:52                         ` Liang, Ma
2018-09-28 10:47                   ` [dpdk-dev] [PATCH v8 1/4] lib/librte_power: traffic pattern aware power control Hunt, David
2018-10-02 10:13                     ` Liang, Ma
2018-09-28 14:58                   ` [dpdk-dev] [PATCH v9 " Liang Ma
2018-09-28 14:58                     ` [dpdk-dev] [PATCH v9 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-09-28 14:58                     ` [dpdk-dev] [PATCH v9 3/4] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-02 13:48                     ` [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control Liang Ma
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 2/4] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-02 14:23                         ` Hunt, David
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 3/4] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-02 14:24                         ` Hunt, David
2018-10-02 13:48                       ` [dpdk-dev] [PATCH v10 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: update Liang Ma
2018-10-02 14:25                         ` Hunt, David
2018-10-02 14:22                       ` [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control Hunt, David
2018-10-12  1:59                       ` Yao, Lei A
2018-10-12 10:02                         ` Liang, Ma
2018-10-12 13:22                           ` Yao, Lei A
2018-10-19 10:23                       ` [dpdk-dev] [PATCH v11 1/5] " Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 2/5] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 3/5] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-19 10:23                         ` [dpdk-dev] [PATCH v11 5/5] doc: update release notes for empty poll library Liang Ma
2018-10-19 11:07                         ` [dpdk-dev] [PATCH v12 1/5] lib/librte_power: traffic pattern aware power control Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 2/5] examples/l3fwd-power: simple app update for new API Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 3/5] doc/guides/pro_guide/power-man: update the power API Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 4/5] doc/guides/sample_app_ug/l3_forward_power_man.rst: update Liang Ma
2018-10-19 11:07                           ` [dpdk-dev] [PATCH v12 5/5] doc: update release notes for empty poll library Liang Ma
2018-10-22 12:41                             ` Kovacevic, Marko
2018-10-25 23:39                             ` Thomas Monjalon
2018-10-25 23:22                           ` [dpdk-dev] [PATCH v12 1/5] lib/librte_power: traffic pattern aware power control Thomas Monjalon
2018-10-25 23:32                             ` Thomas Monjalon
2018-10-25 23:54                           ` Thomas Monjalon
2018-10-25 23:55                           ` Thomas Monjalon
2018-10-26 13:34                             ` Liang, Ma
2018-10-01 10:06                   ` [dpdk-dev] [PATCH v9 4/4] doc/guides/sample_app_ug/l3_forward_power_man.rst: empty poll update Liang Ma
2018-06-14 10:59 ` [dpdk-dev] [PATCH v1 1/2] lib/librte_power: traffic pattern aware power control Hunt, David
2018-06-18 16:11   ` Liang, Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9cd52e23-f1e9-0067-015c-9bd8ed1eaeab@redhat.com \
    --to=ktraynor@redhat.com \
    --cc=anatoly.burakov@intel.com \
    --cc=david.hunt@intel.com \
    --cc=dev@dpdk.org \
    --cc=john.geary@intel.com \
    --cc=lei.a.yao@intel.com \
    --cc=liang.j.ma@intel.com \
    --cc=radu.nicolau@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).