From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 11019160 for ; Mon, 18 Jun 2018 18:11:38 +0200 (CEST) X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Jun 2018 09:11:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,240,1526367600"; d="scan'208";a="50840179" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga006.jf.intel.com with ESMTP; 18 Jun 2018 09:11:36 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w5IGBZbW018079; Mon, 18 Jun 2018 17:11:35 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w5IGBZ6H029547; Mon, 18 Jun 2018 17:11:35 +0100 Received: (from lma25@localhost) by sivswdev01.ir.intel.com with LOCAL id w5IGBZT3029543; Mon, 18 Jun 2018 17:11:35 +0100 Date: Mon, 18 Jun 2018 17:11:35 +0100 From: "Liang, Ma" To: "Hunt, David" Cc: dev@dpdk.org, radu.nicolau@intel.com Message-ID: <20180618161135.GA25741@sivswdev01.ir.intel.com> References: <1528451833-3617-1-git-send-email-liang.j.ma@intel.com> <992ecf9a-5144-84c0-5318-329435aebd6c@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <992ecf9a-5144-84c0-5318-329435aebd6c@intel.com> User-Agent: Mutt/1.9.1 (2017-09-22) Subject: Re: [dpdk-dev] [PATCH v1 1/2] lib/librte_power: traffic pattern aware power control X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Jun 2018 16:11:40 -0000 On 14 Jun 11:59, Hunt, David wrote: > Hi Liang > > > > On 8/6/2018 10:57 AM, Liang Ma wrote: > > 1. Abstract > > > > For packet processing workloads such as DPDK polling is continuous. > > This means CPU cores always show 100% busy independent of how much work > > those cores are doing. It is critical to accurately determine how busy > > a core is hugely important for the following reasons: > > > > * No indication of overload conditions > > > > * User do not know how much real load is on a system meaning resulted in > > wasted energy as no power management is utilized > > > > Tried and failed schemes include calculating the cycles required from > > the load on the core, in other words the business. For example, > > Typo, I think this should be busyness. :) agree, I will update in next patch. > > > how many cycles it costs to handle each packet and determining the frequency > > cost per core. Due to the varying nature of traffic, types of frames and cost > > in cycles to process, this mechanism becomes complex quickly where > > a simple scheme is required to solve the problems. > > > > 2. Proposed solution > > > > For all polling mechanism, the proposed solution focus on how many times > > empty poll executed instead of calculating how many cycles it cost to handle > > each packet. The less empty poll number means current core is busy with > > processing workload, therefore, the higher frequency is needed. The high > > empty poll number indicate current core has lots spare time, therefore, > > we can lower the frequency. > > > > 2.1 Power state definition: > > > > LOW: the frequency is used for purge mode. > > > > MED: the frequency is used to process modest traffic workload. > > > > HIGH: the frequency is used to process busy traffic workload. > > > > 2.2 There are two phases to establish the power management system: > > > > a.Initialization/Training phase. There is no traffic pass-through, > > the system will test average empty poll numbers with LOW/MED/HIGH > > power state. Those average empty poll numbers will be the baseline > > for the normal phase. The system will collect all core's counter > > every 100ms. The Training phase will take 5 seconds. > > I suggest that you add in that all Rx packets are blocked during this phase, > and we measure the number of empty polls possible for each of the > frequencies (low, medium and high). I don't think the Rx packet is blocked, currently, there is no traffic pass-throughduring training phase. > > > b.Normal phase. When the real traffic pass-though, the system will > > compare run-time empty poll moving average value with base line > > then make decision to move to HIGH power state of MED power state. > > The system will collect all core's counter every 100ms. > > I think this may need to be reduced from 100ms. The reaction to an increase > in traffic would need > to be quicker than this to avoid buffer overflow. > agree, I will reduce to 10ms in next patch. > > > > 3. Proposed API > > > > 1. rte_empty_poll_stat_init(void); > > which is used to initialize the power management system. > > 2. rte_empty_poll_stat_free(void); > > which is used to free the resource hold by power management system. > > 3. rte_empty_poll_stat_update(unsigned int lcore_id); > > which is used to update specific core empty poll counter, not thread safe > > 4. rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); > > which is used to update specific core valid poll counter, not thread safe > > 5. rte_empty_poll_stat_fetch(unsigned int lcore_id); > > which is used to get specific core empty poll counter. > > 6. rte_poll_stat_fetch(unsigned int lcore_id); > > which is used to get specific core valid poll counter. > > > > 7. rte_empty_poll_set_freq(enum freq_val index, uint32_t limit); > > which allow user customize the frequency of power state. > > > > 8. rte_empty_poll_setup_timer(void); > > which is used to setup the timer/callback to process all above counter. > > > > Signed-off-by: Liang Ma > > --- > > lib/librte_power/Makefile | 3 +- > > lib/librte_power/meson.build | 5 +- > > lib/librte_power/rte_empty_poll.c | 529 ++++++++++++++++++++++++++++++++++++++ > > lib/librte_power/rte_empty_poll.h | 135 ++++++++++ > > 4 files changed, 669 insertions(+), 3 deletions(-) > > create mode 100644 lib/librte_power/rte_empty_poll.c > > create mode 100644 lib/librte_power/rte_empty_poll.h > > > > diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile > > index 6f85e88..dbc175a 100644 > > --- a/lib/librte_power/Makefile > > +++ b/lib/librte_power/Makefile > > @@ -16,8 +16,9 @@ LIBABIVER := 1 > > # all source are stored in SRCS-y > > SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c power_acpi_cpufreq.c > > SRCS-$(CONFIG_RTE_LIBRTE_POWER) += power_kvm_vm.c guest_channel.c > > +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_empty_poll.c > > # install this header file > > -SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h > > +SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h rte_empty_poll.h > > I would propose re-naming the .h and .c file to > rte_*power_*empty_poll.[h/c], so we can > associate it with the power library. > > > > include $(RTE_SDK)/mk/rte.lib.mk > > diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build > > index 253173f..5270fa3 100644 > > --- a/lib/librte_power/meson.build > > +++ b/lib/librte_power/meson.build > > @@ -5,5 +5,6 @@ if host_machine.system() != 'linux' > > build = false > > endif > > sources = files('rte_power.c', 'power_acpi_cpufreq.c', > > - 'power_kvm_vm.c', 'guest_channel.c') > > -headers = files('rte_power.h') > > + 'power_kvm_vm.c', 'guest_channel.c', > > + 'rte_empty_poll.c') > > +headers = files('rte_power.h','rte_empty_poll.h') > > diff --git a/lib/librte_power/rte_empty_poll.c b/lib/librte_power/rte_empty_poll.c > > new file mode 100644 > > index 0000000..57bd63b > > --- /dev/null > > +++ b/lib/librte_power/rte_empty_poll.c > > @@ -0,0 +1,529 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * Copyright(c) 2010-2018 Intel Corporation > > + */ > > + > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > + > > + > > + > > + > > +#include "rte_power.h" > > +#include "rte_empty_poll.h" > > + > > + > > + > > + > > +#define INTERVALS_PER_SECOND 10 /* (100ms) */ > > +#define SECONDS_TO_TRAIN_FOR 5 > > +#define DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD 70 > > +#define DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD 30 > > +#define DEFAULT_CYCLES_PER_PACKET 800 > > + > > + > > +static struct ep_params *ep_params; > > +static uint32_t med_to_high_threshold = DEFAULT_MED_TO_HIGH_PERCENT_THRESHOLD; > > +static uint32_t high_to_med_threshold = DEFAULT_HIGH_TO_MED_PERCENT_THRESHOLD; > > + > > +static uint32_t avail_freqs[RTE_MAX_LCORE][NUM_FREQS]; > > + > > +static uint32_t total_avail_freqs[RTE_MAX_LCORE]; > > + > > +static uint32_t freq_index[NUM_FREQ]; > > + > > + > > + > > +static uint32_t > > +get_freq_index(enum freq_val index) > > +{ > > + return freq_index[index]; > > +} > > + > > + > > +static int > > +set_power_freq(int lcore_id, enum freq_val freq, bool specific_freq) > > +{ > > + int err = 0; > > + uint32_t power_freq_index; > > + if (!specific_freq) > > + power_freq_index = get_freq_index(freq); > > + else > > + power_freq_index = freq; > > + > > + err = rte_power_set_freq(lcore_id, power_freq_index); > > + > > + return err; > > +} > > + > > + > > +static inline void __attribute__((always_inline)) > > +exit_training_state(struct priority_worker *poll_stats) > > +{ > > + RTE_SET_USED(poll_stats); > > +} > > + > > +static inline void __attribute__((always_inline)) > > +enter_training_state(struct priority_worker *poll_stats) > > +{ > > + poll_stats->iter_counter = 0; > > + poll_stats->cur_freq = LOW; > > + poll_stats->queue_state = TRAINING; > > +} > > + > > +static inline void __attribute__((always_inline)) > > +enter_normal_state(struct priority_worker *poll_stats) > > +{ > > + /* Clear the averages arrays and strs */ > > + memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av)); > > + poll_stats->ec = 0; > > + memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av)); > > + poll_stats->pc = 0; > > + > > + poll_stats->cur_freq = MED; > > + poll_stats->iter_counter = 0; > > + poll_stats->threshold_ctr = 0; > > + poll_stats->queue_state = MED_NORMAL; > > + set_power_freq(poll_stats->lcore_id, MED, false); > > + > > + /* Try here */ > > + poll_stats->thresh[MED].threshold_percent = med_to_high_threshold; > > + poll_stats->thresh[HGH].threshold_percent = high_to_med_threshold; > > +} > > + > > +static inline void __attribute__((always_inline)) > > +enter_busy_state(struct priority_worker *poll_stats) > > +{ > > + memset(poll_stats->edpi_av, 0, sizeof(poll_stats->edpi_av)); > > + poll_stats->ec = 0; > > + memset(poll_stats->ppi_av, 0, sizeof(poll_stats->ppi_av)); > > + poll_stats->pc = 0; > > + > > + poll_stats->cur_freq = HGH; > > + poll_stats->iter_counter = 0; > > + poll_stats->threshold_ctr = 0; > > + poll_stats->queue_state = HGH_BUSY; > > + set_power_freq(poll_stats->lcore_id, HGH, false); > > +} > > + > > +static inline void __attribute__((always_inline)) > > +enter_purge_state(struct priority_worker *poll_stats) > > +{ > > + poll_stats->iter_counter = 0; > > + poll_stats->queue_state = LOW_PURGE; > > +} > > + > > +static inline void __attribute__((always_inline)) > > +set_state(struct priority_worker *poll_stats, > > + enum queue_state new_state) > > +{ > > + enum queue_state old_state = poll_stats->queue_state; > > + if (old_state != new_state) { > > + > > + /* Call any old state exit functions */ > > + if (old_state == TRAINING) > > + exit_training_state(poll_stats); > > + > > + /* Call any new state entry functions */ > > + if (new_state == TRAINING) > > + enter_training_state(poll_stats); > > + if (new_state == MED_NORMAL) > > + enter_normal_state(poll_stats); > > + if (new_state == HGH_BUSY) > > + enter_busy_state(poll_stats); > > + if (new_state == LOW_PURGE) > > + enter_purge_state(poll_stats); > > + } > > +} > > + > > + > > +static void > > +update_training_stats(struct priority_worker *poll_stats, > > + uint32_t freq, > > + bool specific_freq, > > + uint32_t max_train_iter) > > +{ > > + RTE_SET_USED(specific_freq); > > + > > + char pfi_str[32]; > > + uint64_t p0_empty_deq; > > + > > + sprintf(pfi_str, "%02d", freq); > > + > > + if (poll_stats->cur_freq == freq && > > + poll_stats->thresh[freq].trained == false) { > > + if (poll_stats->thresh[freq].cur_train_iter == 0) { > > + > > + set_power_freq(poll_stats->lcore_id, > > + freq, specific_freq); > > + > > + poll_stats->empty_dequeues_prev = > > + poll_stats->empty_dequeues; > > + > > + poll_stats->thresh[freq].cur_train_iter++; > > + > > + return; > > + } else if (poll_stats->thresh[freq].cur_train_iter > > + <= max_train_iter) { > > + > > + p0_empty_deq = poll_stats->empty_dequeues - > > + poll_stats->empty_dequeues_prev; > > + > > + poll_stats->empty_dequeues_prev = > > + poll_stats->empty_dequeues; > > + > > + poll_stats->thresh[freq].base_edpi += p0_empty_deq; > > + poll_stats->thresh[freq].cur_train_iter++; > > + > > + } else { > > + if (poll_stats->thresh[freq].trained == false) { > > + poll_stats->thresh[freq].base_edpi = > > + poll_stats->thresh[freq].base_edpi / > > + max_train_iter; > > + > > + /* Add on a factor of 0.05%, this should remove any */ > > + /* false negatives when the system is 0% busy */ > > + poll_stats->thresh[freq].base_edpi += > > + poll_stats->thresh[freq].base_edpi / 2000; > > Please use #define for magic numbers > > > + > > + poll_stats->thresh[freq].trained = true; > > + poll_stats->cur_freq++; > > + > > + } > > + } > > + } > > +} > > + > > +static inline uint32_t __attribute__((always_inline)) > > +update_stats(struct priority_worker *poll_stats) > > +{ > > + uint64_t tot_edpi = 0, tot_ppi = 0; > > + uint32_t j, percent; > > + > > + struct priority_worker *s = poll_stats; > > + > > + uint64_t cur_edpi = s->empty_dequeues - s->empty_dequeues_prev; > > + > > + s->empty_dequeues_prev = s->empty_dequeues; > > + > > + uint64_t ppi = s->num_dequeue_pkts - s->num_dequeue_pkts_prev; > > + > > + s->num_dequeue_pkts_prev = s->num_dequeue_pkts; > > + > > + if (s->thresh[s->cur_freq].base_edpi < cur_edpi) > > + return 1000UL; /* Value to make us fail */ > > + > > + s->edpi_av[s->ec++ % BINS_AV] = cur_edpi; > > + s->ppi_av[s->pc++ % BINS_AV] = ppi; > > + > > + for (j = 0; j < BINS_AV; j++) { > > + tot_edpi += s->edpi_av[j]; > > + tot_ppi += s->ppi_av[j]; > > + } > > + > > + tot_edpi = tot_edpi / BINS_AV; > > + > > + percent = 100 - (uint32_t)((float)tot_edpi / > > + (float)s->thresh[s->cur_freq].base_edpi * 100); > > + > > + return (uint32_t)percent; > > +} > > + > > + > > +static inline void __attribute__((always_inline)) > > +update_stats_normal(struct priority_worker *poll_stats) > > +{ > > + uint32_t percent; > > + > > + if (poll_stats->thresh[poll_stats->cur_freq].base_edpi == 0) > > + return; > > + > > + percent = update_stats(poll_stats); > > + > > + if (percent > 100) > > + return; > > + > > + if (poll_stats->cur_freq == LOW) > > + RTE_LOG(INFO, POWER, "Purge Mode is not supported\n"); > > + else if (poll_stats->cur_freq == MED) { > > + > > + if (percent > poll_stats->thresh[poll_stats->cur_freq]. > > + threshold_percent) { > > + > > + if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND) > > + poll_stats->threshold_ctr++; > > + else > > + set_state(poll_stats, HGH_BUSY); > > + > > + } else { > > + /* reset */ > > + poll_stats->threshold_ctr = 0; > > + } > > + > > + } else if (poll_stats->cur_freq == HGH) { > > + > > + if (percent < poll_stats->thresh[poll_stats->cur_freq]. > > + threshold_percent) { > > + > > + if (poll_stats->threshold_ctr < INTERVALS_PER_SECOND) > > + poll_stats->threshold_ctr++; > > + else > > + set_state(poll_stats, MED_NORMAL); > > + } else > > + /* reset */ > > + poll_stats->threshold_ctr = 0; > > + > > + > > + } > > +} > > + > > +static int > > +empty_poll_trainning(struct priority_worker *poll_stats, > > + uint32_t max_train_iter) { > > + > > + if (poll_stats->iter_counter < INTERVALS_PER_SECOND) { > > + poll_stats->iter_counter++; > > + return 0; > > + } > > + > > + > > + update_training_stats(poll_stats, > > + LOW, > > + false, > > + max_train_iter); > > + > > + update_training_stats(poll_stats, > > + MED, > > + false, > > + max_train_iter); > > + > > + update_training_stats(poll_stats, > > + HGH, > > + false, > > + max_train_iter); > > + > > + > > + if (poll_stats->thresh[LOW].trained == true > > + && poll_stats->thresh[MED].trained == true > > + && poll_stats->thresh[HGH].trained == true) { > > + > > + set_state(poll_stats, MED_NORMAL); > > + > > + RTE_LOG(INFO, POWER, "Training is Complete for %d\n", > > + poll_stats->lcore_id); > > + } > > + > > + return 0; > > +} > > + > > +static void > > +empty_poll_detection(struct rte_timer *tim, > > + void *arg) > > +{ > > + > > + uint32_t i; > > + > > + struct priority_worker *poll_stats; > > + > > + RTE_SET_USED(tim); > > + > > + RTE_SET_USED(arg); > > + > > + for (i = 0; i < NUM_NODES; i++) { > > + > > + poll_stats = &(ep_params->wrk_data.wrk_stats[i]); > > + > > + if (rte_lcore_is_enabled(poll_stats->lcore_id) == 0) > > + continue; > > + > > + switch (poll_stats->queue_state) { > > + case(TRAINING): > > + empty_poll_trainning(poll_stats, > > + ep_params->max_train_iter); > > + break; > > + > > + case(HGH_BUSY): > > + case(MED_NORMAL): > > + update_stats_normal(poll_stats); > > + > > + break; > > + > > + case(LOW_PURGE): > > + break; > > + default: > > + break; > > + > > + } > > + > > + } > > + > > +} > > + > > +int > > +rte_empty_poll_stat_init(void) > > +{ > > + uint32_t i; > > + /* Allocate the ep_params structure */ > > + ep_params = rte_zmalloc_socket(NULL, > > + sizeof(struct ep_params), > > + 0, > > + rte_socket_id()); > > + > > + if (!ep_params) > > + rte_panic("Cannot allocate heap memory for ep_params " > > + "for socket %d\n", rte_socket_id()); > > + > > + freq_index[LOW] = 14; > > + freq_index[MED] = 9; > > + freq_index[HGH] = 1; > > + > > Are these default frequency indexes? Can they be changed? Maybe say so in a > comment. > > > + RTE_LOG(INFO, POWER, "Initialize the Empty Poll\n"); > > + > > + /* 5 seconds worth of training */ > > + ep_params->max_train_iter = INTERVALS_PER_SECOND * SECONDS_TO_TRAIN_FOR; > > + > > + struct stats_data *w = &ep_params->wrk_data; > > + > > + /* initialize all wrk_stats state */ > > + for (i = 0; i < NUM_NODES; i++) { > > + > > + if (rte_lcore_is_enabled(i) == 0) > > + continue; > > + > > + set_state(&w->wrk_stats[i], TRAINING); > > + /*init the freqs table */ > > + total_avail_freqs[i] = rte_power_freqs(i, > > + avail_freqs[i], > > + NUM_FREQS); > > + > > + if (get_freq_index(LOW) > total_avail_freqs[i]) > > + return -1; > > + > > + } > > + > > + > > + return 0; > > +} > > + > > +void > > +rte_empty_poll_stat_free(void) > > +{ > > + > > + RTE_LOG(INFO, POWER, "Close the Empty Poll\n"); > > + > > + if (ep_params != NULL) > > + rte_free(ep_params); > > +} > > + > > +int > > +rte_empty_poll_stat_update(unsigned int lcore_id) > > +{ > > + struct priority_worker *poll_stats; > > + > > + if (lcore_id >= NUM_NODES) > > + return -1; > > + > > + poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]); > > + > > + if (poll_stats->lcore_id == 0) > > + poll_stats->lcore_id = lcore_id; > > + > > + poll_stats->empty_dequeues++; > > + > > + return 0; > > +} > > + > > +int > > +rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt) > > +{ > > + > > + struct priority_worker *poll_stats; > > + > > + if (lcore_id >= NUM_NODES) > > + return -1; > > + > > + poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]); > > + > > + if (poll_stats->lcore_id == 0) > > + poll_stats->lcore_id = lcore_id; > > + > > + poll_stats->num_dequeue_pkts += nb_pkt; > > + > > + return 0; > > +} > > + > > + > > +uint64_t > > +rte_empty_poll_stat_fetch(unsigned int lcore_id) > > +{ > > + struct priority_worker *poll_stats; > > + > > + if (lcore_id >= NUM_NODES) > > + return -1; > > + > > + poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]); > > + > > + if (poll_stats->lcore_id == 0) > > + poll_stats->lcore_id = lcore_id; > > + > > + return poll_stats->empty_dequeues; > > +} > > + > > +uint64_t > > +rte_poll_stat_fetch(unsigned int lcore_id) > > +{ > > + struct priority_worker *poll_stats; > > + > > + if (lcore_id >= NUM_NODES) > > + return -1; > > + > > + poll_stats = &(ep_params->wrk_data.wrk_stats[lcore_id]); > > + > > + if (poll_stats->lcore_id == 0) > > + poll_stats->lcore_id = lcore_id; > > + > > + return poll_stats->num_dequeue_pkts; > > +} > > + > > +void > > +rte_empty_poll_set_freq(enum freq_val index, uint32_t limit) > > +{ > > + switch (index) { > > + > > + case LOW: > > + freq_index[LOW] = limit; > > + break; > > + > > + case MED: > > + freq_index[MED] = limit; > > + break; > > + > > + case HGH: > > + freq_index[HGH] = limit; > > + break; > > + default: > > + break; > > + } > > +} > > + > > +void > > +rte_empty_poll_setup_timer(void) > > +{ > > + int lcore_id = rte_lcore_id(); > > + uint64_t hz = rte_get_timer_hz(); > > + > > + struct ep_params *ep_ptr = ep_params; > > + > > + ep_ptr->interval_ticks = hz / INTERVALS_PER_SECOND; > > + > > + rte_timer_reset_sync(&ep_ptr->timer0, > > + ep_ptr->interval_ticks, > > + PERIODICAL, > > + lcore_id, > > + empty_poll_detection, > > + (void *)ep_ptr); > > + > > +} > > diff --git a/lib/librte_power/rte_empty_poll.h b/lib/librte_power/rte_empty_poll.h > > new file mode 100644 > > index 0000000..7e036ee > > --- /dev/null > > +++ b/lib/librte_power/rte_empty_poll.h > > @@ -0,0 +1,135 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * Copyright(c) 2010-2018 Intel Corporation > > + */ > > + > > +#ifndef _RTE_EMPTY_POLL_H > > +#define _RTE_EMPTY_POLL_H > > + > > +/** > > + * @file > > + * RTE Power Management > > + */ > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +#define NUM_FREQS 20 > > Should probably be RTE_MAX_LCORE_FREQS > > > + > > +#define BINS_AV 4 /* has to be ^2 */ > > + > > +#define DROP (NUM_DIRECTIONS * NUM_DEVICES) > > + > > +#define NUM_PRIORITIES 2 > > + > > +#define NUM_NODES 31 /*any reseanable prime number should work*/ > > + > > + > > +enum freq_val { > > + LOW, > > + MED, > > + HGH, > > HGH should be HIGH. Where we see this on it's own in the code, it's not > obvious that it's related to MEDIUM and LOW.  Also, maybe MED should be > MEDIUM. > > Would suggest prefixing these with FREQ_. Makes the code more readable. > > > + NUM_FREQ = NUM_FREQS > > This should probably be the number of elements in freq_val rather than > NUM_FREQS > > I think there is some confusion in the code about the number of frequencies. > Any array where it's holding all the possible frequencies should use > RTE_MAX_LCORE_FREQS. > But it looks ti me that the freq_index array only holds three values, the > indexes for Low, Medium and High. > > > > +}; > > + > > + > > +enum queue_state { > > + TRAINING, /* NO TRAFFIC */ > > + MED_NORMAL, /* MED */ > > + HGH_BUSY, /* HIGH */ > > + LOW_PURGE, /* LOW */ > > +}; > > I'm not sure that the NORMAL, BUSY and PURGE words add any value here. How > about MEDIUM, HIGH and LOW? > > > + > > +/* queue stats */ > > +struct freq_threshold { > > + > > + uint64_t base_edpi; > > + bool trained; > > + uint32_t threshold_percent; > > + uint32_t cur_train_iter; > > +}; > > + > > + > > +struct priority_worker { > > + > > + /* Current dequeue and throughput counts */ > > + /* These 2 are written to by the worker threads */ > > + /* So keep them on their own cache line */ > > + uint64_t empty_dequeues; > > + uint64_t num_dequeue_pkts; > > + > > + enum queue_state queue_state; > > + > > + uint64_t empty_dequeues_prev; > > + uint64_t num_dequeue_pkts_prev; > > + > > + /* Used for training only */ > > + struct freq_threshold thresh[NUM_FREQ]; > > + enum freq_val cur_freq; > > + > > + /* bucket arrays to calculate the averages */ > > + uint64_t edpi_av[BINS_AV]; > > + uint32_t ec; > > + uint64_t ppi_av[BINS_AV]; > > + uint32_t pc; > > Suggest a comment per line explaining what the names of these variables > mean. edpi, ppi, etc. > > > > + > > + uint32_t lcore_id; > > + uint32_t iter_counter; > > + uint32_t threshold_ctr; > > + uint32_t display_ctr; > > + uint8_t dev_id; > > + > > +} __rte_cache_aligned; > > + > > + > > +struct stats_data { > > + > > + struct priority_worker wrk_stats[NUM_NODES]; > > + > > + /* flag to stop rx threads processing packets until training over */ > > + bool start_rx; > > + > > +}; > > + > > +struct ep_params { > > + > > + /* timer related stuff */ > > + uint64_t interval_ticks; > > + uint32_t max_train_iter; > > + struct rte_timer timer0; > > + > > + struct stats_data wrk_data; > > +}; > > + > > + > > +int rte_empty_poll_stat_init(void); > > + > > +void rte_empty_poll_stat_free(void); > > + > > +int rte_empty_poll_stat_update(unsigned int lcore_id); > > + > > +int rte_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt); > > + > > +uint64_t rte_empty_poll_stat_fetch(unsigned int lcore_id); > > + > > +uint64_t rte_poll_stat_fetch(unsigned int lcore_id); > > + > > +void rte_empty_poll_set_freq(enum freq_val index, uint32_t limit); > > + > > +void rte_empty_poll_setup_timer(void); > > All the function prototypes need documentation. Please see rte_power.h. > And please makes sure that Doxygen documentation generates correctly from > the comments. > Suggest prefixing the functions with rte_power_ > > > + > > +#ifdef __cplusplus > > +} > > +#endif > > + > > +#endif > > A few more comments throughout the code would be good. > > >