From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", "dev@dpdk.org", "Hunt, David"
Cc: "Loftus, Ciara"
Subject: Re: [dpdk-dev] [PATCH v5 5/7] power: support callbacks for multiple Rx queues
Date: Mon, 5 Jul 2021 11:23:44 +0100
Message-ID: <4cbae9e6-5e6b-5fe0-696f-798ec1135ce0@intel.com>
References: <8f5d030a77aa2f0e95e9680cb911b4e8db30c879.1624981670.git.anatoly.burakov@intel.com>

On 30-Jun-21 12:04 PM, Ananyev, Konstantin wrote:
>
>
>
>> Currently, there is a hard limitation on the PMD power management
>> support that only allows it to support a single queue per lcore. This is
>> not ideal, as most DPDK use cases will poll multiple queues per core.
>>
>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>> is very difficult to implement such support because callbacks are
>> effectively stateless and have no visibility into what the other ethdev
>> devices are doing. This places limitations on what we can do within the
>> framework of Rx callbacks, but the basics of this implementation are as
>> follows:
>>
>> - Replace per-queue structures with per-lcore ones, so that any device
>>   polled from the same lcore can share data
>> - Any queue that is going to be polled from a specific lcore has to be
>>   added to the list of queues to poll, so that the callback is aware of
>>   other queues being polled by the same lcore
>> - Both the empty poll counter and the actual power saving mechanism are
>>   shared between all queues polled on a particular lcore, and are only
>>   activated when all queues in the list were polled and were determined
>>   to have no traffic.
>> - The limitation on UMWAIT-based polling is not removed, because UMWAIT
>>   is incapable of monitoring more than one address.
>>
>> Also, while we're at it, update and improve the docs.
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>>
>> Notes:
>>     v5:
>>     - Remove the "power save queue" API and replace it with mechanism suggested by
>>       Konstantin
>>
>>     v3:
>>     - Move the list of supported NICs to NIC feature table
>>
>>     v2:
>>     - Use a TAILQ for queues instead of a static array
>>     - Address feedback from Konstantin
>>     - Add additional checks for stopped queues
>>
>>  doc/guides/nics/features.rst           |  10 +
>>  doc/guides/prog_guide/power_man.rst    |  65 ++--
>>  doc/guides/rel_notes/release_21_08.rst |   3 +
>>  lib/power/rte_power_pmd_mgmt.c         | 431 ++++++++++++++++++-------
>>  4 files changed, 373 insertions(+), 136 deletions(-)
>>
>> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
>> index 403c2b03a3..a96e12d155 100644
>> --- a/doc/guides/nics/features.rst
>> +++ b/doc/guides/nics/features.rst
>> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>>  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>>  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.
>>
>> +.. _nic_features_get_monitor_addr:
>> +
>> +PMD power management using monitor addresses
>> +--------------------------------------------
>> +
>> +Supports getting a monitoring condition to use together with Ethernet PMD power
>> +management (see :doc:`../prog_guide/power_man` for more details).
>> +
>> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
>> +
>>  .. _nic_features_other:
>>
>>  Other dev ops not represented by a Feature
>> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
>> index c70ae128ac..ec04a72108 100644
>> --- a/doc/guides/prog_guide/power_man.rst
>> +++ b/doc/guides/prog_guide/power_man.rst
>> @@ -198,34 +198,41 @@ Ethernet PMD Power Management API
>>  Abstract
>>  ~~~~~~~~
>>
>> -Existing power management mechanisms require developers
>> -to change application design or change code to make use of it.
>> -The PMD power management API provides a convenient alternative
>> -by utilizing Ethernet PMD RX callbacks,
>> -and triggering power saving whenever empty poll count reaches a certain number.
>> -
>> -Monitor
>> -   This power saving scheme will put the CPU into optimized power state
>> -   and use the ``rte_power_monitor()`` function
>> -   to monitor the Ethernet PMD RX descriptor address,
>> -   and wake the CPU up whenever there's new traffic.
>> -
>> -Pause
>> -   This power saving scheme will avoid busy polling
>> -   by either entering power-optimized sleep state
>> -   with ``rte_power_pause()`` function,
>> -   or, if it's not available, use ``rte_pause()``.
>> -
>> -Frequency scaling
>> -   This power saving scheme will use ``librte_power`` library
>> -   functionality to scale the core frequency up/down
>> -   depending on traffic volume.
>> -
>> -.. note::
>> -
>> -   Currently, this power management API is limited to mandatory mapping
>> -   of 1 queue to 1 core (multiple queues are supported,
>> -   but they must be polled from different cores).
>> +Existing power management mechanisms require developers to change application
>> +design or change code to make use of them. The PMD power management API provides
>> +a convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
>> +power saving whenever the empty poll count reaches a certain threshold.
>> +
>> +* Monitor
>> +   This power saving scheme will put the CPU into optimized power state and
>> +   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
>> +   there's new traffic. Support for this scheme may not be available on all
>> +   platforms, and further limitations may apply (see below).
>> +
>> +* Pause
>> +   This power saving scheme will avoid busy polling by either entering
>> +   power-optimized sleep state with ``rte_power_pause()`` function, or, if it's
>> +   not supported by the underlying platform, using ``rte_pause()``.
>> +
>> +* Frequency scaling
>> +   This power saving scheme will use ``librte_power`` library functionality to
>> +   scale the core frequency up/down depending on traffic volume.
>> +
>> +The "monitor" mode is only supported in the following configurations and scenarios:
>> +
>> +* If the ``rte_cpu_get_intrinsics_support()`` function indicates that
>> +  ``rte_power_monitor()`` is supported by the platform, then monitoring will be
>> +  limited to a mapping of 1 core to 1 queue (thus, each Rx queue will have to be
>> +  monitored from a different lcore).
>> +
>> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
>> +  ``rte_power_monitor()`` function is not supported, then monitor mode will not
>> +  be supported.
>> +
>> +* Not all Ethernet drivers support monitoring, even if the underlying
>> +  platform may support the necessary CPU instructions. Please refer to
>> +  :doc:`../nics/overview` for more information.
>> +
>>
>>  API Overview for Ethernet PMD Power Management
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> @@ -242,3 +249,5 @@ References
>>
>>  * The :doc:`../sample_app_ug/vm_power_management`
>>    chapter in the :doc:`../sample_app_ug/index` section.
>> +
>> +* The :doc:`../nics/overview` chapter in the :doc:`../nics/index` section
>> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
>> index f015c509fc..3926d45ef8 100644
>> --- a/doc/guides/rel_notes/release_21_08.rst
>> +++ b/doc/guides/rel_notes/release_21_08.rst
>> @@ -57,6 +57,9 @@ New Features
>>
>>  * eal: added ``rte_power_monitor_multi`` to support waiting for multiple events.
>>
>> +* rte_power: The experimental PMD power management API now supports managing
>> +  multiple Ethernet Rx queues per lcore.
>> +
>>
>>  Removed Items
>>  -------------
>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>> index 9b95cf1794..fccfd236c2 100644
>> --- a/lib/power/rte_power_pmd_mgmt.c
>> +++ b/lib/power/rte_power_pmd_mgmt.c
>> @@ -33,18 +33,96 @@ enum pmd_mgmt_state {
>>  	PMD_MGMT_ENABLED
>>  };
>>
>> -struct pmd_queue_cfg {
>> +union queue {
>> +	uint32_t val;
>> +	struct {
>> +		uint16_t portid;
>> +		uint16_t qid;
>> +	};
>> +};
>> +
>> +struct queue_list_entry {
>> +	TAILQ_ENTRY(queue_list_entry) next;
>> +	union queue queue;
>> +	uint64_t n_empty_polls;
>> +	const struct rte_eth_rxtx_callback *cb;
>> +};
>> +
>> +struct pmd_core_cfg {
>> +	TAILQ_HEAD(queue_list_head, queue_list_entry) head;
>> +	/**< List of queues associated with this lcore */
>> +	size_t n_queues;
>> +	/**< How many queues are in the list? */
>>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>>  	/**< State of power management for this queue */
>>  	enum rte_power_pmd_mgmt_type cb_mode;
>>  	/**< Callback mode for this queue */
>> -	const struct rte_eth_rxtx_callback *cur_cb;
>> -	/**< Callback instance */
>> -	uint64_t empty_poll_stats;
>> -	/**< Number of empty polls */
>> +	uint64_t n_queues_ready_to_sleep;
>> +	/**< Number of queues ready to enter power optimized state */
>>  } __rte_cache_aligned;
>> +static struct pmd_core_cfg lcore_cfgs[RTE_MAX_LCORE];
>>
>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>> +static inline bool
>> +queue_equal(const union queue *l, const union queue *r)
>> +{
>> +	return l->val == r->val;
>> +}
>> +
>> +static inline void
>> +queue_copy(union queue *dst, const union queue *src)
>> +{
>> +	dst->val = src->val;
>> +}
>> +
>> +static struct queue_list_entry *
>> +queue_list_find(const struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +	struct queue_list_entry *cur;
>> +
>> +	TAILQ_FOREACH(cur, &cfg->head, next) {
>> +		if (queue_equal(&cur->queue, q))
>> +			return cur;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +static int
>> +queue_list_add(struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +	struct queue_list_entry *qle;
>> +
>> +	/* is it already in the list? */
>> +	if (queue_list_find(cfg, q) != NULL)
>> +		return -EEXIST;
>> +
>> +	qle = malloc(sizeof(*qle));
>> +	if (qle == NULL)
>> +		return -ENOMEM;
>> +	memset(qle, 0, sizeof(*qle));
>> +
>> +	queue_copy(&qle->queue, q);
>> +	TAILQ_INSERT_TAIL(&cfg->head, qle, next);
>> +	cfg->n_queues++;
>> +	qle->n_empty_polls = 0;
>> +
>> +	return 0;
>> +}
>> +
>> +static struct queue_list_entry *
>> +queue_list_take(struct pmd_core_cfg *cfg, const union queue *q)
>> +{
>> +	struct queue_list_entry *found;
>> +
>> +	found = queue_list_find(cfg, q);
>> +	if (found == NULL)
>> +		return NULL;
>> +
>> +	TAILQ_REMOVE(&cfg->head, found, next);
>> +	cfg->n_queues--;
>> +
>> +	/* freeing is responsibility of the caller */
>> +	return found;
>> +}
>>
>>  static void
>>  calc_tsc(void)
>> @@ -74,21 +152,56 @@ calc_tsc(void)
>>  	}
>>  }
>>
>> +static inline void
>> +queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
>> +{
>> +	/* reset empty poll counter for this queue */
>> +	qcfg->n_empty_polls = 0;
>> +	/* reset the sleep counter too */
>> +	cfg->n_queues_ready_to_sleep = 0;
>> +}
>> +
>> +static inline bool
>> +queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
>> +{
>> +	/* this function is called - that means we have an empty poll */
>> +	qcfg->n_empty_polls++;
>> +
>> +	/* if we haven't reached threshold for empty polls, we can't sleep */
>> +	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
>> +		return false;
>> +
>> +	/* we're ready to sleep */
>> +	cfg->n_queues_ready_to_sleep++;
>> +
>> +	return true;
>> +}
>> +
>> +static inline bool
>> +lcore_can_sleep(struct pmd_core_cfg *cfg)
>> +{
>> +	/* are all queues ready to sleep? */
>> +	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
>> +		return false;
>> +
>> +	/* we've reached an iteration where we can sleep, reset sleep counter */
>> +	cfg->n_queues_ready_to_sleep = 0;
>> +
>> +	return true;
>> +}
>
> As far as I can see, it is a slightly modified version of what was discussed.
> I understand that it seems simpler, but I think there are some problems with it:
> - each queue can be counted more than once in lcore_cfg->n_queues_ready_to_sleep
> - queues' n_empty_polls are not reset after sleep().
>
> To illustrate the problem, let's say we have 2 queues, and at some moment we have:
> q0.n_empty_polls == EMPTYPOLL_MAX + 1
> q1.n_empty_polls == EMPTYPOLL_MAX + 1
> cfg->n_queues_ready_to_sleep == 2
>
> So lcore_can_sleep() returns 'true' and sets:
> cfg->n_queues_ready_to_sleep == 0
>
> Now, after sleep():
> q0.n_empty_polls == EMPTYPOLL_MAX + 1
> q1.n_empty_polls == EMPTYPOLL_MAX + 1
>
> So after:
> queue_can_sleep(q0);
> queue_can_sleep(q1);
>
> we'll have:
> cfg->n_queues_ready_to_sleep == 2
> again, and we'll go to another sleep after just one rx_burst() attempt for each queue.

The latter is intentional: once we pass the empty poll threshold, we
*want* to keep sleeping on every poll. The former shouldn't be a big
problem in the conventional case, as I don't think there are situations
where people would poll core-pinned queues in different orders, but
you're right, this is a potential issue and should be fixed. I'll add
back the n_sleeps in the next iteration - roughly along the lines of
the sketch below.
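Completely untested sketch of what I have in mind - note that both the
per-queue n_sleeps counter and the per-lcore sleep_target counter are
provisional names and do not exist in v5:

static inline bool
queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
{
	/* this function is called - that means we have an empty poll */
	qcfg->n_empty_polls++;

	/* if we haven't reached threshold for empty polls, we can't sleep */
	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
		return false;

	/* count each queue at most once per sleep cycle */
	if (qcfg->n_sleeps != cfg->sleep_target + 1) {
		qcfg->n_sleeps = cfg->sleep_target + 1;
		cfg->n_queues_ready_to_sleep++;
	}

	return true;
}

static inline bool
lcore_can_sleep(struct pmd_core_cfg *cfg)
{
	/* are all queues ready to sleep? */
	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
		return false;

	/*
	 * Start a new sleep cycle: bumping the target "unmarks" all queues
	 * at once, so every queue has to see another empty poll before the
	 * next sleep. The n_empty_polls counters deliberately stay
	 * untouched, so that once past the threshold we keep sleeping on
	 * every fully-empty iteration until actual traffic arrives.
	 */
	cfg->n_queues_ready_to_sleep = 0;
	cfg->sleep_target++;

	return true;
}

queue_reset() would then, instead of zeroing n_queues_ready_to_sleep,
decrement it only if the queue in question was already counted for the
current cycle (i.e. qcfg->n_sleeps == cfg->sleep_target + 1), and unmark
that queue.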
>
>> +
>>  static uint16_t
>>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> -		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> -		void *addr __rte_unused)
>> +		uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
>>  {
>> +	struct queue_list_entry *queue_conf = arg;
>>
>> -	struct pmd_queue_cfg *q_conf;
>> -
>> -	q_conf = &port_cfg[port_id][qidx];
>> -
>> +	/* this callback can't do more than one queue, omit multiqueue logic */
>>  	if (unlikely(nb_rx == 0)) {
>> -		q_conf->empty_poll_stats++;
>> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> +		queue_conf->n_empty_polls++;
>> +		if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
>>  			struct rte_power_monitor_cond pmc;
>> -			uint16_t ret;
>> +			int ret;
>>
>>  			/* use monitoring condition to sleep */
>>  			ret = rte_eth_get_monitor_addr(port_id, qidx,
>> @@ -97,60 +210,77 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>>  			rte_power_monitor(&pmc, UINT64_MAX);
>>  		}
>>  	} else
>> -		q_conf->empty_poll_stats = 0;
>> +		queue_conf->n_empty_polls = 0;
>>
>>  	return nb_rx;
>>  }
>>
>>  static uint16_t
>> -clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> -		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> -		void *addr __rte_unused)
>> +clb_pause(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
>> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>> +		uint16_t max_pkts __rte_unused, void *arg)
>>  {
>> -	struct pmd_queue_cfg *q_conf;
>> +	const unsigned int lcore = rte_lcore_id();
>> +	struct queue_list_entry *queue_conf = arg;
>> +	struct pmd_core_cfg *lcore_conf;
>> +	const bool empty = nb_rx == 0;
>>
>> -	q_conf = &port_cfg[port_id][qidx];
>> +	lcore_conf = &lcore_cfgs[lcore];
>>
>> -	if (unlikely(nb_rx == 0)) {
>> -		q_conf->empty_poll_stats++;
>> -		/* sleep for 1 microsecond */
>> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> -			/* use tpause if we have it */
>> -			if (global_data.intrinsics_support.power_pause) {
>> -				const uint64_t cur = rte_rdtsc();
>> -				const uint64_t wait_tsc =
>> -						cur + global_data.tsc_per_us;
>> -				rte_power_pause(wait_tsc);
>> -			} else {
>> -				uint64_t i;
>> -				for (i = 0; i < global_data.pause_per_us; i++)
>> -					rte_pause();
>> -			}
>> +	if (likely(!empty))
>> +		/* early exit */
>> +		queue_reset(lcore_conf, queue_conf);
>> +	else {
>> +		/* can this queue sleep? */
>> +		if (!queue_can_sleep(lcore_conf, queue_conf))
>> +			return nb_rx;
>> +
>> +		/* can this lcore sleep? */
>> +		if (!lcore_can_sleep(lcore_conf))
>> +			return nb_rx;
>> +
>> +		/* sleep for 1 microsecond, use tpause if we have it */
>> +		if (global_data.intrinsics_support.power_pause) {
>> +			const uint64_t cur = rte_rdtsc();
>> +			const uint64_t wait_tsc =
>> +					cur + global_data.tsc_per_us;
>> +			rte_power_pause(wait_tsc);
>> +		} else {
>> +			uint64_t i;
>> +			for (i = 0; i < global_data.pause_per_us; i++)
>> +				rte_pause();
>>  		}
>> -	} else
>> -		q_conf->empty_poll_stats = 0;
>> +	}
>>
>>  	return nb_rx;
>>  }
>>
>>  static uint16_t
>> -clb_scale_freq(uint16_t port_id, uint16_t qidx,
>> +clb_scale_freq(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
>>  		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>> -		uint16_t max_pkts __rte_unused, void *_ __rte_unused)
>> +		uint16_t max_pkts __rte_unused, void *arg)
>>  {
>> -	struct pmd_queue_cfg *q_conf;
>> +	const unsigned int lcore = rte_lcore_id();
>> +	const bool empty = nb_rx == 0;
>> +	struct pmd_core_cfg *lcore_conf = &lcore_cfgs[lcore];
>> +	struct queue_list_entry *queue_conf = arg;
>>
>> -	q_conf = &port_cfg[port_id][qidx];
>> +	if (likely(!empty)) {
>> +		/* early exit */
>> +		queue_reset(lcore_conf, queue_conf);
>>
>> -	if (unlikely(nb_rx == 0)) {
>> -		q_conf->empty_poll_stats++;
>> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
>> -			/* scale down freq */
>> -			rte_power_freq_min(rte_lcore_id());
>> -	} else {
>> -		q_conf->empty_poll_stats = 0;
>> -		/* scale up freq */
>> +		/* scale up freq immediately */
>>  		rte_power_freq_max(rte_lcore_id());
>> +	} else {
>> +		/* can this queue sleep? */
>> +		if (!queue_can_sleep(lcore_conf, queue_conf))
>> +			return nb_rx;
>> +
>> +		/* can this lcore sleep? */
>> +		if (!lcore_can_sleep(lcore_conf))
>> +			return nb_rx;
>> +
>> +		rte_power_freq_min(rte_lcore_id());
>>  	}
>>
>>  	return nb_rx;
>> @@ -167,11 +297,80 @@ queue_stopped(const uint16_t port_id, const uint16_t queue_id)
>>  	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
>>  }
>>
>> +static int
>> +cfg_queues_stopped(struct pmd_core_cfg *queue_cfg)
>> +{
>> +	const struct queue_list_entry *entry;
>> +
>> +	TAILQ_FOREACH(entry, &queue_cfg->head, next) {
>> +		const union queue *q = &entry->queue;
>> +		int ret = queue_stopped(q->portid, q->qid);
>> +		if (ret != 1)
>> +			return ret;
>> +	}
>> +	return 1;
>> +}
>> +
>> +static int
>> +check_scale(unsigned int lcore)
>> +{
>> +	enum power_management_env env;
>> +
>> +	/* only PSTATE and ACPI modes are supported */
>> +	if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
>> +			!rte_power_check_env_supported(PM_ENV_PSTATE_CPUFREQ)) {
>> +		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
>> +		return -ENOTSUP;
>> +	}
>> +	/* ensure we could initialize the power library */
>> +	if (rte_power_init(lcore))
>> +		return -EINVAL;
>> +
>> +	/* ensure we initialized the correct env */
>> +	env = rte_power_get_env();
>> +	if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) {
>> +		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
>> +		return -ENOTSUP;
>> +	}
>> +
>> +	/* we're done */
>> +	return 0;
>> +}
>> +
>> +static int
>> +check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
>> +{
>> +	struct rte_power_monitor_cond dummy;
>> +
>> +	/* check if rte_power_monitor is supported */
>> +	if (!global_data.intrinsics_support.power_monitor) {
>> +		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> +		return -ENOTSUP;
>> +	}
>> +
>> +	if (cfg->n_queues > 0) {
>> +		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
>> +		return -ENOTSUP;
>> +	}
>> +
>> +	/* check if the device supports the necessary PMD API */
>> +	if (rte_eth_get_monitor_addr(qdata->portid, qdata->qid,
>> +			&dummy) == -ENOTSUP) {
>> +		RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
>> +		return -ENOTSUP;
>> +	}
>> +
>> +	/* we're done */
>> +	return 0;
>> +}
>> +
>>  int
>>  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>  		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
>>  {
>> -	struct pmd_queue_cfg *queue_cfg;
>> +	const union queue qdata = {.portid = port_id, .qid = queue_id};
>> +	struct pmd_core_cfg *lcore_cfg;
>> +	struct queue_list_entry *queue_cfg;
>>  	struct rte_eth_dev_info info;
>>  	rte_rx_callback_fn clb;
>>  	int ret;
>> @@ -202,9 +401,19 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>  		goto end;
>>  	}
>>
>> -	queue_cfg = &port_cfg[port_id][queue_id];
>> +	lcore_cfg = &lcore_cfgs[lcore_id];
>>
>> -	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
>> +	/* check if other queues are stopped as well */
>> +	ret = cfg_queues_stopped(lcore_cfg);
>> +	if (ret != 1) {
>> +		/* error means invalid queue, 0 means queue wasn't stopped */
>> +		ret = ret < 0 ? -EINVAL : -EBUSY;
>> +		goto end;
>> +	}
>> +
>> +	/* if callback was already enabled, check current callback type */
>> +	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
>> +			lcore_cfg->cb_mode != mode) {
>>  		ret = -EINVAL;
>>  		goto end;
>>  	}
>> @@ -214,53 +423,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>
>>  	switch (mode) {
>>  	case RTE_POWER_MGMT_TYPE_MONITOR:
>> -	{
>> -		struct rte_power_monitor_cond dummy;
>> -
>> -		/* check if rte_power_monitor is supported */
>> -		if (!global_data.intrinsics_support.power_monitor) {
>> -			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>> -			ret = -ENOTSUP;
>> +		/* check if we can add a new queue */
>> +		ret = check_monitor(lcore_cfg, &qdata);
>> +		if (ret < 0)
>>  			goto end;
>> -		}
>>
>> -		/* check if the device supports the necessary PMD API */
>> -		if (rte_eth_get_monitor_addr(port_id, queue_id,
>> -				&dummy) == -ENOTSUP) {
>> -			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
>> -			ret = -ENOTSUP;
>> -			goto end;
>> -		}
>>  		clb = clb_umwait;
>>  		break;
>> -	}
>>  	case RTE_POWER_MGMT_TYPE_SCALE:
>> -	{
>> -		enum power_management_env env;
>> -		/* only PSTATE and ACPI modes are supported */
>> -		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
>> -				!rte_power_check_env_supported(
>> -					PM_ENV_PSTATE_CPUFREQ)) {
>> -			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
>> -			ret = -ENOTSUP;
>> +		/* check if we can add a new queue */
>> +		ret = check_scale(lcore_id);
>> +		if (ret < 0)
>>  			goto end;
>> -		}
>> -		/* ensure we could initialize the power library */
>> -		if (rte_power_init(lcore_id)) {
>> -			ret = -EINVAL;
>> -			goto end;
>> -		}
>> -		/* ensure we initialized the correct env */
>> -		env = rte_power_get_env();
>> -		if (env != PM_ENV_ACPI_CPUFREQ &&
>> -				env != PM_ENV_PSTATE_CPUFREQ) {
>> -			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
>> -			ret = -ENOTSUP;
>> -			goto end;
>> -		}
>>  		clb = clb_scale_freq;
>>  		break;
>> -	}
>>  	case RTE_POWER_MGMT_TYPE_PAUSE:
>>  		/* figure out various time-to-tsc conversions */
>>  		if (global_data.tsc_per_us == 0)
>> @@ -273,13 +449,23 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>  		ret = -EINVAL;
>>  		goto end;
>>  	}
>> +	/* add this queue to the list */
>> +	ret = queue_list_add(lcore_cfg, &qdata);
>> +	if (ret < 0) {
>> +		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
>> +				strerror(-ret));
>> +		goto end;
>> +	}
>> +	/* new queue is always added last */
>> +	queue_cfg = TAILQ_LAST(&lcore_cfg->head, queue_list_head);
>>
>>  	/* initialize data before enabling the callback */
>> -	queue_cfg->empty_poll_stats = 0;
>> -	queue_cfg->cb_mode = mode;
>> -	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> -	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
>> -			clb, NULL);
>> +	if (lcore_cfg->n_queues == 1) {
>> +		lcore_cfg->cb_mode = mode;
>> +		lcore_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
>> +	}
>> +	queue_cfg->cb = rte_eth_add_rx_callback(port_id, queue_id,
>> +			clb, queue_cfg);
>>
>>  	ret = 0;
>>  end:
>> @@ -290,7 +476,9 @@ int
>>  rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>  		uint16_t port_id, uint16_t queue_id)
>>  {
>> -	struct pmd_queue_cfg *queue_cfg;
>> +	const union queue qdata = {.portid = port_id, .qid = queue_id};
>> +	struct pmd_core_cfg *lcore_cfg;
>> +	struct queue_list_entry *queue_cfg;
>>  	int ret;
>>
>>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> @@ -306,24 +494,40 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>  	}
>>
>>  	/* no need to check queue id as wrong queue id would not be enabled */
>> -	queue_cfg = &port_cfg[port_id][queue_id];
>> +	lcore_cfg = &lcore_cfgs[lcore_id];
>>
>> -	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>> +	/* check if other queues are stopped as well */
>> +	ret = cfg_queues_stopped(lcore_cfg);
>> +	if (ret != 1) {
>> +		/* error means invalid queue, 0 means queue wasn't stopped */
>> +		return ret < 0 ? -EINVAL : -EBUSY;
>> +	}
>> +
>> +	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>>  		return -EINVAL;
>>
>> -	/* stop any callbacks from progressing */
>> -	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
>> +	/*
>> +	 * There is no good/easy way to do this without race conditions, so we
>> +	 * are just going to throw our hands in the air and hope that the user
>> +	 * has read the documentation and has ensured that ports are stopped at
>> +	 * the time we enter the API functions.
>> +	 */
>> +	queue_cfg = queue_list_take(lcore_cfg, &qdata);
>> +	if (queue_cfg == NULL)
>> +		return -ENOENT;
>>
>> -	switch (queue_cfg->cb_mode) {
>> +	/* if we've removed all queues from the lists, set state to disabled */
>> +	if (lcore_cfg->n_queues == 0)
>> +		lcore_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
>> +
>> +	switch (lcore_cfg->cb_mode) {
>>  	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
>>  	case RTE_POWER_MGMT_TYPE_PAUSE:
>> -		rte_eth_remove_rx_callback(port_id, queue_id,
>> -				queue_cfg->cur_cb);
>> +		rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>>  		break;
>>  	case RTE_POWER_MGMT_TYPE_SCALE:
>>  		rte_power_freq_max(lcore_id);
>> -		rte_eth_remove_rx_callback(port_id, queue_id,
>> -				queue_cfg->cur_cb);
>> +		rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>>  		rte_power_exit(lcore_id);
>>  		break;
>>  	}
>> @@ -332,7 +536,18 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>>  	 * ports before calling any of these API's, so we can assume that the
>>  	 * callbacks can be freed. we're intentionally casting away const-ness.
>>  	 */
>> -	rte_free((void *)queue_cfg->cur_cb);
>> +	rte_free((void *)queue_cfg->cb);
>> +	free(queue_cfg);
>>
>>  	return 0;
>>  }
>> +
>> +RTE_INIT(rte_power_ethdev_pmgmt_init) {
>> +	size_t i;
>> +
>> +	/* initialize all tailqs */
>> +	for (i = 0; i < RTE_DIM(lcore_cfgs); i++) {
>> +		struct pmd_core_cfg *cfg = &lcore_cfgs[i];
>> +		TAILQ_INIT(&cfg->head);
>> +	}
>> +}
>> --
>> 2.25.1
>