From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 1ED8DA04C7;
	Mon, 14 Sep 2020 23:01:47 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 043701C0C0;
	Mon, 14 Sep 2020 23:01:47 +0200 (CEST)
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126])
 by dpdk.org (Postfix) with ESMTP id F1E9C1BFC4
 for <dev@dpdk.org>; Mon, 14 Sep 2020 23:01:44 +0200 (CEST)
IronPort-SDR: omzwdFyeBOZodDS6I4OZ8tXpoI43XhbgXKavuCdj/ckOA/hHHzE04g1Jp5+lvqYhhdh/Ff5Szn
 F3jyeRGOqkew==
X-IronPort-AV: E=McAfee;i="6000,8403,9744"; a="146900306"
X-IronPort-AV: E=Sophos;i="5.76,427,1592895600"; d="scan'208";a="146900306"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
 by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 14 Sep 2020 14:01:42 -0700
IronPort-SDR: rhsCoO4fJzBdpa7LARQsvTfLAYOa9uAEtpLNGllGWzSN59hTLa0cz2pBCUwalrIuB2AJMegVbk
 4D2suHqSdDtA==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.76,427,1592895600"; d="scan'208";a="286562879"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by fmsmga007.fm.intel.com with ESMTP; 14 Sep 2020 14:01:41 -0700
Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com
 [10.237.217.48])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id
 08EL1eEC026934; Mon, 14 Sep 2020 22:01:40 +0100
Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1])
 by sivswdev09.ir.intel.com with ESMTP id 08EL1emq027085;
 Mon, 14 Sep 2020 22:01:40 +0100
Received: (from lma25@localhost)
 by sivswdev09.ir.intel.com with LOCAL id 08EL1ePU027081;
 Mon, 14 Sep 2020 22:01:40 +0100
Date: Mon, 14 Sep 2020 22:01:40 +0100
From: "Liang, Ma" <liang.j.ma@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Hunt, David" <david.hunt@intel.com>,
 "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <20200914210140.GE13240@sivswdev09.ir.intel.com>
References: <1597141666-20621-1-git-send-email-liang.j.ma@intel.com>
 <1599214740-3927-1-git-send-email-liang.j.ma@intel.com>
 <1599214740-3927-3-git-send-email-liang.j.ma@intel.com>
 <BYAPR11MB3301AE40DE185AFAF89F4FFE9A2D0@BYAPR11MB3301.namprd11.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BYAPR11MB3301AE40DE185AFAF89F4FFE9A2D0@BYAPR11MB3301.namprd11.prod.outlook.com>
Subject: Re: [dpdk-dev] [PATCH v3 3/6] power: add simple power management
 API and callback
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 04 Sep 11:33, Ananyev, Konstantin wrote:

<snip>
> > +struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +
> > +if (unlikely(nb_rx == 0)) {
> > +dev->empty_poll_stats[qidx].num++;
>
Hi Konstantin,
   Agree, v4 will relocate the meta data to seperate structure. 
   and without touch the rte_ethdev structure. 
> I believe there are two fundamental issues with that approach:
> 1. You put metadata specific lib (power) callbacks into rte_eth_dev struct.
> 2. These callbacks do access rte_eth_devices[] directly.
> That doesn't look right to me - rte_eth_dev structure supposed to be treated
> as internal one librt_ether and underlying drivers and should be accessed directly
> by outer code.
> If these callbacks need some extra metadata, then it is responsibility
> of power library to allocate/manage these metadata.
> You can pass pointer to this metadata via last parameter for rte_eth_add_rx_callback().
> 
> > +if (unlikely(dev->empty_poll_stats[qidx].num >
> > +     ETH_EMPTYPOLL_MAX)) {
> > +volatile void *target_addr;
> > +uint64_t expected, mask;
> > +uint16_t ret;
> > +
> > +/*
> > + * get address of next descriptor in the RX
> > + * ring for this queue, as well as expected
> > + * value and a mask.
> > + */
> > +ret = (*dev->dev_ops->next_rx_desc)
> > +(dev->data->rx_queues[qidx],
> > + &target_addr, &expected, &mask);
> > +if (ret == 0)
> > +/* -1ULL is maximum value for TSC */
> > +rte_power_monitor(target_addr,
> > +  expected, mask,
> > +  0, -1ULL);
> > +}
> > +} else
> > +dev->empty_poll_stats[qidx].num = 0;
> > +
> > +return nb_rx;
> > +}
> > +
> > +static uint16_t
> > +rte_power_mgmt_pause(uint16_t port_id, uint16_t qidx,
> > +struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> > +uint16_t max_pkts __rte_unused, void *_  __rte_unused)
> > +{
> > +struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +
> > +int i;
> > +
> > +if (unlikely(nb_rx == 0)) {
> > +
> > +dev->empty_poll_stats[qidx].num++;
> > +
> > +if (unlikely(dev->empty_poll_stats[qidx].num >
> > +     ETH_EMPTYPOLL_MAX)) {
> > +
> > +for (i = 0; i < RTE_ETH_PAUSE_NUM; i++)
> > +rte_pause();
> > +
> > +}
> > +} else
> > +dev->empty_poll_stats[qidx].num = 0;
> > +
> > +return nb_rx;
> > +}
> > +
> > +static uint16_t
> > +rte_power_mgmt_scalefreq(uint16_t port_id, uint16_t qidx,
> > +struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> > +uint16_t max_pkts __rte_unused, void *_  __rte_unused)
> > +{
> > +struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> > +
> > +if (unlikely(nb_rx == 0)) {
> > +dev->empty_poll_stats[qidx].num++;
> > +if (unlikely(dev->empty_poll_stats[qidx].num >
> > +     ETH_EMPTYPOLL_MAX)) {
> > +
> > +/*scale down freq */
> > +rte_power_freq_min(rte_lcore_id());
> > +
> > +}
> > +} else {
> > +dev->empty_poll_stats[qidx].num = 0;
> > +/* scal up freq */
> > +rte_power_freq_max(rte_lcore_id());
> > +}
> > +
> > +return nb_rx;
> > +}
> > +
> > +int
> > +rte_power_pmd_mgmt_enable(unsigned int lcore_id,
> > +uint16_t port_id,
> > +enum rte_eth_dev_power_mgmt_cb_mode mode)
> > +{
> > +struct rte_eth_dev *dev;
> > +
> > +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +dev = &rte_eth_devices[port_id];
> > +
> > +if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED)
> > +return -EINVAL;
> > +/* allocate memory for empty poll stats */
> > +dev->empty_poll_stats = rte_malloc_socket(NULL,
> > +  sizeof(struct rte_eth_ep_stat)
> > +  * RTE_MAX_QUEUES_PER_PORT,
> > +  0, dev->data->numa_node);
> > +if (dev->empty_poll_stats == NULL)
> > +return -ENOMEM;
> > +
> > +switch (mode) {
> > +case RTE_ETH_DEV_POWER_MGMT_CB_WAIT:
> > +if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG))
> > +return -ENOTSUP;
> 
> Here and in other places: in case of error return you don't' free your empty_poll_stats.
> 
> > +dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0,
> 
> Why zero for queue number, why not to pass queue_id as a parameter for that function?
v4 will move to use queue_id instead of 0. v3 still assume only queue 0 is used.
> 
> > +rte_power_mgmt_umwait, NULL);
> 
> As I said above, instead of NULL - could be pointer to metadata struct.
v4 will address this. 
> 
> > +break;
> > +case RTE_ETH_DEV_POWER_MGMT_CB_SCALE:
> > +/* init scale freq */
> > +if (rte_power_init(lcore_id))
> > +return -EINVAL;
> > +dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0,
> > +rte_power_mgmt_scalefreq, NULL);
> > +break;
> > +case RTE_ETH_DEV_POWER_MGMT_CB_PAUSE:
> > +dev->cur_pwr_cb = rte_eth_add_rx_callback(port_id, 0,
> > +rte_power_mgmt_pause, NULL);
> > +break;
> > +}
> > +
> > +dev->cb_mode = mode;
> > +dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_ENABLED;
> > +return 0;
> > +}
> > +
> > +int
> > +rte_power_pmd_mgmt_disable(unsigned int lcore_id,
> > +uint16_t port_id)
> > +{
> > +struct rte_eth_dev *dev;
> > +
> > +RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> > +dev = &rte_eth_devices[port_id];
> > +
> > +/*add flag check */
> > +
> > +if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_DISABLED)
> > +return -EINVAL;
> > +
> > +/* rte_free ignores NULL so safe to call without checks */
> > +rte_free(dev->empty_poll_stats);
> 
> You can't free callback metadata before removing the callback itself.
> In fact, with current rx callback code it is not safe to free it
> even after (we discussed it offline).
agree. 
> 
> > +
> > +switch (dev->cb_mode) {
> > +case RTE_ETH_DEV_POWER_MGMT_CB_WAIT:
> > +case RTE_ETH_DEV_POWER_MGMT_CB_PAUSE:
> > +rte_eth_remove_rx_callback(port_id, 0,
> > +   dev->cur_pwr_cb);
> > +break;
> > +case RTE_ETH_DEV_POWER_MGMT_CB_SCALE:
> > +rte_power_freq_max(lcore_id);
> 
> Stupid q: what makes you think that lcore frequency was max,
> *before* you setup the callback?
that is because the rte_power_init() has figured out the system max.
the init code invocate rte_power_init() already. 
> 
> > +rte_eth_remove_rx_callback(port_id, 0,
> > +   dev->cur_pwr_cb);
> > +if (rte_power_exit(lcore_id))
> > +return -EINVAL;
> > +break;
> > +}
> > +
> > +dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_DISABLED;
> > +dev->cur_pwr_cb = NULL;
> > +dev->cb_mode = 0;
> > +
> > +return 0;
> > +}
> > diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
> > index 00ee5753e2..ade83cfd4f 100644
> > --- a/lib/librte_power/rte_power_version.map
> > +++ b/lib/librte_power/rte_power_version.map
> > @@ -34,4 +34,8 @@ EXPERIMENTAL {
> >  rte_power_guest_channel_receive_msg;
> >  rte_power_poll_stat_fetch;
> >  rte_power_poll_stat_update;
> > +# added in 20.08
> > +rte_power_pmd_mgmt_disable;
> > +rte_power_pmd_mgmt_enable;
> > +
> >  };
> > --
> > 2.17.1
>