From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EA527A0032; Fri, 15 Jul 2022 16:11:55 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CB72140A87; Fri, 15 Jul 2022 16:11:55 +0200 (CEST) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by mails.dpdk.org (Postfix) with ESMTP id A140B40696 for ; Fri, 15 Jul 2022 16:11:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657894314; x=1689430314; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=PhWnnHJ1CTyIqs6tE7q9udoEunbZ/1HoXQ3KlSCvW4M=; b=BbasFF5au8cc+jgaK6rXqMQs+Piijg+wNbjXPG6f1zZ5FPBXISFK7vlj hxEBrczNt5jPlIDhb6GYDwVgw6JIHsusJfxELu8+8Jjvv7EM+WbSjpEyQ 4vIbBhMdZiBScKs6IqlCYcbNTaslf9VVzARAFQ7wTRLROds8qPADOt3we HF6WcbHjDhvVERH1RTmxmMcYBq7B40oYAcIZfGljOb2g2f5XvCQXzImUk mNo6bbNFDUokIpdosyVuRow7mhbSit9JRciNZNYMrlyNPWqZfhJwEvgh2 uMKLs6CipJj4+bgHA+J/AINL35eSuV4F61K+rVOTKwN6IHlUGJCt7N453 w==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="286941790" X-IronPort-AV: E=Sophos;i="5.92,274,1650956400"; d="scan'208";a="286941790" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jul 2022 07:11:53 -0700 X-IronPort-AV: E=Sophos;i="5.92,274,1650956400"; d="scan'208";a="923520321" Received: from bricha3-mobl.ger.corp.intel.com ([10.55.133.37]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA; 15 Jul 2022 07:11:49 -0700 Date: Fri, 15 Jul 2022 15:11:46 +0100 From: Bruce Richardson To: Jerin Jacob Cc: Anatoly Burakov , dpdk-dev , Nicolas Chautru , Fan Zhang , Ashish Gupta , Akhil Goyal , David Hunt , Chengwen Feng , Kevin Laatz , Ray Kinsella , Thomas Monjalon , Ferruh Yigit , Andrew Rybchenko , Jerin Jacob , Sachin Saxena , Hemant Agrawal , Ori Kam , Honnappa Nagarahalli , Konstantin Ananyev , Conor Walsh Subject: Re: [PATCH v1 1/2] eal: add lcore busyness telemetry Message-ID: References: <24c49429394294cfbf0d9c506b205029bac77c8b.1657890378.git.anatoly.burakov@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Fri, Jul 15, 2022 at 07:16:17PM +0530, Jerin Jacob wrote: > On Fri, Jul 15, 2022 at 6:42 PM Anatoly Burakov > wrote: > > > > Currently, there is no way to measure lcore busyness in a passive way, > > without any modifications to the application. This patch adds a new EAL > > API that will be able to passively track core busyness. > > > > The busyness is calculated by relying on the fact that most DPDK API's > > will poll for packets. Empty polls can be counted as "idle", while > > non-empty polls can be counted as busy. To measure lcore busyness, we > > simply call the telemetry timestamping function with the number of polls > > a particular code section has processed, and count the number of cycles > > we've spent processing empty bursts. The more empty bursts we encounter, > > the less cycles we spend in "busy" state, and the less core busyness > > will be reported. > > > > In order for all of the above to work without modifications to the > > application, the library code needs to be instrumented with calls to > > the lcore telemetry busyness timestamping function. The following parts > > of DPDK are instrumented with lcore telemetry calls: > > > > - All major driver API's: > > - ethdev > > - cryptodev > > - compressdev > > - regexdev > > - bbdev > > - rawdev > > - eventdev > > - dmadev > > - Some additional libraries: > > - ring > > - distributor > > > > To avoid performance impact from having lcore telemetry support, a > > global variable is exported by EAL, and a call to timestamping function > > is wrapped into a macro, so that whenever telemetry is disabled, it only > > takes one additional branch and no function calls are performed. It is > > also possible to disable it at compile time by commenting out > > RTE_LCORE_BUSYNESS from build config. > > > > This patch also adds a telemetry endpoint to report lcore busyness, as > > well as telemetry endpoints to enable/disable lcore telemetry. > > > > Signed-off-by: Kevin Laatz > > Signed-off-by: Conor Walsh > > Signed-off-by: David Hunt > > Signed-off-by: Anatoly Burakov > > > It is a good feature. Thanks for this work. > +1 Some follow-up comments inline below. /Bruce > > > --- > > > > Notes: > > We did a couple of quick smoke tests to see if this patch causes any performance > > degradation, and it seemed to have none that we could measure. Telemetry can be > > disabled at compile time via a config option, while at runtime it can be > > disabled, seemingly at a cost of one additional branch. > > > > That said, our benchmarking efforts were admittedly not very rigorous, so > > comments welcome! > > > > > diff --git a/config/rte_config.h b/config/rte_config.h > > index 46549cb062..583cb6f7a5 100644 > > --- a/config/rte_config.h > > +++ b/config/rte_config.h > > @@ -39,6 +39,8 @@ > > #define RTE_LOG_DP_LEVEL RTE_LOG_INFO > > #define RTE_BACKTRACE 1 > > #define RTE_MAX_VFIO_CONTAINERS 64 > > +#define RTE_LCORE_BUSYNESS 1 > > Please don't enable debug features in fastpath as default. > I would disagree that this is a debug feature. The number of times I have heard from DPDK users that they wish to get more visibility into what the app is doing rather than seeing 100% cpu busy. Therefore, I'd see this as enabled by default rather than disabled - unless it's shown to have a performance regression. That said, since this impacts multiple components and it's something that end users might want to disable at build-time, I'd suggest moving it to meson_options.txt file and have it as an official DPDK build-time option. > > +#define RTE_LCORE_BUSYNESS_PERIOD 4000000ULL > > > > + > > +#include > > +#include > > +#include > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#ifdef RTE_LCORE_BUSYNESS > > This clutter may not be required. Let it compile all cases. > > > +#include > > +#endif > > + > > +int __rte_lcore_telemetry_enabled; > > + > > +#ifdef RTE_LCORE_BUSYNESS > > + > > +struct lcore_telemetry { > > + int busyness; > > + /**< Calculated busyness (gets set/returned by the API) */ > > + int raw_busyness; > > + /**< Calculated busyness times 100. */ > > + uint64_t interval_ts; > > + /**< when previous telemetry interval started */ > > + uint64_t empty_cycles; > > + /**< empty cycle count since last interval */ > > + uint64_t last_poll_ts; > > + /**< last poll timestamp */ > > + bool last_empty; > > + /**< if last poll was empty */ > > + unsigned int contig_poll_cnt; > > + /**< contiguous (always empty/non empty) poll counter */ > > +} __rte_cache_aligned; > > +static struct lcore_telemetry telemetry_data[RTE_MAX_LCORE]; > > Allocate this from hugepage. > Yes, whether or not it's allocated from hugepages, dynamic allocation would be better than having static vars. > > + > > +void __rte_lcore_telemetry_timestamp(uint16_t nb_rx) > > +{ > > + const unsigned int lcore_id = rte_lcore_id(); > > + uint64_t interval_ts, empty_cycles, cur_tsc, last_poll_ts; > > + struct lcore_telemetry *tdata = &telemetry_data[lcore_id]; > > + const bool empty = nb_rx == 0; > > + uint64_t diff_int, diff_last; > > + bool last_empty; > > + > > + last_empty = tdata->last_empty; > > + > > + /* optimization: don't do anything if status hasn't changed */ > > + if (last_empty == empty && tdata->contig_poll_cnt++ < 32) > > + return; > > + /* status changed or we're waiting for too long, reset counter */ > > + tdata->contig_poll_cnt = 0; > > + > > + cur_tsc = rte_rdtsc(); > > + > > + interval_ts = tdata->interval_ts; > > + empty_cycles = tdata->empty_cycles; > > + last_poll_ts = tdata->last_poll_ts; > > + > > + diff_int = cur_tsc - interval_ts; > > + diff_last = cur_tsc - last_poll_ts; > > + > > + /* is this the first time we're here? */ > > + if (interval_ts == 0) { > > + tdata->busyness = LCORE_BUSYNESS_MIN; > > + tdata->raw_busyness = 0; > > + tdata->interval_ts = cur_tsc; > > + tdata->empty_cycles = 0; > > + tdata->contig_poll_cnt = 0; > > + goto end; > > + } > > + > > + /* update the empty counter if we got an empty poll earlier */ > > + if (last_empty) > > + empty_cycles += diff_last; > > + > > + /* have we passed the interval? */ > > + if (diff_int > RTE_LCORE_BUSYNESS_PERIOD) { > > > I think, this function logic can be limited to just updating the > timestamp in the ring buffer, > and another control function of telemetry which runs in control core to do > heavy lifting to reduce the performance impact on fast path, > > > + int raw_busyness; > > + > > + /* get updated busyness value */ > > + raw_busyness = calc_raw_busyness(tdata, empty_cycles, diff_int); > > + > > + /* set a new interval, reset empty counter */ > > + tdata->interval_ts = cur_tsc; > > + tdata->empty_cycles = 0; > > + tdata->raw_busyness = raw_busyness; > > + /* bring busyness back to 0..100 range, biased to round up */ > > + tdata->busyness = (raw_busyness + 50) / 100; > > + } else > > + /* we may have updated empty counter */ > > + tdata->empty_cycles = empty_cycles; > > + > > +end: > > + /* update status for next poll */ > > + tdata->last_poll_ts = cur_tsc; > > + tdata->last_empty = empty; > > +} > > + > > +#ifdef RTE_LCORE_BUSYNESS > > +#define RTE_LCORE_TELEMETRY_TIMESTAMP(nb_rx) \ > > + do { \ > > + if (__rte_lcore_telemetry_enabled) \ > > I think, rather than reading memory, Like Linux perf infrastructure, > we can patch up the instruction steam as NOP vs Timestamp capture > function. > Surely that requires much more complicated tooling? How would that work in this situation? > > Also instead of changing all libraries, Maybe we can use > "-finstrument-functions". > and just mark the function with attribute. > > Just 2c. > > > + __rte_lcore_telemetry_timestamp(nb_rx); \ > > + } while (0) > > +#else > > +#define RTE_LCORE_TELEMETRY_TIMESTAMP(nb_rx) \ > > + while (0) > > +#endif > > +