From: Mattias Rönnblom
Date: Fri, 2 Sep 2022 19:17:58 +0200
Subject: Re: [PATCH v3 1/2] test/service: add perf measurements for with stats mode
To: Harry van Haaren, dev@dpdk.org
Cc: Mattias Rönnblom, Honnappa Nagarahalli, Morten Brørup
Message-ID: <66cf8b28-ab44-8eff-3e9a-cc5b37f2bc6f@lysator.liu.se>
In-Reply-To: <20220711131825.3373195-1-harry.van.haaren@intel.com>

On 2022-07-11 15:18, Harry van Haaren wrote:
> This commit improves the performance reporting of the service
> cores polling loop to show both with and without statistics
> collection modes. Collecting cycle statistics is costly, due
> to calls to rte_rdtsc() per service iteration.

That is true for a service deployed on only a single core. For
multi-core services, non-rdtsc-related overhead dominates, since every
lcore mapped to the service updates the same shared statistics. For
example, if the service is deployed on 11 cores, the extra
statistics-related overhead is ~1000 cc/service call on x86_64, whereas
2x rdtsc shouldn't cost more than ~50 cc. (A sketch illustrating this
is included further down.)

>
> Reported-by: Mattias Rönnblom
> Suggested-by: Honnappa Nagarahalli
> Suggested-by: Morten Brørup
> Signed-off-by: Harry van Haaren
>
> ---
>
> This is split out as a separate patch from the fix to allow
> measuring the before/after of the service stats atomic fixup.
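To put some code behind the comment above: here is a minimal,
stand-alone sketch (my own illustration, not the rte_service.c
implementation; x86-only, threads left unpinned, and all names such as
service_stats and worker are made up) of N cores updating one shared
statistics struct. The two relaxed atomic adds per call bounce the
stats cache line between the cores, and that contention, not the two
rdtsc reads, is presumably where most of the ~1000 cc/call goes.

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 11
#define CALLS_PER_CORE (1 << 20)

/* One stats struct shared by all cores running the service. */
struct service_stats {
	_Atomic uint64_t calls;
	_Atomic uint64_t cycles;
};

static struct service_stats shared_stats;

static inline uint64_t
rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

static void *
worker(void *arg)
{
	(void)arg;

	for (uint32_t i = 0; i < CALLS_PER_CORE; i++) {
		uint64_t start = rdtsc();
		/* ...the service callback would run here... */
		uint64_t diff = rdtsc() - start;

		/* The contended part: two RMWs on a shared cache line. */
		atomic_fetch_add_explicit(&shared_stats.calls, 1,
				memory_order_relaxed);
		atomic_fetch_add_explicit(&shared_stats.cycles, diff,
				memory_order_relaxed);
	}
	return NULL;
}

int
main(void)
{
	pthread_t threads[NUM_CORES];
	int i;

	uint64_t start = rdtsc();
	for (i = 0; i < NUM_CORES; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (i = 0; i < NUM_CORES; i++)
		pthread_join(threads[i], NULL);

	/* All threads run concurrently, so wall-clock cycles divided
	 * by per-core calls approximates the per-call cost. */
	printf("~%.0f cycles/call\n",
			(double)(rdtsc() - start) / CALLS_PER_CORE);
	return 0;
}

Build with "gcc -O2 -pthread". Giving each core its own stats struct
(on its own cache line) makes most of that overhead disappear, which is
essentially what keeping the statistics per lcore buys you.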
> ---
>  app/test/test_service_cores.c | 36 ++++++++++++++++++++++++-----------
>  1 file changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> index ced6ed0081..7415b6b686 100644
> --- a/app/test/test_service_cores.c
> +++ b/app/test/test_service_cores.c
> @@ -777,6 +777,22 @@ service_run_on_app_core_func(void *arg)
>  	return rte_service_run_iter_on_app_lcore(*delay_service_id, 1);
>  }
>
> +static float
> +service_app_lcore_perf_measure(uint32_t id)
> +{
> +	/* Performance test: call in a loop, and measure tsc() */
> +	const uint32_t perf_iters = (1 << 12);
> +	uint64_t start = rte_rdtsc();
> +	uint32_t i;
> +	for (i = 0; i < perf_iters; i++) {
> +		int err = service_run_on_app_core_func(&id);

In a real-world scenario, the latency of this function isn't
representative of the overall service core overhead. Consider, for
example, an lcore with a single service mapped to it: rte_service.c
will call service_run() 64 times per loop iteration, but only one of
those calls is a "hit" that actually runs the service. One iteration of
the service loop costs ~600 cc on a machine where this benchmark
reports 128 cc (both with statistics disabled). For low-latency
services, that is a significant overhead. (A sketch of this loop shape
follows after the quoted diff.)

> +		TEST_ASSERT_EQUAL(0, err, "perf test: returned run failure");
> +	}
> +	uint64_t end = rte_rdtsc();
> +
> +	return (end - start)/(float)perf_iters;
> +}
> +
>  static int
>  service_app_lcore_poll_impl(const int mt_safe)
>  {
> @@ -828,17 +844,15 @@ service_app_lcore_poll_impl(const int mt_safe)
>  			"MT Unsafe: App core1 didn't return -EBUSY");
>  	}
>
> -	/* Performance test: call in a loop, and measure tsc() */
> -	const uint32_t perf_iters = (1 << 12);
> -	uint64_t start = rte_rdtsc();
> -	uint32_t i;
> -	for (i = 0; i < perf_iters; i++) {
> -		int err = service_run_on_app_core_func(&id);
> -		TEST_ASSERT_EQUAL(0, err, "perf test: returned run failure");
> -	}
> -	uint64_t end = rte_rdtsc();
> -	printf("perf test for %s: %0.1f cycles per call\n", mt_safe ?
> -			"MT Safe" : "MT Unsafe", (end - start)/(float)perf_iters);
> +	/* Measure performance of no-stats and with-stats. */
> +	float cyc_no_stats = service_app_lcore_perf_measure(id);
> +
> +	TEST_ASSERT_EQUAL(0, rte_service_set_stats_enable(id, 1),
> +			"failed to enable stats for service.");
> +	float cyc_with_stats = service_app_lcore_perf_measure(id);
> +
> +	printf("perf test for %s, no stats: %0.1f, with stats %0.1f cycles/call\n",
> +		mt_safe ? "MT Safe" : "MT Unsafe", cyc_no_stats, cyc_with_stats);
>
>  	unregister_all();
>  	return TEST_SUCCESS;
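For reference, a rough sketch of the service loop shape my comment
above refers to (not the actual rte_service.c code; the struct layout
and names are hypothetical, and all bookkeeping is omitted):

#include <stdint.h>

#define RTE_SERVICE_NUM_MAX 64 /* as in rte_service.h */

struct service {
	void (*callback)(void *args);
	void *args;
};

static struct service services[RTE_SERVICE_NUM_MAX];

/*
 * One iteration of the per-lcore service loop: all 64 slots are
 * visited, but with a single mapped service only one is a "hit".
 * The 63 misses are cheap individually, but together they are why a
 * full loop iteration costs ~600 cc while the run_iter path this
 * benchmark exercises reports ~128 cc.
 */
static void
service_loop_iter(uint64_t mapped_mask)
{
	uint32_t i;

	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
		if (!(mapped_mask & (1ULL << i)))
			continue; /* miss: not mapped to this lcore */
		services[i].callback(services[i].args);
	}
}

static void
low_latency_service(void *args)
{
	(void)args; /* a short service body; loop overhead dominates */
}

int
main(void)
{
	services[5].callback = low_latency_service;
	service_loop_iter(1ULL << 5); /* one mapped service, 64 slots */
	return 0;
}

A per-call benchmark like the one in this patch measures only the hit
path, so it understates the per-service-call cost a deployed
low-latency service actually pays.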