Subject: Re: [PATCH v3 3/7] eal: add lcore variable performance test
Date: Thu, 12 Sep 2024 15:01:20 +0200
From: Mattias Rönnblom
To: Morten Brørup, Mattias Rönnblom, dev@dpdk.org
Cc: Stephen Hemminger, Konstantin Ananyev, David Marchand, Jerin Jacob
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F6D4@smartserver.smartshare.dk>
References: <20240911170430.701685-2-mattias.ronnblom@ericsson.com>
 <20240912084429.703405-1-mattias.ronnblom@ericsson.com>
 <20240912084429.703405-4-mattias.ronnblom@ericsson.com>
 <98CBD80474FA8B44BF855DF32C47DC35E9F6D4@smartserver.smartshare.dk>

On 2024-09-12 11:39, Morten Brørup wrote:
>> +struct lcore_state {
>> +	uint64_t a;
>> +	uint64_t b;
>> +	uint64_t sum;
>> +};
>> +
>> +static __rte_always_inline void
>> +update(struct lcore_state *state)
>> +{
>> +	state->sum += state->a * state->b;
>> +}
>> +
>> +static RTE_DEFINE_PER_LCORE(struct lcore_state, tls_lcore_state);
>> +
>> +static __rte_noinline void
>> +tls_update(void)
>> +{
>> +	update(&RTE_PER_LCORE(tls_lcore_state));
>
> I would normally access TLS variables directly, not through a pointer, i.e.:
>
> RTE_PER_LCORE(tls_lcore_state.sum) += RTE_PER_LCORE(tls_lcore_state.a) * RTE_PER_LCORE(tls_lcore_state.b);
>
> On the other hand, then it wouldn't be 1:1 comparable to the two other test cases.
>
> Besides, I expect the compiler to optimize away the indirect access and produce the same output (as for the alternative implementation) anyway.
>
> No change requested. Just noticing.
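
For reference only, a direct-access variant could look roughly like the
sketch below (tls_update_direct() is just a made-up name for
illustration, not something in the patch):

	/* Illustrative direct-access equivalent of tls_update(),
	 * using the same TLS variable as in the patch. */
	static __rte_noinline void
	tls_update_direct(void)
	{
		RTE_PER_LCORE(tls_lcore_state).sum +=
			RTE_PER_LCORE(tls_lcore_state).a *
			RTE_PER_LCORE(tls_lcore_state).b;
	}

With optimizations on, I would expect this and the pointer-based
version to generate the same code.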
>
>> +}
>> +
>> +struct __rte_cache_aligned lcore_state_aligned {
>> +	uint64_t a;
>> +	uint64_t b;
>> +	uint64_t sum;
>
> Please add RTE_CACHE_GUARD here, for 100 % matching the common design pattern.
>

Will do.

>> +};
>> +
>> +static struct lcore_state_aligned sarray_lcore_state[RTE_MAX_LCORE];
>
>
>> +	printf("Latencies [ns/update]\n");
>> +	printf("Thread-local storage Static array Lcore variables\n");
>> +	printf("%20.1f %13.1f %16.1f\n", tls_latency * 1e9,
>> +	       sarray_latency * 1e9, lvar_latency * 1e9);
>
> I prefer cycles over ns. Perhaps you could show both?
>

That makes you an x86 guy. :) Only on x86 do those cycles make any sense.

I didn't want to use cycles, since the values would be very small on certain (e.g., old ARM) platforms. But TSC cycles are used elsewhere in the perf tests, so maybe I should switch to them nevertheless.

>
> With RTE_CACHE_GUARD added where mentioned,
>
> Acked-by: Morten Brørup
>
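
If I do add TSC cycles, the reporting could look roughly like this
sketch (assuming, as in the patch, that the *_latency values are in
seconds, and using rte_get_tsc_hz() for the conversion):

	uint64_t tsc_hz = rte_get_tsc_hz();

	printf("Latencies [ns/update]\n");
	printf("Thread-local storage Static array Lcore variables\n");
	printf("%20.1f %13.1f %16.1f\n", tls_latency * 1e9,
	       sarray_latency * 1e9, lvar_latency * 1e9);

	printf("Latencies [TSC cycles/update]\n");
	printf("%20.1f %13.1f %16.1f\n", tls_latency * tsc_hz,
	       sarray_latency * tsc_hz, lvar_latency * tsc_hz);

That would show both, as you suggest.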