From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <18d312ab-e24f-4faf-b185-ce223a44cb4b@lysator.liu.se>
Date: Mon, 16 Sep 2024 18:12:55 +0200
Subject: Re: [PATCH v4 3/7] eal: add lcore variable performance test
To: Morten Brørup, Mattias Rönnblom, dev@dpdk.org
Cc: Stephen Hemminger, Konstantin Ananyev, David Marchand, Jerin Jacob
References: <20240912084429.703405-2-mattias.ronnblom@ericsson.com>
 <20240916105210.721315-1-mattias.ronnblom@ericsson.com>
 <20240916105210.721315-4-mattias.ronnblom@ericsson.com>
 <73c62731-caae-41b2-b9ac-0a190d56073a@lysator.liu.se>
 <98CBD80474FA8B44BF855DF32C47DC35E9F6EB@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F6EB@smartserver.smartshare.dk>
From: Mattias Rönnblom
Content-Type: text/plain; charset=UTF-8; format=flowed
List-Id: DPDK patches and discussions

On 2024-09-16 13:54, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Monday, 16 September 2024 13.13
>>
>> On 2024-09-16 12:52, Mattias Rönnblom wrote:
>>> Add basic micro benchmark for lcore variables, in an attempt to assure
>>> that the overhead isn't significantly greater than alternative
>>> approaches, in scenarios where the benefits aren't expected to show up
>>> (i.e., when plenty of cache is available compared to the working set
>>> size of the per-lcore data).
>>>
>>
>> Here are some test results for a Raptor Cove @ 3,2 GHz (GCC 11):
>>
>> + ------------------------------------------------------- +
>> + Test Suite : lcore variable perf autotest
>> + ------------------------------------------------------- +
>> Latencies [TSC cycles/update]
>> Modules/Variables  Static array  Thread-local Storage  Lcore variables
>>                 1           3.9                   5.5              3.7
>>                 2           3.8                   5.5              3.8
>>                 4           4.9                   5.5              3.7
>>                 8           3.8                   5.5              3.8
>>                16          11.3                   5.5              3.7
>>                32          20.9                   5.5              3.7
>>                64          23.5                   5.5              3.7
>>               128          23.2                   5.5              3.7
>>               256          23.5                   5.5              3.7
>>               512          24.1                   5.5              3.7
>>              1024          25.3                   5.5              3.9
>> + TestCase [ 0] : test_lcore_var_access succeeded
>> + ------------------------------------------------------- +
>>
>>
>> The reason for TLS being slower than lcore variables (which in turn
>> relies on TLS for lcore id lookup) is the lazy initialization
>> conditional that is imposed on variant. Could that be avoided (which is
>> module-dependent I suppose), it beats lcore variables at ~3.0 cycles/update.
>
> I think you should not assume lazy initialization of TLS in your benchmark.
> Our application uses TLS, and when spinning up a new thread, we call an per-lcore init function of each module before calling the per-lcore run function. This design pattern is also described in Figure 1.4 [1] in the Programmer's Guide.
>
> [1]: https://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html
>

Per-lcore init functions may or may not be an option, depending on what
API you need to adhere to.

But maybe I should add a non-lazy TLS variant as well.

I should probably add some information on lcore variables in the EAL
programmer's guide as well.

Non-lazy TLS would be a more viable option if there were proper
framework support for it. Now, I'm not sure there is a better way to do
it in a DPDK library than how it's done for tracing, where there's an
explicit call per thread created. Other DPDK-internal users of
RTE_PER_LCORE seem to depend on lazy initialization.

>>
>> I must say I'm surprised to see lcore variables doing this good, at
>> these very modest working set sizes. Probably, you can stay at near-zero
>> L1 misses with lcore variables (and TLS), but start missing the L1 with
>> static arrays.
>
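
To make the lazy versus non-lazy TLS distinction concrete, here is a
minimal sketch in plain C11 with no DPDK dependencies. It is not the code
from the patch: struct mod_state, lazy_update(), eager_thread_init() and
eager_update() are hypothetical names, and the start value 100 is
arbitrary. It only shows where the extra conditional in the lazy variant
comes from, and that the non-lazy variant needs an explicit per-thread
init call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread module state; not taken from the DPDK test. */
struct mod_state {
	uint64_t counter;
	bool initialized;
};

static _Thread_local struct mod_state lazy_state;
static _Thread_local struct mod_state eager_state;

/*
 * Lazy variant: every update first checks whether this thread's
 * state has been set up. That check is the per-update overhead.
 */
uint64_t
lazy_update(void)
{
	if (!lazy_state.initialized) {
		lazy_state.counter = 100; /* arbitrary start value */
		lazy_state.initialized = true;
	}

	return ++lazy_state.counter;
}

/*
 * Non-lazy variant: the caller must run this once on each thread
 * before that thread calls eager_update().
 */
void
eager_thread_init(void)
{
	eager_state.counter = 100; /* same arbitrary start value */
	eager_state.initialized = true;
}

/* No conditional here; correctness depends on the prior init call. */
uint64_t
eager_update(void)
{
	return ++eager_state.counter;
}
```

A thread using the non-lazy variant would call eager_thread_init() once
at startup and then eager_update() in its run loop, which is roughly the
per-lcore init/run split Morten describes. Whether a library can require
such a call from its users is the API question raised above.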