Date: Sun, 26 May 2024 16:46:47 +0200
Subject: Re: [PATCH v9 1/8] eal: generic 64 bit counter
From: Mattias Rönnblom
To: Stephen Hemminger, Tyler Retzlaff
Cc: Morten Brørup, dev@dpdk.org

On 2024-05-22 21:51, Stephen Hemminger wrote:
> On Wed, 22 May 2024 12:01:12 -0700
> Tyler Retzlaff wrote:
>
>> On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
>>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Sent: Wednesday, 22 May 2024 17.38
>>>>
>>>> On Wed, 22 May 2024 10:31:39 +0200
>>>> Morten Brørup wrote:
>>>>
>>>>>> +/* On 32 bit platform, need to use atomic to avoid load/store tearing */
>>>>>> +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
>>>>>
>>>>> As shown by Godbolt experiments discussed in a previous thread [2],
>>>>> non-tearing 64 bit counters can be implemented without using atomic
>>>>> instructions on all 32 bit architectures supported by DPDK. So we should
>>>>> use the counter/offset design pattern for RTE_ARCH_32 too.
>>>>>
>>>>> [2]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smartserver.smartshare.dk/
>>>>
>>>> This code built with -O3 and -m32 on godbolt shows split problem.
>>>>
>>>> #include <stdint.h>
>>>>
>>>> typedef uint64_t rte_counter64_t;
>>>>
>>>> void
>>>> rte_counter64_add(rte_counter64_t *counter, uint32_t val)
>>>> {
>>>> 	*counter += val;
>>>> }
>>>> … *counter = val;
>>>> }
>>>>
>>>> rte_counter64_add:
>>>>         push    ebx
>>>>         mov     eax, DWORD PTR [esp+8]
>>>>         xor     ebx, ebx
>>>>         mov     ecx, DWORD PTR [esp+12]
>>>>         add     DWORD PTR [eax], ecx
>>>>         adc     DWORD PTR [eax+4], ebx
>>>>         pop     ebx
>>>>         ret
>>>>
>>>> rte_counter64_read:
>>>>         mov     eax, DWORD PTR [esp+4]
>>>>         mov     edx, DWORD PTR [eax+4]
>>>>         mov     eax, DWORD PTR [eax]
>>>>         ret
>>>> rte_counter64_set:
>>>>         movq    xmm0, QWORD PTR [esp+8]
>>>>         mov     eax, DWORD PTR [esp+4]
>>>>         movq    QWORD PTR [eax], xmm0
>>>>         ret
>>>
>>> Sure, atomic might be required on some 32 bit architectures and/or with some compilers.
>>
>> in theory i think you should be able to use generic atomics and
>> depending on the target you get codegen that works. it might be
>> something more expensive on 32-bit and nothing on 64-bit etc..
>>
>> what's the damage if we just use atomic generic and relaxed ordering? is
>> the codegen not optimal?
>
> If we use atomic with relaxed memory order, then compiler for x86 still generates
> a locked increment in the fast path. This costs about 100 extra cycles due
> to cache and prefetch stall. This whole endeavor is an attempt to avoid that.
>

That is because the code is overly restrictive in that case (e.g., it
needlessly forces the whole read-modify-write sequence to be atomic); it
is no fault of the compiler.
void add(uint64_t *addr, uint64_t operand)
{
	uint64_t value = __atomic_load_n(addr, __ATOMIC_RELAXED);
	value += operand;
	__atomic_store_n(addr, value, __ATOMIC_RELAXED);
}

->

x86_64

add:
        mov     rax, QWORD PTR [rdi]
        add     rax, rsi
        mov     QWORD PTR [rdi], rax
        ret

x86

add:
        sub     esp, 12
        mov     ecx, DWORD PTR [esp+16]
        movq    xmm0, QWORD PTR [ecx]
        movq    QWORD PTR [esp], xmm0
        mov     eax, DWORD PTR [esp]
        mov     edx, DWORD PTR [esp+4]
        add     eax, DWORD PTR [esp+20]
        adc     edx, DWORD PTR [esp+24]
        mov     DWORD PTR [esp], eax
        mov     DWORD PTR [esp+4], edx
        movq    xmm1, QWORD PTR [esp]
        movq    QWORD PTR [ecx], xmm1
        add     esp, 12
        ret

No locked instructions.

> PS: looking at the locked increment code for 32 bit involves locked compare
> exchange and potential retry. Probably don't care about performance on that platform
> anymore.
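
To spell out the distinction, here is a minimal sketch of a counter built on the relaxed load/store pattern, next to the atomic read-modify-write variant. The names (counter64_t, counter64_add, counter64_add_mt, counter64_read) are hypothetical and not taken from the patch, and the sketch assumes the GCC/Clang __atomic builtins. The load/store variant only counts correctly if there is a single writer per counter; with several concurrent writers, updates can be lost and the fetch-add variant would be needed.

```c
#include <stdint.h>

/* Hypothetical counter type, for illustration only. */
typedef struct {
	uint64_t value;
} counter64_t;

/*
 * Single-writer add: a relaxed atomic load followed by a relaxed
 * atomic store. Neither access can tear, but the sequence as a whole
 * is NOT an atomic read-modify-write.
 */
static inline void
counter64_add(counter64_t *c, uint64_t operand)
{
	uint64_t v = __atomic_load_n(&c->value, __ATOMIC_RELAXED);

	__atomic_store_n(&c->value, v + operand, __ATOMIC_RELAXED);
}

/*
 * Multi-writer add: one atomic read-modify-write. This is the form
 * that typically compiles to a lock-prefixed instruction on x86.
 */
static inline void
counter64_add_mt(counter64_t *c, uint64_t operand)
{
	__atomic_fetch_add(&c->value, operand, __ATOMIC_RELAXED);
}

/* Non-tearing read, usable from any thread. */
static inline uint64_t
counter64_read(const counter64_t *c)
{
	return __atomic_load_n(&c->value, __ATOMIC_RELAXED);
}
```

Under the single-writer assumption a reader may see a slightly stale value, but never a torn one.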