From: Mattias Rönnblom
To: Tyler Retzlaff, Morten Brørup
Cc: Stephen Hemminger, dev@dpdk.org
Subject: Re: [PATCH v9 1/8] eal: generic 64 bit counter
Date: Sun, 26 May 2024 16:39:34 +0200
Message-ID: <9e8ca14b-328f-44a2-8444-a526ae0f6e82@lysator.liu.se>
In-Reply-To: <20240522190112.GA19947@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <20240510050507.14381-1-stephen@networkplumber.org>
 <20240521201801.126886-1-stephen@networkplumber.org>
 <20240521201801.126886-2-stephen@networkplumber.org>
 <98CBD80474FA8B44BF855DF32C47DC35E9F488@smartserver.smartshare.dk>
 <20240522083741.64078d7e@hermes.local>
 <98CBD80474FA8B44BF855DF32C47DC35E9F48C@smartserver.smartshare.dk>
 <20240522190112.GA19947@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On 2024-05-22 21:01, Tyler Retzlaff wrote:
> On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> Sent: Wednesday, 22 May 2024 17.38
>>>
>>> On Wed, 22 May 2024 10:31:39 +0200
>>> Morten Brørup wrote:
>>>
>>>>> +/* On 32 bit platform, need to use atomic to avoid load/store tearing */
>>>>> +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
>>>>
>>>> As shown by Godbolt experiments discussed in a previous thread [2],
>>>> non-tearing 64 bit counters can be implemented without using atomic
>>>> instructions on all 32 bit architectures supported by DPDK. So we should
>>>> use the counter/offset design pattern for RTE_ARCH_32 too.
>>>>
>>>> [2]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smartserver.smartshare.dk/
>>>
>>>
>>> This code built with -O3 and -m32 on godbolt shows split problem.
>>>
>>> #include <stdint.h>
>>>
>>> typedef uint64_t rte_counter64_t;
>>>
>>> void
>>> rte_counter64_add(rte_counter64_t *counter, uint32_t val)
>>> {
>>>     *counter += val;
>>> }
>>> … *counter = val;
>>> }
>>>
>>> rte_counter64_add:
>>>         push    ebx
>>>         mov     eax, DWORD PTR [esp+8]
>>>         xor     ebx, ebx
>>>         mov     ecx, DWORD PTR [esp+12]
>>>         add     DWORD PTR [eax], ecx
>>>         adc     DWORD PTR [eax+4], ebx
>>>         pop     ebx
>>>         ret
>>>
>>> rte_counter64_read:
>>>         mov     eax, DWORD PTR [esp+4]
>>>         mov     edx, DWORD PTR [eax+4]
>>>         mov     eax, DWORD PTR [eax]
>>>         ret
>>> rte_counter64_set:
>>>         movq    xmm0, QWORD PTR [esp+8]
>>>         mov     eax, DWORD PTR [esp+4]
>>>         movq    QWORD PTR [eax], xmm0
>>>         ret
>>
>> Sure, atomic might be required on some 32 bit architectures and/or with some compilers.
>
> in theory i think you should be able to use generic atomics and
> depending on the target you get codegen that works. it might be
> something more expensive on 32-bit and nothing on 64-bit etc..
>
> what's the damage if we just use atomic generic and relaxed ordering? is
> the codegen not optimal?
>

Below is what I originally proposed in the "make stats reset reliable"
thread.

struct counter
{
    uint64_t count;
    uint64_t offset;
};

/../
    struct counter rx_pkts;
    struct counter rx_bytes;
/../

static uint64_t
counter_value(const struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
    uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

    return count - offset;
}

static void
counter_reset(struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

    __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}

static void
counter_add(struct counter *counter, uint64_t operand)
{
    __atomic_store_n(&counter->count, counter->count + operand,
                     __ATOMIC_RELAXED);
}

I think this solution generally compiles to something that's equivalent
to just using non-atomic loads/stores and hoping for the best.
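
For anyone who wants to try the counter/offset pattern above outside of a driver, here is a minimal, single-threaded sketch of how the three helpers fit together. The helpers are the ones from the proposal; the counter_demo() function and the ownership comments on the struct fields are assumptions added for illustration (one writer thread per counter, reset done by someone else), not something the thread specifies.

```c
#include <assert.h>
#include <stdint.h>

struct counter {
    uint64_t count;  /* assumption: written only by the owning thread */
    uint64_t offset; /* assumption: written only by whoever resets */
};

static uint64_t
counter_value(const struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
    uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

    return count - offset;
}

static void
counter_reset(struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

    /* Reset never writes count, so it cannot lose a concurrent add. */
    __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}

static void
counter_add(struct counter *counter, uint64_t operand)
{
    /* Plain read of count, atomic store: safe only with one writer. */
    __atomic_store_n(&counter->count, counter->count + operand,
                     __ATOMIC_RELAXED);
}

/* Hypothetical demo, not from the thread. */
static uint64_t
counter_demo(void)
{
    struct counter rx_pkts = { 0, 0 };

    counter_add(&rx_pkts, 10);
    counter_add(&rx_pkts, 5);
    assert(counter_value(&rx_pkts) == 15);

    counter_reset(&rx_pkts);        /* offset becomes 15 */
    assert(counter_value(&rx_pkts) == 0);
    assert(rx_pkts.count == 15);    /* raw count is untouched */

    counter_add(&rx_pkts, 7);       /* count becomes 22 */
    return counter_value(&rx_pkts); /* 22 - 15 */
}
```

The point of the split is that a reset only moves the offset, so the datapath side stays a load, an add and a store on memory it alone writes.
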
Using a non-atomic load in counter_add() will generate better code, but
doesn't work if you are using _Atomic (w/o casts).

Atomic loads/stores seem to have volatile semantics, so multiple
updates to the same counter cannot be merged. That is a drawback.

>> I envision a variety of 32 bit implementations, optimized for certain architectures/compilers.
>>
>> Some of them can provide non-tearing 64 bit load/store, so we should also use the counter/offset design pattern for those.
>>
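
To make the _Atomic remark above concrete: with C11 _Atomic fields, a plain read such as counter->count is itself an atomic load with seq_cst semantics, so the "non-atomic load" form of counter_add() cannot be written without a cast. The closest equivalent is an explicit relaxed load followed by a relaxed store. A sketch, where struct atomic_counter and all atomic_counter_*() names are hypothetical and a single writer per counter is assumed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical _Atomic flavour of struct counter. */
struct atomic_counter {
    _Atomic uint64_t count;
    _Atomic uint64_t offset;
};

static void
atomic_counter_add(struct atomic_counter *counter, uint64_t operand)
{
    /*
     * Relaxed load + relaxed store, not an atomic read-modify-write.
     * Assumes this thread is the only writer of count.
     */
    uint64_t count = atomic_load_explicit(&counter->count,
                                          memory_order_relaxed);

    atomic_store_explicit(&counter->count, count + operand,
                          memory_order_relaxed);
}

static uint64_t
atomic_counter_value(struct atomic_counter *counter)
{
    uint64_t count = atomic_load_explicit(&counter->count,
                                          memory_order_relaxed);
    uint64_t offset = atomic_load_explicit(&counter->offset,
                                           memory_order_relaxed);

    return count - offset;
}

/*
 * Two back-to-back updates of the same counter. This is the case
 * where one would like the compiler to emit a single add of a + b.
 */
static void
atomic_counter_add_twice(struct atomic_counter *counter,
                         uint64_t a, uint64_t b)
{
    atomic_counter_add(counter, a);
    atomic_counter_add(counter, b);
}

/* Hypothetical demo, single-threaded. */
static uint64_t
atomic_counter_demo(void)
{
    struct atomic_counter rx_bytes;

    atomic_init(&rx_bytes.count, 0);
    atomic_init(&rx_bytes.offset, 0);

    atomic_counter_add_twice(&rx_bytes, 3, 4);
    assert(atomic_counter_value(&rx_bytes) == 7);

    return atomic_counter_value(&rx_bytes);
}
```

Building atomic_counter_add_twice() with -O3 on Godbolt is an easy way to check the merging drawback for a given compiler and target. I would expect two separate load/store pairs rather than one combined add, but that is worth verifying rather than assuming.
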