From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 7784043F6A; Thu, 2 May 2024 19:57:59 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0A377402B2; Thu, 2 May 2024 19:57:59 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id BD91C40299 for ; Thu, 2 May 2024 19:57:57 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 27A9910C02 for ; Thu, 2 May 2024 19:57:57 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 1BDBA10AF5; Thu, 2 May 2024 19:57:57 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.3 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 9FE0610C01; Thu, 2 May 2024 19:57:55 +0200 (CEST) Message-ID: Date: Thu, 2 May 2024 19:57:55 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] net/af_packet: fix statistics To: Stephen Hemminger , Ferruh Yigit Cc: dev@dpdk.org, "John W. Linville" , =?UTF-8?Q?Morten_Br=C3=B8rup?= References: <20240430154129.7347-1-stephen@networkplumber.org> <9025199c-585c-4779-9f4e-360845707088@amd.com> <20240501094303.751e95b4@hermes.local> <20240502091658.2d51ca75@hermes.local> Content-Language: en-US From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: <20240502091658.2d51ca75@hermes.local> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-05-02 18:16, Stephen Hemminger wrote: > On Thu, 2 May 2024 15:12:43 +0100 > Ferruh Yigit wrote: > >> I am not referring multiple core sharing a queue, this is wrong for DPDK. >> >> For single core case, if a variable doesn't have 'volatile' qualifier >> and load/stores are not atomic, as compiler is not aware that there may >> be other threads will access this value, what prevents compiler to hold >> stats value in a register and store it memory at later stage, let say at >> the end of the application forwarding loop? > > No other driver does this; it is not a problem since once rx/tx burst > function returns, there is no state the compiler is aware of. > > >> For this case keeping 'volatile' qualifier works. But Mattias has a >> suggestion to use "normal load + normal add + atomic store" in datapath. >> >> >> For getting stats, which will be in different thread, you are already >> casting it to 'volatile' to enforce compiler read from memory each time. >> >> >> Additionally when we take stats reset into account, which can happen in >> different thread, 'volatile' is not enough. >> For reliable stats reset, either we need to use full atomics, or offset >> logic which I am trying to in other patch. > > If you want to attack the problem in the general case, there should be > standard set of functions for handling per-core stats. If you look at > Linux kernel you will see macros around this function: > > netdev_core_stats_inc((struct net_device *dev, u32 offset) { > struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); > unsigned long __percpu *field; > ... > field = (__force unsigned long __percpu *)((__force void *)p + offset); > this_cpu_inc(*field); > } > > The expansion of this_cpu_inc is nested but ends up being: > *field += 1; > > No volatile is needed. > > The conclusion I reach from this is: > - atomic is not needed for per-cpu data. See "volatile considered harmful" > using a form of compiler barrier is useful but should be hidden in generic code. > - what is the state of the better per-lcore patches? > that should be used here. > - need a more generic percpu stats infrastructure > - ethedev needs to have a generic percpu infra and not reinvented in xdp, tap, packet, etc. > > In this case, the counters are already effectively "per-lcore", since a single RX/TX queue pair is not MT safe, and thus is only used by one thread (or at least, by one thread at a time). My impression is that both the Linux kernel and DPDK tend to assume that no load or store tearing occurs for accesses to small, naturally aligned, integer types. I'm guessing it's also commonly assumed there are limits to how large section of the whole program the compiler may reason about, to eliminate data and code/instructions that have no effect on that thread. Thus, volatile are not needed to assure that for example counter state is actually maintained, and updates actually occurs. I will continue work on the lcore variables patch set as soon as time permits. That will be after my attempts to complete the bitops and the bitset patch sets have concluded.