From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E10F8440D2; Sun, 26 May 2024 09:04:19 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 785BC402D5; Sun, 26 May 2024 09:04:19 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 4AFDC402B2 for ; Sun, 26 May 2024 09:04:18 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id F2EC519358 for ; Sun, 26 May 2024 09:04:17 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id E6D1B19429; Sun, 26 May 2024 09:04:17 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.3 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 20574192FC; Sun, 26 May 2024 09:03:48 +0200 (CEST) Message-ID: Date: Sun, 26 May 2024 09:03:47 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC v3] net/af_packet: make stats reset reliable To: =?UTF-8?Q?Morten_Br=C3=B8rup?= , Bruce Richardson Cc: Stephen Hemminger , Ferruh Yigit , "John W. Linville" , Thomas Monjalon , dev@dpdk.org, =?UTF-8?Q?Mattias_R=C3=B6nnblom?= References: <20240425174617.2126159-1-ferruh.yigit@amd.com> <20240503154547.392069-1-ferruh.yigit@amd.com> <20240503150011.55681b97@hermes.local> <432fb280-96e3-4079-b987-990d509b8c79@lysator.liu.se> <20240508082339.6bf74202@hermes.local> <834102e5-8f4b-4b67-9f62-8e8f61403b39@amd.com> <20240508135418.55505105@hermes.local> <98CBD80474FA8B44BF855DF32C47DC35E9F431@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35E9F432@smartserver.smartshare.dk> Content-Language: en-US From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F432@smartserver.smartshare.dk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-05-09 13:37, Morten Brørup wrote: >> From: Bruce Richardson [mailto:bruce.richardson@intel.com] >> Sent: Thursday, 9 May 2024 11.30 >> >> On Thu, May 09, 2024 at 09:43:16AM +0200, Morten Brørup wrote: >>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org] >>>> Sent: Wednesday, 8 May 2024 22.54 >>>> >>>> On Wed, 8 May 2024 20:48:06 +0100 >>>> Ferruh Yigit wrote: >>>> >>>>>> >>>>>> The idea of load tearing is crazy talk of integral types. It >> would >>>> break so many things. >>>>>> It is the kind of stupid compiler thing that would send Linus on >> a >>>> rant and get >>>>>> the GCC compiler writers in trouble. >>>>>> >>>>>> The DPDK has always favored performance over strict safety guard >>>> rails everywhere. >>>>>> Switching to making every statistic an atomic operation is not >> in >>>> the spirit of >>>>>> what is required. There is no strict guarantee necessary here. >>>>>> >>>>> >>>>> I kind of agree with Stephen. >>>>> >>>>> Thanks Mattias, Morten & Stephen, it was informative discussion. >> But >>>> for >>>>> *SW drivers* stats update and reset is not core functionality and >> I >>>>> think we can be OK to get hit on corner cases, instead of >>>>> over-engineering or making code more complex. >>>> >>>> >>>> I forgot the case of 64 bit values on 32 bit platforms! >>>> Mostly because haven't cared about 32 bit for years... >>>> >>>> The Linux kernel uses some wrappers to handle this. >>>> On 64 bit platforms they become noop. >>>> On 32 bit platform, they are protected by a seqlock and updates are >>>> wrapped by the sequence count. >>>> >>>> If we go this way, then doing similar Noop on 64 bit and atomic or >>>> seqlock >>>> on 32 bit should be done, but in common helper. >>>> >>>> Looking inside FreeBSD, it looks like that has changed over the >> years as >>>> well. >>>> >>>> if_inc_counter >>>> counter_u64_add >>>> atomic_add_64 >>>> But the counters are always per-cpu in this case. So although it >> does >>>> use >>>> locked operation, will always be uncontended. >>>> >>>> >>>> PS: Does DPDK still actually support 32 bit on x86? Can it be >> dropped >>>> this cycle? >>> >>> We cannot drop 32 bit architecture support altogether. >>> >>> But, unlike the Linux kernel, DPDK doesn't need to support ancient 32 >> bit architectures. >>> If the few 32 bit architectures supported by DPDK provide non-tearing >> 64 bit loads/stores, we don't need locks (in the fast path) for 64 bit >> counters. >>> >>> In addition to 32 bit x86, DPDK supports ARMv7-A (a 32 bit >> architecture) and 32 bit ARMv8. >>> I don't think DPDK support any other 32 bit architectures. >>> >>> >>> As Mattias mentioned, 32 bit x86 can use xmm registers to provide 64 >> bit non-tearing load/store. >>> >> >> Testing this a little in godbolt, I see gcc using xmm registers on 32- >> bit >> when updating 64-bit counters, but clang doesn't seem to do so, but >> instead >> does 2 stores when writing back the 64 value. (I tried with both >> volatile >> and non-volatile 64-bit values, just to see if volatile would encourage >> clang to do a single store). >> >> GCC: https://godbolt.org/z/9eqKfT3hz >> Clang: https://godbolt.org/z/PT5EqKn4c > > Interesting. > I guess this can be fixed by manually implementing what GCC does. > If you want an atomic store, use __atomic_store_n() or the equivalent. On GCC, the resulting code will be compact and use an XMM register for the store. On clang, you get much more instructions including a conditional jump to a compiler library function. You also get a compiler warning about misaligned atomic may incur significant cost (no surprise). All this on 32-bit x86. Interesting, you get all those instructions even when you explicitly tell the compiler the address should be assumed to be aligned. Seems like a bug, or at least a short-coming. So GCC assumes atomic stores are aligned, which seems perfectly reasonable to me. > I'm more concerned about finding a high-performance (in the fast path) 64 bit counter solution for 32 bit ARM. >