From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C3F8043A4E; Fri, 2 Feb 2024 11:19:30 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 52D1A402DC; Fri, 2 Feb 2024 11:19:30 +0100 (CET) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 29F3C4026E for ; Fri, 2 Feb 2024 11:19:28 +0100 (CET) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id D942A5B5 for ; Fri, 2 Feb 2024 11:19:27 +0100 (CET) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id CC3CB543; Fri, 2 Feb 2024 11:19:27 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.4 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 1A5615B4; Fri, 2 Feb 2024 11:19:26 +0100 (CET) Message-ID: <04c3bc56-0075-46fa-b49c-c6f9b844d400@lysator.liu.se> Date: Fri, 2 Feb 2024 11:19:25 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC v3] eal: add bitset type Content-Language: en-US To: =?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= Cc: dev@dpdk.org, Tyler Retzlaff References: <20240131131301.418361-1-mattias.ronnblom@ericsson.com> <20240131080643.41a03cd8@hermes.local> <23445b74-50b2-4e7a-a6d8-b844815031fb@lysator.liu.se> <98CBD80474FA8B44BF855DF32C47DC35E9F1D9@smartserver.smartshare.dk> From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F1D9@smartserver.smartshare.dk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-02-01 09:04, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se] >> Sent: Wednesday, 31 January 2024 19.46 >> >> On 2024-01-31 17:06, Stephen Hemminger wrote: >>> On Wed, 31 Jan 2024 14:13:01 +0100 >>> Mattias Rönnblom wrote: > > [...] > >>> FYI - the linux kernel has a similar but more complete set of >> operations. >>> It might be more efficient to use unsigned long rather than requiring >>> the elements to be uint64_t. Thinking of the few 32 bit platforms. >>> >> >> Keeping it 64-bit avoids a popcount-related #ifdef. DPDK doesn't have >> an >> equivalent to __builtin_popcountl(). >> >> How much do we need to care about 32-bit ISA performance? > > At the 2023 DPDK Summit I talked to someone at a very well known network equipment vendor using 32 bit CPUs in some of their products; some sort of CPE, IIRC. 32 bit CPUs are still out there, and 32-bit CPU support has not been deprecated in DPDK. > > For the bitset parameter to functions, you could either use "unsigned long*" (as suggested by Stephen), or "void*" (followed by type casting inside the functions). > > If only using this library for the command line argument parser and similar, performance is irrelevant. If we foresee using it in the fast path, e.g. with the htimer library, we shouldn't tie its API tightly to 64 bit. > I'm not even sure performance will be that much worse. Sure, two popcount instead of one. What is probably worse is older ISAs (32- or 64-bit, e.g. original x64_64) that lack machine instructions for counting set bits of *any* word size. That said, the only real concern I have about going "unsigned long" -> "uint64_t" is that I might feel I need to go fix first. >> >> I'll go through the below API and some other APIs to see if there's >> something obvious missing. >> >> When I originally wrote this code there were a few potential features >> where I wasn't sure to what extent they were useful. One example was >> the >> shift operation. Any input is appreciated. > > Start off with what you already have. If we need more operations, they can always be added later. > >> >>> Also, what if any thread safety guarantees? or atomic. >>> >> >> Currently, it's all MT unsafe. >> >> An atomic set and get/test would make sense, and maybe other operations >> would as well. >> >> Bringing in atomicity into the design makes it much less obvious: >> >> Would the atomic operations imply some memory ordering, or be >> "relaxed". >> I would lean toward relaxed, but then shouldn't bit-level atomics be >> consistent with the core DPDK atomics API? With that in mind, memory >> ordering should be user-configurable. >> >> If the code needs to be pure C11 atomics-wise, the words that makes up >> the bitset must be _Atomic uint64_t. Then you need to be careful or end >> up with "lock"-prefixed instructions if you manipulate the bitset >> words. >> Just a pure words[N] = 0; gives you a mov+mfence on x86, for example, >> plus all the fun memory_order_seq_cst in terms of preventing >> compiler-level optimizations. So you definitely can't have the bitset >> always using _Atomic uint64_t, since would risk non-shared use cases. >> You could have a variant I guess. Just duplicate the whole thing, or >> something with macros. > > It seems like MT unsafe suffices for the near term use cases. > > We can add an atomic variant of the library later, if the need should arise. > Agreed. The only concern I have here is that you end up wanting to change the original design, to better be able to fit atomic bit operations. >> >> With GCC C11 builtins, you can both have the atomic cake and eat it, in >> that you both access the data non-atomically/normally, and in an atomic >> manner. > > Yep. And we care quite a lot about performance, so we are likely to keep using those until the compilers offer similar performance for C11 standard atomics. >