From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id DD2D043A22; Wed, 31 Jan 2024 19:45:38 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id CB3444068E; Wed, 31 Jan 2024 19:45:38 +0100 (CET) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 9C803402A7 for ; Wed, 31 Jan 2024 19:45:37 +0100 (CET) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 48DFC184E2 for ; Wed, 31 Jan 2024 19:45:37 +0100 (CET) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 3D83B18582; Wed, 31 Jan 2024 19:45:37 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.4 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 7A53918581; Wed, 31 Jan 2024 19:45:35 +0100 (CET) Message-ID: <23445b74-50b2-4e7a-a6d8-b844815031fb@lysator.liu.se> Date: Wed, 31 Jan 2024 19:45:34 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC v3] eal: add bitset type Content-Language: en-US To: Stephen Hemminger , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= Cc: dev@dpdk.org, =?UTF-8?Q?Morten_Br=C3=B8rup?= , Tyler Retzlaff References: <20240131131301.418361-1-mattias.ronnblom@ericsson.com> <20240131080643.41a03cd8@hermes.local> From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: <20240131080643.41a03cd8@hermes.local> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-01-31 17:06, Stephen Hemminger wrote: > On Wed, 31 Jan 2024 14:13:01 +0100 > Mattias Rönnblom wrote: > >> +/** >> + * @file >> + * RTE Bitset >> + * >> + * This file provides functions and macros for querying and >> + * manipulating sets of bits kept in arrays of @c uint64_t-sized >> + * elements. >> + * >> + * The bits in a bitset are numbered from 0 to @c size - 1, with the >> + * lowest index being the least significant bit. >> + * >> + * The bitset array must be properly aligned. >> + * >> + * For optimal performance, the @c size parameter, required by >> + * many of the API's functions, should be a compile-time constant. >> + * >> + * For large bitsets, the rte_bitmap.h API may be more appropriate. >> + * >> + * @warning >> + * All functions modifying a bitset may overwrite any unused bits of >> + * the last word. Such unused bits are ignored by all functions reading >> + * bits. >> + * >> + */ > > FYI - the linux kernel has a similar but more complete set of operations. > It might be more efficient to use unsigned long rather than requiring > the elements to be uint64_t. Thinking of the few 32 bit platforms. > Keeping it 64-bit avoids a popcount-related #ifdef. DPDK doesn't have an equivalent to __builtin_popcountl(). How much do we need to care about 32-bit ISA performance? I'll go through the below API and some other APIs to see if there's something obvious missing. When I originally wrote this code there were a few potential features where I wasn't sure to what extent they were useful. One example was the shift operation. Any input is appreciated. > Also, what if any thread safety guarantees? or atomic. > Currently, it's all MT unsafe. An atomic set and get/test would make sense, and maybe other operations would as well. Bringing in atomicity into the design makes it much less obvious: Would the atomic operations imply some memory ordering, or be "relaxed". I would lean toward relaxed, but then shouldn't bit-level atomics be consistent with the core DPDK atomics API? With that in mind, memory ordering should be user-configurable. If the code needs to be pure C11 atomics-wise, the words that makes up the bitset must be _Atomic uint64_t. Then you need to be careful or end up with "lock"-prefixed instructions if you manipulate the bitset words. Just a pure words[N] = 0; gives you a mov+mfence on x86, for example, plus all the fun memory_order_seq_cst in terms of preventing compiler-level optimizations. So you definitely can't have the bitset always using _Atomic uint64_t, since would risk non-shared use cases. You could have a variant I guess. Just duplicate the whole thing, or something with macros. With GCC C11 builtins, you can both have the atomic cake and eat it, in that you both access the data non-atomically/normally, and in an atomic manner. > From kernel bitmap.h > > /** > * DOC: bitmap overview > * > * The available bitmap operations and their rough meaning in the > * case that the bitmap is a single unsigned long are thus: > * > * The generated code is more efficient when nbits is known at > * compile-time and at most BITS_PER_LONG. > * > * :: > * > * bitmap_zero(dst, nbits) *dst = 0UL > * bitmap_fill(dst, nbits) *dst = ~0UL > * bitmap_copy(dst, src, nbits) *dst = *src > * bitmap_and(dst, src1, src2, nbits) *dst = *src1 & *src2 > * bitmap_or(dst, src1, src2, nbits) *dst = *src1 | *src2 > * bitmap_xor(dst, src1, src2, nbits) *dst = *src1 ^ *src2 > * bitmap_andnot(dst, src1, src2, nbits) *dst = *src1 & ~(*src2) > * bitmap_complement(dst, src, nbits) *dst = ~(*src) > * bitmap_equal(src1, src2, nbits) Are *src1 and *src2 equal? > * bitmap_intersects(src1, src2, nbits) Do *src1 and *src2 overlap? > * bitmap_subset(src1, src2, nbits) Is *src1 a subset of *src2? > * bitmap_empty(src, nbits) Are all bits zero in *src? > * bitmap_full(src, nbits) Are all bits set in *src? > * bitmap_weight(src, nbits) Hamming Weight: number set bits > * bitmap_weight_and(src1, src2, nbits) Hamming Weight of and'ed bitmap > * bitmap_set(dst, pos, nbits) Set specified bit area > * bitmap_clear(dst, pos, nbits) Clear specified bit area > * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area > * bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off) as above > * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n > * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n > * bitmap_cut(dst, src, first, n, nbits) Cut n bits from first, copy rest > * bitmap_replace(dst, old, new, mask, nbits) *dst = (*old & ~(*mask)) | (*new & *mask) > * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) > * bitmap_bitremap(oldbit, old, new, nbits) newbit = map(old, new)(oldbit) > * bitmap_onto(dst, orig, relmap, nbits) *dst = orig relative to relmap > * bitmap_fold(dst, orig, sz, nbits) dst bits = orig bits mod sz > * bitmap_parse(buf, buflen, dst, nbits) Parse bitmap dst from kernel buf > * bitmap_parse_user(ubuf, ulen, dst, nbits) Parse bitmap dst from user buf > * bitmap_parselist(buf, dst, nbits) Parse bitmap dst from kernel buf > * bitmap_parselist_user(buf, dst, nbits) Parse bitmap dst from user buf > * bitmap_find_free_region(bitmap, bits, order) Find and allocate bit region > * bitmap_release_region(bitmap, pos, order) Free specified bit region > * bitmap_allocate_region(bitmap, pos, order) Allocate specified bit region > * bitmap_from_arr32(dst, buf, nbits) Copy nbits from u32[] buf to dst > * bitmap_from_arr64(dst, buf, nbits) Copy nbits from u64[] buf to dst > * bitmap_to_arr32(buf, src, nbits) Copy nbits from buf to u32[] dst > * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst > * bitmap_get_value8(map, start) Get 8bit value from map at start > * bitmap_set_value8(map, value, start) Set 8bit value to map at start > * > * Note, bitmap_zero() and bitmap_fill() operate over the region of > * unsigned longs, that is, bits behind bitmap till the unsigned long > * boundary will be zeroed or filled as well. Consider to use > * bitmap_clear() or bitmap_set() to make explicit zeroing or filling > * respectively. > */ >