From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 27F5246DC9; Mon, 25 Aug 2025 08:05:32 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 561F24025E; Mon, 25 Aug 2025 08:05:31 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id D94F44025A for ; Mon, 25 Aug 2025 08:05:29 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 78E191CCBD for ; Mon, 25 Aug 2025 08:05:29 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 6CFA41CD1E; Mon, 25 Aug 2025 08:05:29 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 4.0.1 (2024-03-25) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=4.0.1 X-Spam-Score: -1.0 Received: from [192.168.1.85] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 0B4D81CCBC; Mon, 25 Aug 2025 08:05:26 +0200 (CEST) Message-ID: Date: Mon, 25 Aug 2025 08:05:26 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC 3/3] hash: add support for common small key sizes To: =?UTF-8?Q?Morten_Br=C3=B8rup?= , Stephen Hemminger , dev@dpdk.org Cc: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , Yipeng Wang , Sameh Gobriel , Bruce Richardson , Vladimir Medvedkin References: <20250821203646.133506-1-stephen@networkplumber.org> <20250821203646.133506-4-stephen@networkplumber.org> <5427c6f3-4446-4ee3-909e-5f2925d2b286@lysator.liu.se> <98CBD80474FA8B44BF855DF32C47DC35E9FE61@smartserver.smartshare.dk> <90ad6122-467c-441d-8515-c043146a8a57@lysator.liu.se> <98CBD80474FA8B44BF855DF32C47DC35E9FE63@smartserver.smartshare.dk> Content-Language: en-US From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9FE63@smartserver.smartshare.dk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2025-08-22 20:57, Morten Brørup wrote: >>>>> +static inline int >>>>> +rte_hash_k2_cmp_eq(const void *key1, const void *key2, size_t key_len >>>> __rte_unused) >>>>> +{ >>>>> + const uint16_t *k1 = key1; >>>>> + const uint16_t *k2 = key2; >>>>> + >>>> >>>> What we do now is to require the keys are 16-bit aligned (which wasn't >>>> the case before). >>>> >>>> You could >>>> >>>> uint16_t k1; >>>> memcpy(&k1, key1, sizeof(uint16_t)); >>>> instead. >>>> >>>> Would generate the same code, but be safe from any future alignment issues. >>> >>> Or use the unaligned types, e.g.: >>> const unaligned_uint16_t *k1 = key1; >>> const unaligned_uint16_t *k2 = key2; >>> >> >> Could you explain why that is safe? Doesn't >> __attribute__((__aligned__(1))) >> just say specify the object doesn't have any alignment requirements, >> without asking the compiler to deal with it? > > It is safe because the compiler does deal with it. > Here's a simple example: > https://godbolt.org/z/r39zdeEcx > I see. Thanks! > I don't know how MSVC deals with it, but it doesn't support alignment sensitive architectures, so no real problem. > >> >>>> >>>> Anyway, maybe it's safe to assume the keys are aligned, so this is not >>>> an issue. >>> >>> Lots of DPDK code ignores alignment issues. >>> >>>> >>>>> + return k1[0] ^ k2[0]; >>>>> +} >>>> >>>> Haven't you implemented "neq" rather than "eq" here? If the keys are >>>> equal, the result is 0. Should be != 0. >>> >>> Not a bug. >> >> Well, the function body doesn't do what the function name tells it. :) > > Agree. They really should be renamed to _cmp_neq. > And _cmp_eq wrappers could be kept for backwards API compatibility. Possibly marked deprecated. > >> >>> These hash compare functions are in fact "neq", not "eq". >>> Having "_cmp_eq" in the function names is extremely unfortunate and >> misleading. >>> >>>> >>>> Would it be worth adding a comment like "use XOR to make this >>>> branch-free"? It may not be obvious to all readers. >>>> >>>> That said, I’m not sure this trick will actually change the generated >>>> object code - especially if the result of the eq function is still used >>>> in a conditional afterward. Anyway, keeping it seems like a good >>>> conservative approach. >>> >>> I wonder if any compiler is clever enough to use a different memcmp >> implementation if we inform the compiler at build time that we don't care if >> key1 is less than or greater key2, only if they differ or not. >> >> All what is needed is a constant-size length. (Only tested with the most >> recent GCC and clang.) > > Yes. But that was not what I was thinking about... I was wondering if "memcmp()!=0" compiles to code that calls some other memcmp implementation that doesn't check which of the two strings is lower, but only tests if they differ. > I don't think so. On GCC 15.2 at least, for non-constant buffer lengths, glibc memcmp() will always be invoked. >> >> At least GCC will emit a cmp instruction though (so not branch free), if >> that matters. >> >>> If so, the OTHER_BYTES handler shouldn't call memcmp() directly, but a >> wrapper around it: >>> >>> rte_hash_k_cmp_eq(const void *key1, const void *key2, size_t key_len >> __rte_unused) >>> { >>> return memcmp(key1, key2, key_len) != 0; >>> } >>> >> >> Should work. (Remove __rte_unused.) > > I just tested it in Compiler Explorer, and the function calls the same memcmp(), regardless of the !=0. > So no benefit compared to calling memcmp() directly from the jump table. > The rte_hash_k_cmp_eq() must be invoked from some other function, which provides a constant key_len, for the magic to happen. I would keep this function (maybe as __rte_hash_cmp_neq()) for that purpose.