Subject: Re: [PATCH v5 1/4] eal: add pointer compression functions
From: Konstantin Ananyev
To: Paul Szczepanek, dev@dpdk.org
Cc: Honnappa Nagarahalli, Kamalakshitha Aligeri
Date: Sun, 11 Feb 2024 15:32:50 +0000
Message-ID: <039d1ab2-9a78-4cb8-a349-fc133bb5cd80@yandex.ru>
In-Reply-To: <20231101181301.2449804-2-paul.szczepanek@arm.com>
References: <20230927150854.3670391-2-paul.szczepanek@arm.com>
 <20231101181301.2449804-1-paul.szczepanek@arm.com>
 <20231101181301.2449804-2-paul.szczepanek@arm.com>

> Add a new utility header for compressing pointers. The provided
> functions can store pointers as 32-bit or 16-bit offsets.
>
> The compression takes advantage of the fact that pointers are
> usually located in a limited memory region (like a mempool).
> We can compress them by converting them to offsets from a base
> memory address. Offsets can be stored in fewer bytes (dictated
> by the memory region size and alignment of the pointer).
> For example: an 8 byte aligned pointer which is part of a 32GB
> memory pool can be stored in 4 bytes.
>
> Suggested-by: Honnappa Nagarahalli
> Signed-off-by: Paul Szczepanek
> Signed-off-by: Kamalakshitha Aligeri
> Reviewed-by: Honnappa Nagarahalli

On the one hand the code itself is very small and straightforward;
on the other hand, it is not clear to me what the intended usage for
it within DPDK and its applications would be.
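
To make the question concrete: my reading of the commit message is that
the intended pattern is roughly the sketch below - compress a burst of
mempool object pointers into 32-bit offsets on the producer core, move
the offsets through a ring, and decompress on the consumer core. This is
only my understanding, not patch text; the ring usage and the names
(send_burst, recv_burst, pool_base, BURST) are illustrative assumptions.

#include <stdint.h>

#include <rte_ring.h>
#include <rte_ptr_compress.h>

#define BURST 32

/* producer core: compress a burst of 8-byte-aligned mempool object
 * pointers (bit_shift 3) so the ring carries 4 bytes per object
 * instead of 8 */
static void
send_burst(struct rte_ring *ring, void *pool_base, void **objs, unsigned int n)
{
	uint32_t offs[BURST];

	rte_ptr_compress_32(pool_base, objs, offs, n, 3);
	/* NB: sketch only - a real caller must handle a full ring */
	rte_ring_enqueue_burst_elem(ring, offs, sizeof(uint32_t), n, NULL);
}

/* consumer core: dequeue the offsets and recover the original pointers */
static unsigned int
recv_burst(struct rte_ring *ring, void *pool_base, void **objs)
{
	uint32_t offs[BURST];
	unsigned int n;

	n = rte_ring_dequeue_burst_elem(ring, offs, sizeof(uint32_t), BURST, NULL);
	if (n != 0)	/* n must be strictly positive for decompress */
		rte_ptr_decompress_32(pool_base, offs, objs, n, 3);
	return n;
}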
Konstantin

> ---
>  .mailmap                           |   1 +
>  lib/eal/include/meson.build        |   1 +
>  lib/eal/include/rte_ptr_compress.h | 266 +++++++++++++++++++++++++++++
>  3 files changed, 268 insertions(+)
>  create mode 100644 lib/eal/include/rte_ptr_compress.h
>
> diff --git a/.mailmap b/.mailmap
> index 3f5bab26a8..004751d27a 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -1069,6 +1069,7 @@ Paul Greenwalt
>  Paulis Gributs
>  Paul Luse
>  Paul M Stillwell Jr
> +Paul Szczepanek
>  Pavan Kumar Linga
>  Pavan Nikhilesh
>  Pavel Belous
> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
> index e94b056d46..ce2c733633 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -36,6 +36,7 @@ headers += files(
>          'rte_pci_dev_features.h',
>          'rte_per_lcore.h',
>          'rte_pflock.h',
> +        'rte_ptr_compress.h',
>          'rte_random.h',
>          'rte_reciprocal.h',
>          'rte_seqcount.h',
> diff --git a/lib/eal/include/rte_ptr_compress.h b/lib/eal/include/rte_ptr_compress.h
> new file mode 100644
> index 0000000000..47a72e4213
> --- /dev/null
> +++ b/lib/eal/include/rte_ptr_compress.h
> @@ -0,0 +1,266 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2023 Arm Limited
> + */
> +
> +#ifndef RTE_PTR_COMPRESS_H
> +#define RTE_PTR_COMPRESS_H
> +
> +/**
> + * @file
> + * Pointer compression and decompression functions.
> + *
> + * When passing arrays full of pointers between threads, memory containing
> + * the pointers is copied multiple times, which is especially costly between
> + * cores. These functions allow us to compress the pointers.
> + *
> + * Compression takes advantage of the fact that pointers are usually located in
> + * a limited memory region (like a mempool). We compress them by converting them
> + * to offsets from a base memory address. Offsets can be stored in fewer bytes.
> + *
> + * The compression functions come in two varieties: 32-bit and 16-bit.
> + *
> + * To determine how many bits are needed to compress the pointer, calculate
> + * the biggest offset possible (highest value pointer - base pointer)
> + * and shift the value right according to alignment (shift by the exponent of
> + * the power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - shift
> + * by 3, etc.). The resulting value must fit in either 32 or 16 bits.
> + *
> + * For a usage example and further explanation please see "Pointer Compression"
> + * in doc/guides/prog_guide/env_abstraction_layer.rst
> + */
> +
> +#include <stdint.h>
> +#include <inttypes.h>
> +
> +#include <rte_branch_prediction.h>
> +#include <rte_common.h>
> +#include <rte_debug.h>
> +#include <rte_vect.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
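
To make the bit counting described above concrete, bit_shift for a given
region could be derived along these lines (a sketch of mine, not part of
the patch; choose_bit_shift is an illustrative name):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* derive bit_shift for a region of `size` bytes whose objects are all
 * aligned to `align` bytes (a non-zero power of 2), and check that the
 * biggest possible offset fits the 32-bit variant */
static inline unsigned int
choose_bit_shift(size_t size, size_t align)
{
	unsigned int bit_shift = (unsigned int)__builtin_ctzll(align);
	uint64_t max_offset = ((uint64_t)size - 1) >> bit_shift;

	assert(max_offset <= UINT32_MAX);
	return bit_shift;
}

E.g. a 32GB pool of 8-byte aligned objects gives bit_shift 3 and a
maximum offset just under 2^32, which matches the 32GB figure in the
note below.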
> +/**
> + * Compress pointers into 32-bit offsets from base pointer.
> + *
> + * @note It is the programmer's responsibility to ensure the resulting offsets
> + * fit into 32 bits. Alignment of the structures pointed to by the pointers
> + * allows us to drop bits from the offsets. This is controlled by the bit_shift
> + * parameter. This means that if structures are aligned by 8 bytes they must be
> + * within 32GB of the base pointer. If there is no such alignment guarantee they
> + * must be within 4GB.
> + *
> + * @param ptr_base
> + *   A pointer used to calculate offsets of pointers in src_table.
> + * @param src_table
> + *   A pointer to an array of pointers.
> + * @param dest_table
> + *   A pointer to an array of compressed pointers returned by this function.
> + * @param n
> + *   The number of objects to compress, must be strictly positive.
> + * @param bit_shift
> + *   Byte alignment of memory pointed to by the pointers allows for
> + *   bits to be dropped from the offset and hence widens the memory region that
> + *   can be covered. This controls how many bits are right shifted.
> + **/
> +static __rte_always_inline void
> +rte_ptr_compress_32(void *ptr_base, void **src_table,
> +	uint32_t *dest_table, unsigned int n, unsigned int bit_shift)
> +{
> +	unsigned int i = 0;
> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
> +	svuint64_t v_ptr_table;
> +	svbool_t pg = svwhilelt_b64(i, n);
> +	do {
> +		v_ptr_table = svld1_u64(pg, (uint64_t *)src_table + i);
> +		v_ptr_table = svsub_x(pg, v_ptr_table, (uint64_t)ptr_base);
> +		v_ptr_table = svlsr_x(pg, v_ptr_table, bit_shift);
> +		svst1w(pg, &dest_table[i], v_ptr_table);
> +		i += svcntd();
> +		pg = svwhilelt_b64(i, n);
> +	} while (svptest_any(svptrue_b64(), pg));
> +#elif defined __ARM_NEON && !defined RTE_ARCH_ARMv8_AARCH32
> +	uint64_t ptr_diff;
> +	uint64x2_t v_ptr_table;
> +	/* right shift is done by left shifting by a negative int;
> +	 * cast before negating, as negating the unsigned int would wrap */
> +	int64x2_t v_shift = vdupq_n_s64(-(int64_t)bit_shift);
> +	uint64x2_t v_ptr_base = vdupq_n_u64((uint64_t)ptr_base);
> +	for (; i < (n & ~0x1); i += 2) {
> +		v_ptr_table = vld1q_u64((const uint64_t *)src_table + i);
> +		v_ptr_table = vsubq_u64(v_ptr_table, v_ptr_base);
> +		v_ptr_table = vshlq_u64(v_ptr_table, v_shift);
> +		vst1_u32(dest_table + i, vqmovn_u64(v_ptr_table));
> +	}
> +	/* process the leftover single item when n is odd */
> +	if (unlikely(n & 0x1)) {
> +		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
> +		dest_table[i] = (uint32_t) (ptr_diff >> bit_shift);
> +	}
> +#else
> +	uintptr_t ptr_diff;
> +	for (; i < n; i++) {
> +		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
> +		ptr_diff = ptr_diff >> bit_shift;
> +		RTE_ASSERT(ptr_diff <= UINT32_MAX);
> +		dest_table[i] = (uint32_t) ptr_diff;
> +	}
> +#endif
> +}
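
A minimal round trip of compress/decompress as I understand the contract
above - the same ptr_base and bit_shift on both sides must recover the
original pointers (sketch only; the pool array stands in for a real
mempool region):

#include <assert.h>
#include <stdint.h>

#include <rte_ptr_compress.h>

static void
round_trip_check(void)
{
	uint64_t pool[64];	/* naturally 8-byte aligned -> bit_shift 3 */
	void *src[4], *dst[4];
	uint32_t offs[4];
	unsigned int i;

	for (i = 0; i < 4; i++)
		src[i] = &pool[i * 16];

	rte_ptr_compress_32(pool, src, offs, 4, 3);
	rte_ptr_decompress_32(pool, offs, dst, 4, 3);

	for (i = 0; i < 4; i++)
		assert(dst[i] == src[i]);
}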
> +/**
> + * Decompress pointers from 32-bit offsets from base pointer.
> + *
> + * @param ptr_base
> + *   A pointer which was used to calculate offsets in src_table.
> + * @param src_table
> + *   A pointer to an array of compressed pointers.
> + * @param dest_table
> + *   A pointer to an array of decompressed pointers returned by this function.
> + * @param n
> + *   The number of objects to decompress, must be strictly positive.
> + * @param bit_shift
> + *   Byte alignment of memory pointed to by the pointers allows for
> + *   bits to be dropped from the offset and hence widens the memory region that
> + *   can be covered. This controls how many bits are left shifted when pointers
> + *   are recovered from the offsets.
> + **/
> +static __rte_always_inline void
> +rte_ptr_decompress_32(void *ptr_base, uint32_t *src_table,
> +	void **dest_table, unsigned int n, unsigned int bit_shift)
> +{
> +	unsigned int i = 0;
> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
> +	svuint64_t v_ptr_table;
> +	svbool_t pg = svwhilelt_b64(i, n);
> +	do {
> +		v_ptr_table = svld1uw_u64(pg, &src_table[i]);
> +		v_ptr_table = svlsl_x(pg, v_ptr_table, bit_shift);
> +		v_ptr_table = svadd_x(pg, v_ptr_table, (uint64_t)ptr_base);
> +		svst1(pg, (uint64_t *)dest_table + i, v_ptr_table);
> +		i += svcntd();
> +		pg = svwhilelt_b64(i, n);
> +	} while (svptest_any(svptrue_b64(), pg));
> +#elif defined __ARM_NEON && !defined RTE_ARCH_ARMv8_AARCH32
> +	uint64_t ptr_diff;
> +	uint64x2_t v_ptr_table;
> +	int64x2_t v_shift = vdupq_n_s64(bit_shift);
> +	uint64x2_t v_ptr_base = vdupq_n_u64((uint64_t)ptr_base);
> +	for (; i < (n & ~0x1); i += 2) {
> +		v_ptr_table = vmovl_u32(vld1_u32(src_table + i));
> +		v_ptr_table = vshlq_u64(v_ptr_table, v_shift);
> +		v_ptr_table = vaddq_u64(v_ptr_table, v_ptr_base);
> +		vst1q_u64((uint64_t *)dest_table + i, v_ptr_table);
> +	}
> +	/* process the leftover single item when n is odd */
> +	if (unlikely(n & 0x1)) {
> +		ptr_diff = ((uint64_t) src_table[i]) << bit_shift;
> +		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
> +	}
> +#else
> +	uintptr_t ptr_diff;
> +	for (; i < n; i++) {
> +		ptr_diff = ((uintptr_t) src_table[i]) << bit_shift;
> +		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
> +	}
> +#endif
> +}
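
A quick range check before the 16-bit variants below: 2^16 offsets cover
64KB with no alignment shift, and 2^16 * 8 = 512KB when structures are
8-byte aligned (bit_shift 3); the 32GB/4GB figures for the 32-bit
variants above follow from the same arithmetic.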
> +/**
> + * Compress pointers into 16-bit offsets from base pointer.
> + *
> + * @note It is the programmer's responsibility to ensure the resulting offsets
> + * fit into 16 bits. Alignment of the structures pointed to by the pointers
> + * allows us to drop bits from the offsets. This is controlled by the bit_shift
> + * parameter. This means that if structures are aligned by 8 bytes they must be
> + * within 512KB of the base pointer. If there is no such alignment guarantee
> + * they must be within 64KB.
> + *
> + * @param ptr_base
> + *   A pointer used to calculate offsets of pointers in src_table.
> + * @param src_table
> + *   A pointer to an array of pointers.
> + * @param dest_table
> + *   A pointer to an array of compressed pointers returned by this function.
> + * @param n
> + *   The number of objects to compress, must be strictly positive.
> + * @param bit_shift
> + *   Byte alignment of memory pointed to by the pointers allows for
> + *   bits to be dropped from the offset and hence widens the memory region that
> + *   can be covered. This controls how many bits are right shifted.
> + **/
> +static __rte_always_inline void
> +rte_ptr_compress_16(void *ptr_base, void **src_table,
> +	uint16_t *dest_table, unsigned int n, unsigned int bit_shift)
> +{
> +	unsigned int i = 0;
> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
> +	svuint64_t v_ptr_table;
> +	svbool_t pg = svwhilelt_b64(i, n);
> +	do {
> +		v_ptr_table = svld1_u64(pg, (uint64_t *)src_table + i);
> +		v_ptr_table = svsub_x(pg, v_ptr_table, (uint64_t)ptr_base);
> +		v_ptr_table = svlsr_x(pg, v_ptr_table, bit_shift);
> +		svst1h(pg, &dest_table[i], v_ptr_table);
> +		i += svcntd();
> +		pg = svwhilelt_b64(i, n);
> +	} while (svptest_any(svptrue_b64(), pg));
> +#else
> +	uintptr_t ptr_diff;
> +	for (; i < n; i++) {
> +		ptr_diff = RTE_PTR_DIFF(src_table[i], ptr_base);
> +		ptr_diff = ptr_diff >> bit_shift;
> +		RTE_ASSERT(ptr_diff <= UINT16_MAX);
> +		dest_table[i] = (uint16_t) ptr_diff;
> +	}
> +#endif
> +}
> +
> +/**
> + * Decompress pointers from 16-bit offsets from base pointer.
> + *
> + * @param ptr_base
> + *   A pointer which was used to calculate offsets in src_table.
> + * @param src_table
> + *   A pointer to an array of compressed pointers.
> + * @param dest_table
> + *   A pointer to an array of decompressed pointers returned by this function.
> + * @param n
> + *   The number of objects to decompress, must be strictly positive.
> + * @param bit_shift
> + *   Byte alignment of memory pointed to by the pointers allows for
> + *   bits to be dropped from the offset and hence widens the memory region that
> + *   can be covered. This controls how many bits are left shifted when pointers
> + *   are recovered from the offsets.
> + **/
> +static __rte_always_inline void
> +rte_ptr_decompress_16(void *ptr_base, uint16_t *src_table,
> +	void **dest_table, unsigned int n, unsigned int bit_shift)
> +{
> +	unsigned int i = 0;
> +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32
> +	svuint64_t v_ptr_table;
> +	svbool_t pg = svwhilelt_b64(i, n);
> +	do {
> +		v_ptr_table = svld1uh_u64(pg, &src_table[i]);
> +		v_ptr_table = svlsl_x(pg, v_ptr_table, bit_shift);
> +		v_ptr_table = svadd_x(pg, v_ptr_table, (uint64_t)ptr_base);
> +		svst1(pg, (uint64_t *)dest_table + i, v_ptr_table);
> +		i += svcntd();
> +		pg = svwhilelt_b64(i, n);
> +	} while (svptest_any(svptrue_b64(), pg));
> +#else
> +	uintptr_t ptr_diff;
> +	for (; i < n; i++) {
> +		ptr_diff = ((uintptr_t) src_table[i]) << bit_shift;
> +		dest_table[i] = RTE_PTR_ADD(ptr_base, ptr_diff);
> +	}
> +#endif
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_PTR_COMPRESS_H */
> --
> 2.25.1
>