From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1037A424C1; Mon, 10 Jun 2024 17:19:14 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id A4396402A9; Mon, 10 Jun 2024 17:19:13 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mails.dpdk.org (Postfix) with ESMTP id 6569A40294 for ; Mon, 10 Jun 2024 17:19:12 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718032751; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YClQfMQsNHM8bM8n7WqB0g2QPoAgJlcqnIDgRcq5tBo=; b=cEGmL1Wk7JZ+OTJzH7ZG2zk6xnkqFCsjAvwY4XsecUYef6wONpOGn3M8xMgAeOOik+n0p/ EScpa9OSAcODZSrcQEeamiM2RTyuhWslZhxDAv2slwZp1MbHDo5u1ZXAqPchyDYjQEU/aN 0nMzHvLJjInnuCGA1mA4ZgPhMvivTPs= Received: from mail-lj1-f199.google.com (mail-lj1-f199.google.com [209.85.208.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-22-PYwsiol5Mp2dRcyu9w8naA-1; Mon, 10 Jun 2024 11:19:10 -0400 X-MC-Unique: PYwsiol5Mp2dRcyu9w8naA-1 Received: by mail-lj1-f199.google.com with SMTP id 38308e7fff4ca-2eaec29f8e3so21174591fa.1 for ; Mon, 10 Jun 2024 08:19:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718032749; x=1718637549; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YClQfMQsNHM8bM8n7WqB0g2QPoAgJlcqnIDgRcq5tBo=; b=KPeZq/MOgfIUdw6U4Q6Un/ojV+Vv26OkYIYFNTjOEsagqfHPXbiKBROoUXfRR6y3jy IW5zJvEMr4X/X4M5vZ3BecNROHO4HeIoE/TMqAR3iMZJb2tckJt/mIlf6rMNukxXxjX5 QDIjA9JsABlYepBLgBvzI2WJrpml4EMhMec9VeLMIR6oVzJkBzbDP1oS310TmR4Fbe8A gdxAlVJI4+XSZQwq0tBuiBfPqtlkJU5sK/hp9ajid6taxFKFyTWT9WKVKRsGSpSs+8N+ T4Bk87UbwhnYDMPDjqHOQa5kJ3Cuf7j7ErJb/vvkPDNpChFORxvvCbNIJjOaJJNLMysG mVdg== X-Gm-Message-State: AOJu0YzjRzK9ZTugFro8r69Gur1qBoHNCgx+k9VNX+hvBQTLmQFgbzSs qjR2jZ8ABGB+JoSI+PJUOu1dQihuYYa7Pb4zdmx/PaZZuXiAh4yjj6m4B/3uIy7TaV/+RdGPzg8 jqc3O4dFljDwEegf7N1nRD3WdYYb0ruP0rqchGphFrOxFxrYB9GD3QQ5mvS0Js/9jRffc2YdISK ikZrQR8wqNGhsfFfo= X-Received: by 2002:a05:651c:198d:b0:2ea:8fff:7e1 with SMTP id 38308e7fff4ca-2eadce7131fmr69130991fa.41.1718032748963; Mon, 10 Jun 2024 08:19:08 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHwBo2gwHhS3C2bxxpW+CpRp/wrBqrci+zA/nNhqEvPVYKrr5FT6ZKnJk11SYib+pZ2kUWk9YYFNosefu33ZO0= X-Received: by 2002:a05:651c:198d:b0:2ea:8fff:7e1 with SMTP id 38308e7fff4ca-2eadce7131fmr69130701fa.41.1718032748478; Mon, 10 Jun 2024 08:19:08 -0700 (PDT) MIME-Version: 1.0 References: <20230927150854.3670391-1-paul.szczepanek@arm.com> <20240607151000.98562-1-paul.szczepanek@arm.com> <20240607151000.98562-4-paul.szczepanek@arm.com> In-Reply-To: <20240607151000.98562-4-paul.szczepanek@arm.com> From: David Marchand Date: Mon, 10 Jun 2024 17:18:56 +0200 Message-ID: Subject: Re: [PATCH v14 3/6] ptr_compress: add pointer compression library To: Paul Szczepanek Cc: dev@dpdk.org, mb@smartsharesystems.com, Honnappa Nagarahalli , Kamalakshitha Aligeri , Nathan Brown , Jack Bond-Preston , Thomas Monjalon X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hello, On Fri, Jun 7, 2024 at 5:10=E2=80=AFPM Paul Szczepanek wrote: > > Add a new utility header for compressing pointers. The provided > functions can store pointers as 32-bit or 16-bit offsets. > > The compression takes advantage of the fact that pointers are > usually located in a limited memory region (like a mempool). > We can compress them by converting them to offsets from a base > memory address. Offsets can be stored in fewer bytes (dictated > by the memory region size and alignment of the pointer). > For example: an 8 byte aligned pointer which is part of a 32GB > memory pool can be stored in 4 bytes. > > Suggested-by: Honnappa Nagarahalli > Signed-off-by: Paul Szczepanek > Signed-off-by: Kamalakshitha Aligeri > Reviewed-by: Honnappa Nagarahalli > Reviewed-by: Nathan Brown > Reviewed-by: Jack Bond-Preston > Acked-by: Morten Br=C3=B8rup > --- > MAINTAINERS | 4 + > doc/api/doxy-api-index.md | 1 + > doc/api/doxy-api.conf.in | 1 + > doc/guides/rel_notes/release_24_07.rst | 5 + > lib/meson.build | 1 + > lib/ptr_compress/meson.build | 4 + > lib/ptr_compress/rte_ptr_compress.h | 324 +++++++++++++++++++++++++ > 7 files changed, 340 insertions(+) > create mode 100644 lib/ptr_compress/meson.build > create mode 100644 lib/ptr_compress/rte_ptr_compress.h > > diff --git a/MAINTAINERS b/MAINTAINERS > index c9adff9846..27b2f03e6c 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1694,6 +1694,10 @@ M: Chenbo Xia > M: Gaetan Rivet > F: lib/pci/ > > +Pointer Compression > +M: Paul Szczepanek > +F: lib/ptr_compress/ > + > Power management > M: Anatoly Burakov > M: David Hunt > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md > index 8c1eb8fafa..f9283154f8 100644 > --- a/doc/api/doxy-api-index.md > +++ b/doc/api/doxy-api-index.md > @@ -222,6 +222,7 @@ The public API headers are grouped by topics: > [config file](@ref rte_cfgfile.h), > [key/value args](@ref rte_kvargs.h), > [argument parsing](@ref rte_argparse.h), > + [ptr_compress](@ref rte_ptr_compress.h), > [string](@ref rte_string_fns.h), > [thread](@ref rte_thread.h) > > diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in > index 27afec8b3b..a8823c046f 100644 > --- a/doc/api/doxy-api.conf.in > +++ b/doc/api/doxy-api.conf.in > @@ -71,6 +71,7 @@ INPUT =3D @TOPDIR@/doc/api/doxy-api-i= ndex.md \ > @TOPDIR@/lib/pipeline \ > @TOPDIR@/lib/port \ > @TOPDIR@/lib/power \ > + @TOPDIR@/lib/ptr_compress \ > @TOPDIR@/lib/rawdev \ > @TOPDIR@/lib/rcu \ > @TOPDIR@/lib/regexdev \ > diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_note= s/release_24_07.rst > index a69f24cf99..4711792e61 100644 > --- a/doc/guides/rel_notes/release_24_07.rst > +++ b/doc/guides/rel_notes/release_24_07.rst > @@ -55,6 +55,11 @@ New Features > Also, make sure to start the actual text at the margin. > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D > > +* **Introduced pointer compression library.** > + > + Library provides functions to compress and decompress arrays of pointe= rs > + which can improve application performance under certain conditions. > + Performance test was added to help users evaluate performance on their= setup. Please, double empty line before a new section in the release notes. > > Removed Items > ------------- > diff --git a/lib/meson.build b/lib/meson.build > index 7c90602bf5..63becee142 100644 > --- a/lib/meson.build > +++ b/lib/meson.build > @@ -14,6 +14,7 @@ libraries =3D [ > 'argparse', > 'telemetry', # basic info querying > 'eal', # everything depends on eal > + 'ptr_compress', > 'ring', > 'rcu', # rcu depends on ring > 'mempool', > diff --git a/lib/ptr_compress/meson.build b/lib/ptr_compress/meson.build > new file mode 100644 > index 0000000000..e92706a45f > --- /dev/null > +++ b/lib/ptr_compress/meson.build > @@ -0,0 +1,4 @@ > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2024 Arm Limited > + > +headers =3D files('rte_ptr_compress.h') > diff --git a/lib/ptr_compress/rte_ptr_compress.h b/lib/ptr_compress/rte_p= tr_compress.h > new file mode 100644 > index 0000000000..bf9cfb0661 > --- /dev/null > +++ b/lib/ptr_compress/rte_ptr_compress.h > @@ -0,0 +1,324 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2024 Arm Limited > + */ > + > +#ifndef RTE_PTR_COMPRESS_H > +#define RTE_PTR_COMPRESS_H > + > +/** > + * @file > + * Pointer compression and decompression functions. > + * > + * When passing arrays full of pointers between threads, memory containi= ng > + * the pointers is copied multiple times which is especially costly betw= een > + * cores. These functions allow us to compress the pointers. > + * > + * Compression takes advantage of the fact that pointers are usually loc= ated in > + * a limited memory region. We compress them by converting them to offse= ts from > + * a base memory address. Offsets can be stored in fewer bytes. > + * > + * The compression functions come in two varieties: 32-bit and 16-bit. > + * > + * To determine how many bits are needed to compress the pointer calcula= te Please add a , ... to compress the pointer, calculate ... > + * the biggest offset possible (highest value pointer - base pointer) > + * and shift the value right according to alignment (shift by exponent o= f the > + * power of 2 of alignment: aligned by 4 - shift by 2, aligned by 8 - sh= ift by > + * 3, etc.). The resulting value must fit in either 32 or 16 bits. > + * > + * For usage example and further explanation please see "Pointer Compres= sion" in > + * doc/guides/prog_guide/ptr_compress_lib.rst I don't like refering to a path in the sources: the rendered doc won't link to the actual documentation page, and this path may get incorrect in the future if for some reason the doc is renamed or moved. The simpler is to state "please see this library documentation programming guide". > + */ > + > +#include > +#include > + > +#include > +#include > +#include > +#include > +#include It is unneeded. Please, don't create a dependency to the mempool library. > +#include > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/** > + * Calculate how many bits are required to store a value within a given = range. > + * This is to help decide which pointer compression functions can be use= d to > + * store pointers contained within a memory range. > + * > + * @param range > + * The size of the range the value belongs to. In my mind, a range takes the form of a set of two symbols. Here, iiuc, what is needed/requested as an input is the number of entries of a (contiguous) space. So the naming is at least strange. > + * @return > + * Number of bits required to store a value. > + **/ > +#define RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(range) \ > + (((uint64_t)range) < 2 ? 1 : \ > + (sizeof(uint64_t) * CHAR_BIT - rte_clz64((uint64_t)range = - 1))) But now, I don't see why we need to expose this macro (and its sister RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT) at all. Can you clarify the usecase? > + > +/** > + * Calculate how many bits in the address can be dropped without losing = any > + * information thanks to the alignment of the address. > + * > + * @param alignment > + * Memory alignment. > + * @return > + * Size of shift allowed without dropping any information from the poi= nter. > + **/ > +#define RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(alignment) \ > + ((alignment) =3D=3D 0 ? 0 : rte_ctz64((uint64_t)alignment)) > + > +/** > + * Determine if rte_ptr_compress_16_shift can be used to compress pointe= rs > + * that contain addresses of memory objects whose memory is aligned by > + * a given amount and contained in a given memory range. > + * > + * @param mem_range > + * The size of the memory region that contains the objects pointed to. Again, can you rephrase and not use the term "range"? Expected input here is a number of pointers / elements in a space. > + * @param obj_alignment > + * The alignment of objects pointed to. > + * @return > + * 1 if function can be used, 0 otherwise. > + **/ > +#define RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT(mem_range, obj_alignment)= \ > + ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_rang= e) - \ > + RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <=3D 16= ? 1 : 0) > + > +/** > + * Determine if rte_ptr_compress_32_shift can be used to compress pointe= rs > + * that contain addresses of memory objects whose memory is aligned by > + * a given amount and contained in a given memory range. > + * > + * @param mem_range > + * The size of the memory region that contains the objects pointed to. > + * @param obj_alignment > + * The alignment of objects pointed to. > + * @return > + * 1 if function can be used, 0 otherwise. > + **/ > +#define RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT(mem_range, obj_alignment)= \ > + ((RTE_PTR_COMPRESS_BITS_REQUIRED_TO_STORE_VALUE_IN_RANGE(mem_rang= e) - \ > + RTE_PTR_COMPRESS_BIT_SHIFT_FROM_ALIGNMENT(obj_alignment)) <=3D 32= ? 1 : 0) RTE_PTR_COMPRESS_CAN_COMPRESS_16_SHIFT and RTE_PTR_COMPRESS_CAN_COMPRESS_32_SHIFT do the same job. The only thing that differs is the size of the compressed space, so maybe this size could be passed as an input. No strong opinion though. > + > +/** > + * Compress pointers into 32-bit offsets from base pointer. > + * > + * @note It is programmer's responsibility to ensure the resulting offse= ts fit > + * into 32 bits. Alignment of the structures pointed to by the pointers = allows > + * us to drop bits from the offsets. This is controlled by the bit_shift > + * parameter. This means that if structures are aligned by 8 bytes they = must be > + * within 32GB of the base pointer. If there is no such alignment guaran= tee they > + * must be within 4GB. > + * > + * @param ptr_base > + * A pointer used to calculate offsets of pointers in src_table. > + * @param src_table > + * A pointer to an array of pointers. > + * @param dest_table > + * A pointer to an array of compressed pointers returned by this funct= ion. > + * @param n > + * The number of objects to compress, must be strictly positive. > + * @param bit_shift > + * Byte alignment of memory pointed to by the pointers allows for > + * bits to be dropped from the offset and hence widen the memory regio= n that > + * can be covered. This controls how many bits are right shifted. > + **/ > +static __rte_always_inline void > +rte_ptr_compress_32_shift(void *ptr_base, void * const *src_table, > + uint32_t *dest_table, size_t n, uint8_t bit_shift) > +{ > + size_t i =3D 0; > +#if defined RTE_HAS_SVE_ACLE && !defined RTE_ARCH_ARMv8_AARCH32 Note: DPDK usually split per architecture implementation in separate header files, but I guess this is okay in the current form with #ifdef for now.. [snip] --=20 David Marchand