From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 299644400A;
	Sun, 12 May 2024 09:11:04 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id A2389402BA;
	Sun, 12 May 2024 09:11:03 +0200 (CEST)
Received: from mail-ej1-f45.google.com (mail-ej1-f45.google.com
 [209.85.218.45]) by mails.dpdk.org (Postfix) with ESMTP id 2DB35402B5
 for <dev@dpdk.org>; Sun, 12 May 2024 09:11:02 +0200 (CEST)
Received: by mail-ej1-f45.google.com with SMTP id
 a640c23a62f3a-a5a1054cf61so855775866b.1
 for <dev@dpdk.org>; Sun, 12 May 2024 00:11:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1715497862; x=1716102662; darn=dpdk.org;
 h=content-transfer-encoding:cc:to:subject:message-id:date:from
 :in-reply-to:references:mime-version:from:to:cc:subject:date
 :message-id:reply-to;
 bh=+GwpQEiRKlwP3Q+LL14R0UvDMA+4Koc0GYs5vmwWEZY=;
 b=ISZZjrNrAVvhMJAhoQdDAc2AQhEjL7TZiz1Iv9jc9drrL/Y8SqpH1TvmLOQQXOk09H
 CuMV5SQfLeu05aFE4n4TepeevePw/uMUOaJaj7uY2wYvgVkapOlUBH2kXtudEUhmRAZC
 RRgng06RAzbCH0enBMSXWpdcvxHnH04v/x8LtT0D62bv2lHr+W7umj0it4Cen2ADT2mU
 WzyN/182SQsXou2sEna38Qyg06JViBVONYWm8RP1qg5FeVdMIGess+JFYAk9clLxphT3
 Vq5porRvSpR6yVPk8KN/dDF6O5d+aJcxQWc3YsBL+eHspJ2KyPQlWjWQ3t9nW71M/24x
 unpQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1715497862; x=1716102662;
 h=content-transfer-encoding:cc:to:subject:message-id:date:from
 :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=+GwpQEiRKlwP3Q+LL14R0UvDMA+4Koc0GYs5vmwWEZY=;
 b=A6QuyWMuJIh+nP/FwmHSVKq9Ozfhv7OWEFYopjAD44sj2alMdQg0WwyHmkVFDmzyxi
 m/l4graWHeNtGgGq3TuknnMNBAvEbxkSGE4dwZ6umuhcfo7L/gX/kYYDLKry96VWINPU
 mjH18bjdLykfw0Yq+45poPaM4SUdgXCiefFzuEP/p2DeLBQGvgjlrYqO0OnNZvce1yvd
 2mlHe6IZhuNcBQB1aPFVuGoFlwcc0s8nX2HsMfk358s6pEU9kw8h/DKYz1zxwMUsqDkp
 o7T8M85lRgjaj+vcvKtLx79KxhUZmwsS8ebrjCyqq3HLQmRGww8/a95EeAivDbDzY89N
 I/0g==
X-Forwarded-Encrypted: i=1;
 AJvYcCXDqqgwlgGHFz9BhKONoy1Zf2/OWUijGpwfxAS8ZoRKagrKnZ3u9iUiODdIv9LAovjpPHgpwiYWWbi+V/o=
X-Gm-Message-State: AOJu0YzSIC8KpPOJWpqwJ9d/9hcGDlgkarQ0Fl26qjuEfWW/c3UOGH66
 Xhu7JV2snKLaCC/JekHm16QDCW6mDWTJXF4uMCr/UCoRd1597l/GPVYgAmZRgNFl+0ClqAECu6i
 4fU80ZtC1+Z1rwn0fx0Wa47J1IYw=
X-Google-Smtp-Source: AGHT+IFdqvLqBX4xi2I3dp9wlgcP/V1U06Un46K3fVxUY8dnFBJCd4GeIR1YL0c52hllr5UTKJc2T1iF3egd6csqhic=
X-Received: by 2002:a17:906:dace:b0:a5a:5bc8:9fcd with SMTP id
 a640c23a62f3a-a5a5bc8a0bbmr117124566b.43.1715497861461; Sun, 12 May 2024
 00:11:01 -0700 (PDT)
MIME-Version: 1.0
References: <20240502144149.66446-1-daniel.gregory@bytedance.com>
In-Reply-To: <20240502144149.66446-1-daniel.gregory@bytedance.com>
From: Stanisław Kardach <stanislaw.kardach@gmail.com>
Date: Sun, 12 May 2024 09:10:49 +0200
Message-ID: <CAJcPQBrX+-XceJ0RTNEKTzHQQ9NH==nXnE2_dJJhh6Q=R9wVcA@mail.gmail.com>
Subject: Re: [RFC PATCH] eal/riscv: add support for zawrs extension
To: Daniel Gregory <daniel.gregory@bytedance.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>, dev@dpdk.org, 
 Liang Ma <liangma@bytedance.com>, Punit Agrawal <punit.agrawal@bytedance.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Thu, May 2, 2024 at 4:44 PM Daniel Gregory
<daniel.gregory@bytedance.com> wrote:
>
> The zawrs extension adds a pair of instructions that stall a core until
> a memory location is written to. This patch uses one of them to
> implement RISCV-specific versions of the rte_wait_until_equal_*
> functions. This is potentially more energy efficient than the default
> implementation that uses rte_pause/Zihintpause.
>
> The technique works as follows:
>
> * Create a reservation set containing the address we want to wait on
>   using an atomic load (lr.w/lr.d)
> * Call wrs.nto - this blocks until the reservation set is invalidated by
>   someone else writing to that address
> * Execution can also resume arbitrarily, so we still need to check
>   whether a change occurred and loop if not
>
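For readers unfamiliar with Zawrs, here is a minimal sketch of what the
loop above boils down to for a 64-bit value (the function name is
illustrative, not one of the patch's macros):

    #include <stdint.h>

    static inline void
    wait_until_equal_64_sketch(volatile uint64_t *addr, uint64_t expected)
    {
            uint64_t value;

            /* lr.d loads the value and registers a reservation on addr */
            asm volatile("lr.d %0, (%1)"
                         : "=r" (value) : "r" (addr) : "memory");
            while (value != expected) {
                    /* stall until the reservation is invalidated by a
                     * write; may also resume arbitrarily, hence the loop */
                    asm volatile("wrs.nto" : : : "memory");
                    asm volatile("lr.d %0, (%1)"
                                 : "=r" (value) : "r" (addr) : "memory");
            }
    }
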
> Due to RISC-V atomics only supporting naturally aligned word (32 bit)
> and double word (64 bit) loads, I've used pointer rounding and bit
> shifting to implement waiting on 16-bit values.
>
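Concretely: for a uint16_t at address 0x1002, the pointer is rounded
down to 0x1000, lr.w reads the containing 32-bit word, and, since
RISC-V is little-endian, the halfword at offset 2 sits in the upper 16
bits of that word (hence the shift). A plain-C sketch of just the
extraction, with an ordinary load standing in for lr.w:

    #include <stdint.h>

    static inline uint16_t
    load_halfword_via_word(const volatile uint16_t *p)
    {
            uintptr_t a = (uintptr_t)p;  /* 2-aligned, maybe not 4-aligned */
            uint32_t word = *(const volatile uint32_t *)(a & ~(uintptr_t)3);
            return (a & 3) ? (uint16_t)(word >> 16)      /* upper halfword */
                           : (uint16_t)(word & 0xFFFF);  /* lower halfword */
    }
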
> This new functionality is controlled by a Meson flag that is disabled by
> default.
>
> Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
> Suggested-by: Punit Agrawal <punit.agrawal@bytedance.com>
> ---
>
> Posting as an RFC to get early feedback and enable testing by others
> with Zawrs-enabled hardware. Whilst I have been able to verify that it
> compiles and passes tests under QEMU, I am waiting on some Zawrs-enabled
> hardware to become available before I carry out performance tests.
>
> Nonetheless, I would be glad to hear any feedback on the general
> approach. Thanks, Daniel
>
>  config/riscv/meson.build          |   5 ++
>  lib/eal/riscv/include/rte_pause.h | 139 ++++++++++++++++++++++++++++++
>  2 files changed, 144 insertions(+)
>
> diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> index 07d7d9da23..4cfdc42ecb 100644
> --- a/config/riscv/meson.build
> +++ b/config/riscv/meson.build
> @@ -26,6 +26,11 @@ flags_common = [
>      # read from /proc/device-tree/cpus/timebase-frequency. This property is
>      # guaranteed on Linux, as riscv time_init() requires it.
>      ['RTE_RISCV_TIME_FREQ', 0],
> +
> +    # Enable use of RISC-V Wait-on-Reservation-Set extension (Zawrs)
> +    # Mitigates looping when polling on memory locations
> +    # Make sure to add '_zawrs' to your target's -march below
> +    ['RTE_RISCV_ZAWRS', false]
A bit orthogonal to this patch (or maybe not?):
Should we perhaps add a QEMU target in meson.build which would carry
the modified -march for what QEMU supports now?
Or perhaps add machine detection logic based either on the "riscv,isa"
cpu@0 property in the DT or on the RHCT ACPI table?
Or perhaps add some other way to specify the extension list suffix
for -march?
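
To make the detection idea concrete, a hypothetical probe along those
lines (DT case only; a real implementation would also need the RHCT
ACPI path and per-hart handling, and the file name is assumed from the
Linux /proc/device-tree layout):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static bool
    riscv_isa_has_zawrs(void)
    {
            char isa[1024];
            size_t n;
            FILE *f = fopen("/proc/device-tree/cpus/cpu@0/riscv,isa", "r");

            if (f == NULL)
                    return false;
            n = fread(isa, 1, sizeof(isa) - 1, f);
            fclose(f);
            isa[n] = '\0';
            /* ISA strings look like "rv64imafdc_sstc_zawrs..."; extensions
             * are underscore-separated after the single-letter base */
            return strstr(isa, "_zawrs") != NULL;
    }

Until then, builds opting in would pass something like
-march=rv64gc_zawrs, as the comment in the hunk above suggests.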
>  ]
>
>  ## SoC-specific options.
> diff --git a/lib/eal/riscv/include/rte_pause.h b/lib/eal/riscv/include/rte_pause.h
> index cb8e9ca52d..e7b43dffa3 100644
> --- a/lib/eal/riscv/include/rte_pause.h
> +++ b/lib/eal/riscv/include/rte_pause.h
> @@ -11,6 +11,12 @@
>  extern "C" {
>  #endif
>
> +#ifdef RTE_RISCV_ZAWRS
> +#define RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> +#endif
> +
> +#include <rte_debug.h>
> +
>  #include "rte_atomic.h"
>
>  #include "generic/rte_pause.h"
> @@ -24,6 +30,139 @@ static inline void rte_pause(void)
>         asm volatile(".int 0x0100000F" : : : "memory");
>  }
>
> +#ifdef RTE_RISCV_ZAWRS
> +
> +/*
> + * Atomic load from an address; it returns either a sign-extended word or
> + * doubleword and creates a 'reservation set' containing the read memory
> + * location. When someone else writes to the reservation set, it is invalidated,
> + * causing any stalled WRS instructions to resume.
> + *
> + * Address needs to be naturally aligned.
> + */
> +#define __RTE_RISCV_LR_32(src, dst, memorder) do {               \
> +       if ((memorder) == rte_memory_order_relaxed) {             \
> +               asm volatile("lr.w %0, (%1)"                      \
> +                               : "=r" (dst)                      \
> +                               : "r" (src)                       \
> +                               : "memory");                      \
> +       } else {                                                  \
> +               asm volatile("lr.w.aq %0, (%1)"                   \
> +                               : "=r" (dst)                      \
> +                               : "r" (src)                       \
> +                               : "memory");                      \
> +       } } while (0)
> +#define __RTE_RISCV_LR_64(src, dst, memorder) do {               \
> +       if ((memorder) == rte_memory_order_relaxed) {             \
> +               asm volatile("lr.d %0, (%1)"                      \
> +                               : "=r" (dst)                      \
> +                               : "r" (src)                       \
> +                               : "memory");                      \
> +       } else {                                                  \
> +               asm volatile("lr.d.aq %0, (%1)"                   \
> +                               : "=r" (dst)                      \
> +                               : "r" (src)                       \
> +                               : "memory");                      \
> +       } } while (0)
> +
> +/*
> + * There's no RISC-V atomic load primitive for halfwords, so cast up to a
> + * _naturally aligned_ word and extract the halfword we want
> + */
> +#define __RTE_RISCV_LR_16(src, dst, memorder) do {                     \
> +       uint32_t word;                                                  \
> +       __RTE_RISCV_LR_32(((uintptr_t)(src) & (~3)), word, (memorder)); \
> +       if ((size_t)(src) & 3)                                          \
> +               (dst) = (typeof(dst))(word >> 16);                      \
> +       else                                                            \
> +               (dst) = (typeof(dst))(word & 0xFFFF);                   \
> +} while (0)
> +
> +#define __RTE_RISCV_LR(src, dst, memorder, size) {                \
> +       RTE_BUILD_BUG_ON(size != 16 && size != 32 && size != 64);  \
> +       if (size == 16)                                            \
> +               __RTE_RISCV_LR_16(src, dst, memorder);             \
> +       else if (size == 32)                                       \
> +               __RTE_RISCV_LR_32(src, dst, memorder);             \
> +       else if (size == 64)                                       \
> +               __RTE_RISCV_LR_64(src, dst, memorder);             \
> +}
> +
> +/*
> + * Wait-on-Reservation-Set extension instruction; it stalls execution until the
> + * reservation set is invalidated or an interrupt is observed.
> + * A loop is likely still needed as it may stop stalling arbitrarily.
> + */
> +#define __RTE_RISCV_WRS_NTO() { asm volatile("wrs.nto" : : : "memory"); }
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
> +               int memorder)
> +{
> +       uint16_t value;
> +
> +       RTE_ASSERT(memorder == rte_memory_order_acquire ||
> +               memorder == rte_memory_order_relaxed);
> +       RTE_ASSERT(((size_t)addr & 1) == 0);
> +
> +       __RTE_RISCV_LR_16(addr, value, memorder);
> +       while (value != expected) {
> +               __RTE_RISCV_WRS_NTO();
> +               __RTE_RISCV_LR_16(addr, value, memorder);
> +       }
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
> +               int memorder)
> +{
> +       uint32_t value;
> +
> +       RTE_ASSERT(memorder == rte_memory_order_acquire ||
> +               memorder == rte_memory_order_relaxed);
> +       RTE_ASSERT(((size_t)addr & 3) == 0);
> +
> +       __RTE_RISCV_LR_32(addr, value, memorder);
> +       while (value != expected) {
> +               __RTE_RISCV_WRS_NTO();
> +               __RTE_RISCV_LR_32(addr, value, memorder);
> +       }
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> +               int memorder)
> +{
> +       uint64_t value;
> +
> +       RTE_ASSERT(memorder == rte_memory_order_acquire ||
> +               memorder == rte_memory_order_relaxed);
> +       RTE_ASSERT(((size_t)addr & 7) == 0);
> +
> +       __RTE_RISCV_LR_64(addr, value, memorder);
> +       while (value != expected) {
> +               __RTE_RISCV_WRS_NTO();
> +               __RTE_RISCV_LR_64(addr, value, memorder);
> +       }
> +}
> +
> +#define RTE_WAIT_UNTIL_MASKED(addr, mask, cond, expected, memorder) do { \
> +       RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));                \
> +       RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire &&          \
> +               memorder != rte_memory_order_relaxed);                    \
> +       RTE_ASSERT(((size_t)(addr) & (sizeof(*(addr)) - 1)) == 0);        \
> +       const uint32_t size = sizeof(*(addr)) << 3;                       \
> +       typeof(*(addr)) expected_value = (expected);                      \
> +       typeof(*(addr)) value;                                            \
> +       __RTE_RISCV_LR((addr), value, memorder, size);                    \
> +       while (!((value & (mask)) cond expected_value)) {                 \
> +               __RTE_RISCV_WRS_NTO();                                    \
> +               __RTE_RISCV_LR((addr), value, memorder, size);            \
> +       }                                                                 \
> +} while (0)
> +
> +#endif /* RTE_RISCV_ZAWRS */
> +
>  #ifdef __cplusplus
>  }
>  #endif
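
For reference, a hedged sketch of how this macro gets exercised from a
function body (the status flag here is made up; the mask/cond arguments
follow the generic API in generic/rte_pause.h):

    static volatile uint32_t status;

    static void
    wait_for_ready(void)
    {
            /* sleep via wrs.nto until bit 0 of status reads as 1 */
            RTE_WAIT_UNTIL_MASKED(&status, 0x1, ==, 0x1,
                    rte_memory_order_acquire);
    }
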
Reviewed-by: Stanislaw Kardach <stanislaw.kardach@gmail.com>
> --
> 2.39.2
>
