From: Daniel Gregory <daniel.gregory@bytedance.com>
To: Stanislaw Kardach <stanislaw.kardach@gmail.com>,
 Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org, Liang Ma <liangma@bytedance.com>,
 Daniel Gregory <daniel.gregory@bytedance.com>,
 Punit Agrawal <punit.agrawal@bytedance.com>
Subject: [RFC PATCH] eal/riscv: add support for zawrs extension
Date: Thu,  2 May 2024 15:41:50 +0100
Message-Id: <20240502144149.66446-1-daniel.gregory@bytedance.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The Zawrs extension adds a pair of instructions that stall a core until
a memory location is written to. This patch uses one of them (wrs.nto)
to implement RISC-V-specific versions of the rte_wait_until_equal_*
functions. This is potentially more energy efficient than the default
implementation, which spins on rte_pause (Zihintpause).

The technique works as follows:

* Create a reservation set containing the address we want to wait on
  using a load-reserved instruction (lr.w/lr.d)
* Call wrs.nto - this stalls the core until the reservation set is
  invalidated by someone else writing to that address
* Execution can also resume arbitrarily, so we still need to check
  whether a change occurred and loop if not
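
In straight-line C, the loop looks roughly as follows. This is an
illustrative sketch of the 32-bit case, not the patch code itself; the
function name is invented, and it assumes a toolchain whose assembler
accepts the Zawrs mnemonics:

#include <stdint.h>

static inline void
wait_until_equal_32_sketch(volatile uint32_t *addr, uint32_t expected)
{
	uint32_t value;

	do {
		/* lr.w creates a reservation set covering *addr */
		asm volatile("lr.w %0, (%1)"
				: "=r" (value)
				: "r" (addr)
				: "memory");
		if (value == expected)
			break;
		/* stall until the reservation set is invalidated or an
		 * interrupt arrives; wakeups may also be spurious */
		asm volatile("wrs.nto" : : : "memory");
	} while (1);
}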

RISC-V load-reserved instructions only support naturally aligned word
(32-bit) and doubleword (64-bit) accesses, so I've implemented waiting
on 16-bit values by rounding the pointer down to the containing word and
bit shifting the loaded value.
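
The extraction amounts to the plain-C sketch below (for illustration
only; the helper name is invented, and the patch performs the load with
lr.w rather than a plain dereference):

#include <stdint.h>

static inline uint16_t
load_halfword_via_word(volatile const uint16_t *addr)
{
	/* round down to the naturally aligned 32-bit word containing addr */
	volatile const uint32_t *word_addr =
		(volatile const uint32_t *)((uintptr_t)addr & ~(uintptr_t)3);
	uint32_t word = *word_addr;

	/* RISC-V is little-endian, so offset 2 selects the upper halfword */
	if ((uintptr_t)addr & 3)
		return (uint16_t)(word >> 16);
	return (uint16_t)(word & 0xFFFF);
}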

This new functionality is controlled by a Meson flag that is disabled by
default.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
Suggested-by: Punit Agrawal <punit.agrawal@bytedance.com>
---

Posting as an RFC to get early feedback and enable testing by others
with Zawrs-enabled hardware. Whilst I have been able to confirm that it
compiles and passes tests under QEMU, I am waiting on Zawrs-enabled
hardware to become available before carrying out performance tests.

Nonetheless, I would be glad to hear any feedback on the general
approach. Thanks, Daniel
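
To try the new code path, set the Meson flag to true in
config/riscv/meson.build and append '_zawrs' to the target's -march, for
example (illustrative; the exact -march string depends on your toolchain
and SoC configuration):

    ['RTE_RISCV_ZAWRS', true],

    # e.g. rv64gc -> rv64gc_zawrs in the target's machine args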

 config/riscv/meson.build          |   5 ++
 lib/eal/riscv/include/rte_pause.h | 139 ++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)

diff --git a/config/riscv/meson.build b/config/riscv/meson.build
index 07d7d9da23..4cfdc42ecb 100644
--- a/config/riscv/meson.build
+++ b/config/riscv/meson.build
@@ -26,6 +26,11 @@ flags_common = [
     # read from /proc/device-tree/cpus/timebase-frequency. This property is
     # guaranteed on Linux, as riscv time_init() requires it.
     ['RTE_RISCV_TIME_FREQ', 0],
+
+    # Enable use of RISC-V Wait-on-Reservation-Set extension (Zawrs)
+    # Mitigates looping when polling on memory locations
+    # Make sure to add '_zawrs' to your target's -march below
+    ['RTE_RISCV_ZAWRS', false]
 ]
 
 ## SoC-specific options.
diff --git a/lib/eal/riscv/include/rte_pause.h b/lib/eal/riscv/include/rte_pause.h
index cb8e9ca52d..e7b43dffa3 100644
--- a/lib/eal/riscv/include/rte_pause.h
+++ b/lib/eal/riscv/include/rte_pause.h
@@ -11,6 +11,12 @@
 extern "C" {
 #endif
 
+#ifdef RTE_RISCV_ZAWRS
+#define RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+#endif
+
+#include <rte_debug.h>
+
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
@@ -24,6 +30,139 @@ static inline void rte_pause(void)
 	asm volatile(".int 0x0100000F" : : : "memory");
 }
 
+#ifdef RTE_RISCV_ZAWRS
+
+/*
+ * Atomic load from an address; it returns either a sign-extended word or
+ * doubleword and creates a 'reservation set' containing the read memory
+ * location. When someone else writes to the reservation set, it is
+ * invalidated, causing any stalled WRS instructions to resume.
+ *
+ * The address needs to be naturally aligned.
+ */
+#define __RTE_RISCV_LR_32(src, dst, memorder) do {                \
+	if ((memorder) == rte_memory_order_relaxed) {             \
+		asm volatile("lr.w %0, (%1)"                      \
+				: "=r" (dst)                      \
+				: "r" (src)                       \
+				: "memory");                      \
+	} else {                                                  \
+		asm volatile("lr.w.aq %0, (%1)"                   \
+				: "=r" (dst)                      \
+				: "r" (src)                       \
+				: "memory");                      \
+	} } while (0)
+#define __RTE_RISCV_LR_64(src, dst, memorder) do {                \
+	if ((memorder) == rte_memory_order_relaxed) {             \
+		asm volatile("lr.d %0, (%1)"                      \
+				: "=r" (dst)                      \
+				: "r" (src)                       \
+				: "memory");                      \
+	} else {                                                  \
+		asm volatile("lr.d.aq %0, (%1)"                   \
+				: "=r" (dst)                      \
+				: "r" (src)                       \
+				: "memory");                      \
+	} } while (0)
+
+/*
+ * There is no RISC-V load-reserved primitive for halfwords, so load the
+ * containing _naturally aligned_ word and extract the halfword we want.
+ */
+#define __RTE_RISCV_LR_16(src, dst, memorder) do {                      \
+	uint32_t word;                                                  \
+	__RTE_RISCV_LR_32(((uintptr_t)(src) & (~3)), word, (memorder)); \
+	if ((size_t)(src) & 3)                                          \
+		(dst) = (typeof(dst))(word >> 16);                      \
+	else                                                            \
+		(dst) = (typeof(dst))(word & 0xFFFF);                   \
+} while (0)
+
+#define __RTE_RISCV_LR(src, dst, memorder, size) {                \
+	RTE_BUILD_BUG_ON(size != 16 && size != 32 && size != 64); \
+	if (size == 16)                                           \
+		__RTE_RISCV_LR_16(src, dst, memorder);            \
+	else if (size == 32)                                      \
+		__RTE_RISCV_LR_32(src, dst, memorder);            \
+	else if (size == 64)                                      \
+		__RTE_RISCV_LR_64(src, dst, memorder);            \
+}
+
+/*
+ * Wait-on-Reservation-Set extension instruction; it stalls execution until
+ * the reservation set is invalidated or an interrupt is observed.
+ * A loop is likely still needed, as it may stop stalling arbitrarily.
+ */
+#define __RTE_RISCV_WRS_NTO() { asm volatile("wrs.nto" : : : "memory"); }
+
+static __rte_always_inline void
+rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
+		int memorder)
+{
+	uint16_t value;
+
+	RTE_ASSERT(memorder == rte_memory_order_acquire ||
+		memorder == rte_memory_order_relaxed);
+	RTE_ASSERT(((size_t)addr & 1) == 0);
+
+	__RTE_RISCV_LR_16(addr, value, memorder);
+	while (value != expected) {
+		__RTE_RISCV_WRS_NTO();
+		__RTE_RISCV_LR_16(addr, value, memorder);
+	}
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
+		int memorder)
+{
+	uint32_t value;
+
+	RTE_ASSERT(memorder == rte_memory_order_acquire ||
+		memorder == rte_memory_order_relaxed);
+	RTE_ASSERT(((size_t)addr & 3) == 0);
+
+	__RTE_RISCV_LR_32(addr, value, memorder);
+	while (value != expected) {
+		__RTE_RISCV_WRS_NTO();
+		__RTE_RISCV_LR_32(addr, value, memorder);
+	}
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
+		int memorder)
+{
+	uint64_t value;
+
+	RTE_ASSERT(memorder == rte_memory_order_acquire ||
+		memorder == rte_memory_order_relaxed);
+	RTE_ASSERT(((size_t)addr & 7) == 0);
+
+	__RTE_RISCV_LR_64(addr, value, memorder);
+	while (value != expected) {
+		__RTE_RISCV_WRS_NTO();
+		__RTE_RISCV_LR_64(addr, value, memorder);
+	}
+}
+
+#define RTE_WAIT_UNTIL_MASKED(addr, mask, cond, expected, memorder) do { \
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));               \
+	RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire &&         \
+		memorder != rte_memory_order_relaxed);                   \
+	RTE_ASSERT(((size_t)(addr) & (sizeof(*(addr)) - 1)) == 0);       \
+	const uint32_t size = sizeof(*(addr)) << 3;                      \
+	typeof(*(addr)) expected_value = (expected);                     \
+	typeof(*(addr)) value;                                           \
+	__RTE_RISCV_LR((addr), value, memorder, size);                   \
+	while (!((value & (mask)) cond expected_value)) {                \
+		__RTE_RISCV_WRS_NTO();                                   \
+		__RTE_RISCV_LR((addr), value, memorder, size);           \
+	}                                                                \
+} while (0)
+
+#endif /* RTE_RISCV_ZAWRS */
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.39.2