DPDK patches and discussions
* [PATCH v7] eal: add seqlock
       [not found] <20220513103820.3e34fcb9@hermes.local>
@ 2022-05-15 12:24 ` Mattias Rönnblom
  2022-05-15 12:39   ` Mattias Rönnblom
  0 siblings, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-15 12:24 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, hofors, Chengwen Feng, Mattias Rönnblom,
	Ola Liljedahl

A sequence lock (seqlock) is a synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, suitable for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. A spinlock is used to
serialize writers. In cases where there is only a single writer, or
writer-writer synchronization is done by some external means, the
"raw" sequence counter type (and accompanying rte_seqcount_*()
functions) may be used instead.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks (and the underlying sequence counters) are not
preemption safe on the writer side. A thread preemption affects
performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

---

PATCH v7
  * Factor out the sequence number into a separate type rte_seqcount_t.

PATCH v6:
  * Check for failed memory allocations in unit test.
  * Fix underflow issue in test case for small RTE_LCORE_MAX values.
  * Fix test case memory leak.

PATCH v5:
  * Add sequence lock section to MAINTAINERS.
  * Add entry in the release notes.
  * Add seqlock reference in the API index.
  * Fix meson build file indentation.
  * Use "increment" to describe how a writer changes the sequence number.
  * Remove compiler barriers from seqlock test.
  * Use appropriate macros (e.g., TEST_SUCCESS) for test return values.

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be large enough to
    ensure that no reader can read a sn, access the data, and then
    read the same sn again after the counter has wrapped all the way
    around during the read, arriving back at the original value.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Replaced the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 MAINTAINERS                            |   5 +
 app/test/meson.build                   |   2 +
 app/test/test_seqlock.c                | 190 ++++++++++
 doc/api/doxy-api-index.md              |   1 +
 doc/guides/rel_notes/release_22_07.rst |  14 +
 lib/eal/common/meson.build             |   1 +
 lib/eal/common/rte_seqlock.c           |  18 +
 lib/eal/include/meson.build            |   1 +
 lib/eal/include/rte_seqlock.h          | 464 +++++++++++++++++++++++++
 lib/eal/version.map                    |   4 +
 10 files changed, 700 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c4f541dba..2804d8136c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,6 +262,11 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/eal/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+Sequence Lock
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_seqlock.h
+F: app/test/test_seqlock.c
+
 Pseudo-random Number Generation
 M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
 F: lib/eal/include/rte_random.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..cb1c1baa82
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return TEST_SUCCESS;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = TEST_SUCCESS;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 &&
+	       rc == TEST_SUCCESS) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+			c = data->c;
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = TEST_FAILED;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[RTE_MAX_LCORE];
+	unsigned int num_lcores;
+	unsigned int num_readers;
+	struct data *data;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[RTE_MAX_LCORE];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = TEST_SUCCESS;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return TEST_SKIPPED;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	if (data == NULL) {
+		printf("Failed to allocate memory for seqlock data\n");
+		return TEST_FAILED;
+	}
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = TEST_FAILED;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = TEST_FAILED;
+	}
+
+	rte_free(data);
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 4245b9635c..f23e33ae30 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -77,6 +77,7 @@ The public API headers are grouped by topics:
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
+  [seqlock]            (@ref rte_seqlock.h),
   [RCU]                (@ref rte_rcu_qsbr.h)
 
 - **CPU arch**:
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88d6e96cc1..d2f7bafe7b 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,20 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added Sequence Lock.**
+
+  Added a new synchronization primitive: the sequence lock
+  (seqlock). A seqlock allows for low overhead, parallel reads. The
+  DPDK seqlock uses a spinlock to serialize multiple writing threads.
+
+  In particular, seqlocks are useful for protecting data structures
+  which are read very frequently, by threads running on many different
+  cores, and modified relatively infrequently.
+
+  One way to think about seqlocks is that they provide means to
+  perform atomic operations on data objects larger than what the
+  native atomic machine instructions allow for.
+
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS queue rate limitation support.
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..3c896711e5 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+        'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..370ff9bbb2
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqcount_init(rte_seqcount_t *seqcount)
+{
+	seqcount->sn = 0;
+}
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_init(&seqlock->count);
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..f0522b39b8
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,464 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
 + *         } while (rte_seqlock_read_retry(&config->lock, sn));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
 + * In case there is only a single writer, or writer-writer
 + * serialization is provided by other means, the sequence lock
 + * (i.e., rte_seqlock_t) may be replaced with the "raw"
 + * rte_seqcount_t type.
+ *
+ * @see
 + * https://en.wikipedia.org/wiki/Seqlock
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqcount type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+} rte_seqcount_t;
+
+/**
 + * A static seqcount initializer.
+ */
+#define RTE_SEQCOUNT_INITIALIZER \
+	{							\
+		.sn = 0						\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the sequence counter.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ */
+__rte_experimental
+void
+rte_seqcount_init(rte_seqcount_t *seqcount);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqcount.
+ *
+ * rte_seqcount_read_begin() returns a sequence number, which is later
+ * used in rte_seqcount_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqcount_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqcount_read_begin() and before rte_seqcount_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqcount_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqcount_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqcount_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqcount_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @return
+ *   The seqcount sequence number for this critical section, to
+ *   later be passed to rte_seqcount_read_retry().
+ *
+ * @see rte_seqcount_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqcount_read_begin(const rte_seqcount_t *seqcount)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqcount_write_end().
+	 */
+	return __atomic_load_n(&seqcount->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqcount. The application must supply the sequence
+ * number produced by the corresponding rte_seqcount_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case rte_seqcount_read_retry() returns true, the just-read data
+ * was modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * In case this function returns false, the data is consistent and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqcount_read_begin() and rte_seqcount_read_retry() were atomic,
+ * as a whole.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @param begin_sn
+ *   The sequence number returned by rte_seqcount_read_begin().
+ * @return
 + *   true if the just-read seqcount-protected data was inconsistent
 + *   at the time it was read, and must be discarded; otherwise
 + *   false.
+ *
+ * @see rte_seqcount_read_begin()
+ */
+
+__rte_experimental
+static inline bool
+rte_seqcount_read_retry(const rte_seqcount_t *seqcount, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqcount_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
 +	/* make sure the data loads happen before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqcount->sn, __ATOMIC_RELAXED);
+
+	/* A writer incremented the sequence number during this read
+	 * critical section.
+	 */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function marks the beginning of a write-side
+ * critical section, after which the caller may go on to modify (both
+ * read and write) the protected data, in an atomic or non-atomic
+ * manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqcount_write_end().
+ *
+ * Multiple, parallel writers must use some external serialization.
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_end()
+ */
+
+__rte_experimental
+static inline void
+rte_seqcount_write_begin(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqcount. After this call has been made, the
+ * protected data may no longer be modified.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqcount_write_end(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqcount_read_begin() */
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELEASE);
+}
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	rte_seqcount_t count; /**< Sequence count for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.count = RTE_SEQCOUNT_INITIALIZER,		\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
 + * See rte_seqcount_read_begin() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ * @see rte_seqcount_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
 +	return rte_seqcount_read_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * See rte_seqcount_read_retry() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
 + *   true if the just-read seqlock-protected data was inconsistent
 + *   at the time it was read, and must be discarded; otherwise
 + *   false.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	return rte_seqcount_read_retry(&seqlock->count, begin_sn);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
 + * A call to this function acquires the write lock associated with @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
 +	/* to synchronize with other writers */
 +	rte_spinlock_lock(&seqlock->lock);
 +
 +	/* increment the sequence number to an odd value, marking an
 +	 * update in progress */
 +	rte_seqcount_write_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_write_end(&seqlock->count);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..19efc13d63 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,10 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqcount_init;
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7] eal: add seqlock
  2022-05-15 12:24 ` [PATCH v7] eal: add seqlock Mattias Rönnblom
@ 2022-05-15 12:39   ` Mattias Rönnblom
  2022-05-15 15:19     ` Morten Brørup
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-15 12:39 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, hofors, Chengwen Feng, Ola Liljedahl

On 2022-05-15 14:24, Mattias Rönnblom wrote:
> A sequence lock (seqlock) is a synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, suitable for
> data structures shared across many cores and which are updated
> relatively infrequently.
> 
> A seqlock permits multiple parallel readers. A spinlock is used to
> serialize writers. In cases where there is only a single writer, or
> writer-writer synchronization is done by some external means, the
> "raw" sequence counter type (and accompanying rte_seqcount_*()
> functions) may be used instead.
> 
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
> 
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger than what the native atomic
> machine instructions allow for.
> 
> DPDK seqlocks (and the underlying sequence counters) are not
> preemption safe on the writer side. A thread preemption affects
> performance, not correctness.
> 
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
> 
> A reader will
>    1. Load the sequence number (sn).
>    2. Load, in arbitrary order, the seqlock-protected data.
>    3. Load the sn again.
>    4. Check if the first and second sn are equal, and even numbered.
>       If they are not, discard the loaded data, and restart from 1.
> 
> The first three steps need to be ordered using suitable memory fences.
> 
> A writer will
>    1. Take the spinlock, to serialize writer access.
>    2. Load the sn.
>    3. Store the original sn + 1 as the new sn.
>    4. Perform load and stores to the seqlock-protected data.
>    5. Store the original sn + 2 as the new sn.
>    6. Release the spinlock.
> 
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
> 
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
> 
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
> 
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock
> 
> ---
> 

A note to previous reviewers: This split of seqlock into 
seqcount+seqlock assumes that the spinlock lock/unlock calls provide no 
MT-safety functionality beyond writer-writer serialization. I believe 
that to be the case, but I would feel more comfortable if someone else 
re-reviewed this code with this in mind.

Two questions remain:

1) Should the seqlock and the seqcount reside in different header files?
2) Is it good enough to provide only a spinlock-protected seqlock?

Question 1 I don't really have an opinion on. Both ways seem perfectly 
reasonable to me. I noted Morten wanted a split, and left to my own 
devices this is probably what I would do as well.

I think the answer to 2 is yes. We can provide other variants in the 
future, should the need arise.

<snip>



* RE: [PATCH v7] eal: add seqlock
  2022-05-15 12:39   ` Mattias Rönnblom
@ 2022-05-15 15:19     ` Morten Brørup
  2022-05-15 17:54       ` Mattias Rönnblom
  2022-05-15 15:23     ` Morten Brørup
  2022-05-20  6:02     ` Mattias Rönnblom
  2 siblings, 1 reply; 20+ messages in thread
From: Morten Brørup @ 2022-05-15 15:19 UTC (permalink / raw)
  To: Mattias Rönnblom, Thomas Monjalon, David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	stephen, hofors, Chengwen Feng, Ola Liljedahl

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, 15 May 2022 14.40
> 
> Two questions remain:
> 
> 1) Should the seqlock and the seqcount reside in different header
> files?
> 2) Is it good enough to provide only a spinlock-protected seqlock?
> 
> Question 1 I don't really have an opinion on. Both ways seem perfectly
> reasonable to me. I noted Morten wanted a split, and left to my own
> devices this is probably what I would do as well.

Argument for separate header files: If we add e.g. a rte_seqticketlock_t later, it should be able to include the rte_seqcount_t header file without also getting the contextually irrelevant rte_seqlock_t type and functions.

I don't feel strongly about this.

> 
> I think the answer to 2 is yes. We can provide other variants in the
> future, should the need arise.

Agree.

> 
> <snip>

Please move the header files from /lib/eal/include to /lib/eal/include/generic, where the other lock header files reside.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v7] eal: add seqlock
  2022-05-15 12:39   ` Mattias Rönnblom
  2022-05-15 15:19     ` Morten Brørup
@ 2022-05-15 15:23     ` Morten Brørup
  2022-05-20  6:02     ` Mattias Rönnblom
  2 siblings, 0 replies; 20+ messages in thread
From: Morten Brørup @ 2022-05-15 15:23 UTC (permalink / raw)
  To: Mattias Rönnblom, Thomas Monjalon, David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, Konstantin Ananyev,
	stephen, hofors, Chengwen Feng, Ola Liljedahl

(Resent with Konstantin's new email address.)

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, 15 May 2022 14.40
> 
> Two questions remain:
> 
> 1) Should the seqlock and the seqcount reside in different header
> files?
> 2) Is it good enough to provide only a spinlock-protected seqlock?
> 
> Question 1 I don't really have an opinion on. Both ways seem perfectly
> reasonable to me. I noted Morten wanted a split, and left to my own
> devices this is probably what I would do as well.

Argument for separate header files: If we add e.g. a rte_seqticketlock_t later, it should be able to include the rte_seqcount_t header file without also getting the contextually irrelevant rte_seqlock_t type and functions.

I don't feel strongly about this.

> 
> I think the answer to 2 is yes. We can provide other variants in the
> future, should the need arise.

Agree.

> 
> <snip>

Please move the header files from /lib/eal/include to /lib/eal/include/generic, where the other lock header files reside.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7] eal: add seqlock
  2022-05-15 15:19     ` Morten Brørup
@ 2022-05-15 17:54       ` Mattias Rönnblom
  2022-05-16  8:30         ` Morten Brørup
  0 siblings, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-15 17:54 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, Thomas Monjalon,
	David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	stephen, Chengwen Feng, Ola Liljedahl

On 2022-05-15 17:19, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Sunday, 15 May 2022 14.40
>>
>> Two questions remain:
>>
>> 1) Should the seqlock and the seqcount reside in different header
>> files?
>> 2) Is it good enough to provide only a spinlock-protected seqlock?
>>
>> Question 1 I don't really have an opinion on. Both ways seem perfectly
>> reasonable to me. I noted Morten wanted a split, and left to my own
>> devices this is probably what I would do as well.
> 
> Argument for separate header files: If we add e.g. a rte_seqticketlock_t later, it should be able to include the rte_seqcount_t header file without also getting the contextually irrelevant rte_seqlock_t type and functions.
> 
> I don't feel strongly about this.
> 
>>
>> I think the answer to 2 is yes. We can provide other variants in the
>> future, should the need arise.
> 
> Agree.
> 
>>
>> <snip>
> 
> Please move the header files from /lib/eal/include to /lib/eal/include/generic, where the other lock header files reside.
> 
> 

My guess would be that the lib/eal/include/generic directory is where 
the generic implementations of APIs for which there also exist 
machine-specific implementations go. The seqlock is not an example of such.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v7] eal: add seqlock
  2022-05-15 17:54       ` Mattias Rönnblom
@ 2022-05-16  8:30         ` Morten Brørup
  0 siblings, 0 replies; 20+ messages in thread
From: Morten Brørup @ 2022-05-16  8:30 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, Thomas Monjalon,
	David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, Konstantin Ananyev,
	stephen, Chengwen Feng, Ola Liljedahl

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Sunday, 15 May 2022 19.54
> 
> On 2022-05-15 17:19, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Sunday, 15 May 2022 14.40
> >>
> >> Two questions remain:
> >>
> >> 1) Should the seqlock and the seqcount reside in different header
> >> files?
> >> 2) Is it good enough to provide only a spinlock-protected
> seqlock?
> >>
> >> Question 1 I don't really have an opinion on. Both ways seem
> perfectly
> >> reasonable to me. I noted Morten wanted a split, and left to my own
> >> devices this is probably what I would do as well.
> >
> > Argument for separate header files: If we add e.g. a
> rte_seqticketlock_t later, it should be able to include the
> rte_seqcount_t header file without also getting the contextually
> irrelevant rte_seqlock_t type and functions.
> >
> > I don't feel strongly about this.
> >
> >>
> >> I think the answer to 2 is yes. We can provide other variants in the
> >> future, should the need arise.
> >
> > Agree.
> >
> >>
> >> <snip>
> >
> > Please move the header files from /lib/eal/include to
> /lib/eal/include/generic, where the other lock header files reside.
> >
> >
> 
> My guess would be that the lib/eal/include/generic directory is where
> the generic implementations of APIs for which there also exist
> machine-specific implementations go. The seqlock is not an example of
> such.

My guess would be the same. However, it is not documented what goes where in the DPDK tree structure, and the community seems to have a preference for following conventions rather than doing things right. So just turn off your brain and be a lemming for a second! ;-)

Or try your luck. You have my support for the patch regardless. :-)

<rant>
I think that most of the EAL has grown into a huge pile of monolithic spaghetti, with very little modularity. E.g. I suggested refactoring and fixing the mempool cache, but could not gather traction. Instead, an increasing amount of the mempool library's implementation details are being promoted from internal to public, and its internal code is being copy-pasted into other modules, so they can bypass both the mbuf and mempool libraries when allocating and freeing mbufs.
</rant>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v7] eal: add seqlock
  2022-05-15 12:39   ` Mattias Rönnblom
  2022-05-15 15:19     ` Morten Brørup
  2022-05-15 15:23     ` Morten Brørup
@ 2022-05-20  6:02     ` Mattias Rönnblom
  2022-05-23 11:31       ` [PATCH v8] " Mattias Rönnblom
  2 siblings, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-20  6:02 UTC (permalink / raw)
  To: Mattias Rönnblom, Thomas Monjalon, David Marchand
  Cc: dev, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, Chengwen Feng, Ola Liljedahl

On 2022-05-15 14:39, Mattias Rönnblom wrote:
> On 2022-05-15 14:24, Mattias Rönnblom wrote:
>> A sequence lock (seqlock) is a synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, suitable for
>> data structures shared across many cores and which are updated
>> relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. A spinlock is used to
>> serialize writers. In cases where there is only a single writer, or
>> writer-writer synchronization is done by some external means, the
>> "raw" sequence counter type (and accompanying rte_seqcount_*()
>> functions) may be used instead.
>>
>> To avoid resource reclamation and other issues, the data protected by
>> a seqlock is best off being self-contained (i.e., no pointers [except
>> to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform
>> atomic operations on data objects larger than what the native atomic
>> machine instructions allow for.
>>
>> DPDK seqlocks (and the underlying sequence counters) are not
>> preemption safe on the writer side. A thread preemption affects
>> performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>     1. Load the sequence number (sn).
>>     2. Load, in arbitrary order, the seqlock-protected data.
>>     3. Load the sn again.
>>     4. Check if the first and second sn are equal, and even numbered.
>>        If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>     1. Take the spinlock, to serialize writer access.
>>     2. Load the sn.
>>     3. Store the original sn + 1 as the new sn.
>>     4. Perform loads and stores to the seqlock-protected data.
>>     5. Store the original sn + 2 as the new sn.
>>     6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the
>> data stores, and the second sn store appear to the reader in the
>> mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores
>> need not be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
>>
>> ---
>>
> 
> A note to previous reviewers: This split of seqlock into
> seqcount+seqlock assumes that the spinlock lock/unlock calls provided no
> additional functionality in regards to MT safety, than writer-writer
> serialization. I believe that to be the case, but I would feel more
> comfortable if someone else re-reviewed this code with this in mind.
> 
> Two questions remain:
> 
> 1) Should the seqlock and the seqcount reside in different header files?

Since I received no comments on this, I'll go ahead and make a v8, where 
the seqcount and seqlock structs and functions are in different header 
files.

> 2) Is it good enough to provide only a spinlock-protected seqlock?
> 
> Question 1 I don't really have an opinion on. Both ways seem perfectly
> reasonable to me. I noted Morten wanted a split, and left to my own
> devices this is probably what I would do as well.
> 
> I think the answer to 2 is yes. We can provide other variants in the
> future, should the need arise.
> 
> <snip>
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v8] eal: add seqlock
  2022-05-20  6:02     ` Mattias Rönnblom
@ 2022-05-23 11:31       ` Mattias Rönnblom
  2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
  0 siblings, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-23 11:31 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, hofors, Chengwen Feng, Mattias Rönnblom,
	Ola Liljedahl

A sequence lock (seqlock) is a synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, suitable for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. A spinlock is used to
serialize writers. In cases where there is only a single writer, or
writer-writer synchronization is done by some external means, the
"raw" sequence counter type (and accompanying rte_seqcount_*()
functions) may be used instead.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks (and the underlying sequence counters) are not
preemption safe on the writer side. A thread preemption affects
performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.
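The read-side steps above can be sketched in portable C11 atomics. This is only an illustration of the protocol described in this commit message, not the DPDK implementation; the struct, field, and function names below are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical seqcount-protected object, for illustration only. */
struct seq_data {
	atomic_uint sn;	/* sequence number; even = no write in progress */
	uint64_t a;	/* protected data; plain (non-atomic) fields */
	uint64_t b;
};

/* One read attempt. Returns true if a consistent snapshot was copied
 * into out[]; the caller retries on false.
 */
static bool
read_once(struct seq_data *d, uint64_t out[2])
{
	/* Step 1: load sn. The acquire keeps the data loads below from
	 * being reordered before this load.
	 */
	unsigned int begin_sn =
		atomic_load_explicit(&d->sn, memory_order_acquire);

	if (begin_sn & 1) /* odd sn: a write is already in progress */
		return false;

	/* Step 2: load the protected data, in arbitrary order. */
	out[0] = d->a;
	out[1] = d->b;

	/* Order the data loads before the second sn load. */
	atomic_thread_fence(memory_order_acquire);

	/* Steps 3+4: reload sn. Unchanged (and even) means the data was
	 * not modified while it was being read.
	 */
	return atomic_load_explicit(&d->sn, memory_order_relaxed) == begin_sn;
}
```

A real reader would loop until read_once() succeeds, and only act on the copied-out data after the check has passed.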

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform loads and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.
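The write-side steps can be sketched the same way in C11 atomics. Again this is illustrative only, with invented names; the spinlock of step 1 (or any other writer-writer serialization) is assumed to be handled by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical seqcount-protected object, for illustration only. */
struct seq_data {
	atomic_uint sn;	/* sequence number; odd while an update is in flight */
	uint64_t a;	/* protected data; plain (non-atomic) fields */
	uint64_t b;
};

/* Steps 2-5 of the writer protocol; steps 1 and 6 (spinlock
 * lock/unlock) are left to the caller.
 */
static void
write_update(struct seq_data *d, uint64_t v)
{
	/* Step 2: load the current sn (only a serialized writer
	 * updates it).
	 */
	unsigned int sn = atomic_load_explicit(&d->sn, memory_order_relaxed);

	/* Step 3: odd sn tells readers an update is in progress. */
	atomic_store_explicit(&d->sn, sn + 1, memory_order_relaxed);

	/* Keep the data stores below from appearing to readers before
	 * the first sn store.
	 */
	atomic_thread_fence(memory_order_release);

	/* Step 4: modify the protected data. */
	d->a = v;
	d->b = v;

	/* Step 5: even sn publishes the update. The release store pairs
	 * with the reader's acquire load of sn.
	 */
	atomic_store_explicit(&d->sn, sn + 2, memory_order_release);
}
```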

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

---

PATCH v8:
  * Move the sequence counter into a separate header file.
  * Move the initialization code into the header files and eliminate
    the tiny rte_seqlock.c.

PATCH v7:
  * Factor out the sequence number into a separate type rte_seqcount_t.

PATCH v6:
  * Check for failed memory allocations in unit test.
  * Fix underflow issue in test case for small RTE_LCORE_MAX values.
  * Fix test case memory leak.

PATCH v5:
  * Add sequence lock section to MAINTAINERS.
  * Add entry in the release notes.
  * Add seqlock reference in the API index.
  * Fix meson build file indentation.
  * Use "increment" to describe how a writer changes the sequence number.
  * Remove compiler barriers from seqlock test.
  * Use appropriate macros (e.g., TEST_SUCCESS) for test return values.

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to ensure that no reader can read an sn, access the data, and then
    read the same sn again because the counter was incremented enough
    times during the read to wrap around to its original value.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Replaced the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef'd to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 MAINTAINERS                            |   6 +
 app/test/meson.build                   |   2 +
 app/test/test_seqlock.c                | 190 +++++++++++++++++++
 doc/api/doxy-api-index.md              |   2 +
 doc/guides/rel_notes/release_22_07.rst |  18 ++
 lib/eal/include/meson.build            |   2 +
 lib/eal/include/rte_seqcount.h         | 251 ++++++++++++++++++++++++
 lib/eal/include/rte_seqlock.h          | 252 +++++++++++++++++++++++++
 8 files changed, 723 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/include/rte_seqcount.h
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 17a0559ee7..458ea7e47c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -263,6 +263,12 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/eal/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+Sequence Lock
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_seqcount.h
+F: lib/eal/include/rte_seqlock.h
+F: app/test/test_seqlock.c
+
 Pseudo-random Number Generation
 M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
 F: lib/eal/include/rte_random.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 15591ce5cf..48344e2071 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -216,6 +217,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true, true],
         ['sched_autotest', true, true],
         ['security_autotest', false, true],
+        ['seqlock_autotest', true, true],
         ['spinlock_autotest', true, true],
         ['stack_autotest', false, true],
         ['stack_lf_autotest', false, true],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..cb1c1baa82
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return TEST_SUCCESS;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = TEST_SUCCESS;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 &&
+	       rc == TEST_SUCCESS) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+			c = data->c;
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = TEST_FAILED;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[RTE_MAX_LCORE];
+	unsigned int num_lcores;
+	unsigned int num_readers;
+	struct data *data;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[RTE_MAX_LCORE];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = TEST_SUCCESS;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return TEST_SKIPPED;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	if (data == NULL) {
+		printf("Failed to allocate memory for seqlock data\n");
+		return TEST_FAILED;
+	}
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = TEST_FAILED;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = TEST_FAILED;
+	}
+
+	rte_free(data);
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 2b78d796ea..6dd219ef0d 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -78,6 +78,8 @@ The public API headers are grouped by topics:
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
+  [seqlock]            (@ref rte_seqlock.h),
+  [seqcount]           (@ref rte_seqcount.h),
   [RCU]                (@ref rte_rcu_qsbr.h)
 
 - **CPU arch**:
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index e49cacecef..9c48769171 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -60,6 +60,24 @@ New Features
   Added an API which can get the number of in-flight packets in
   vhost async data path without using lock.
 
+* **Added Sequence Lock.**
+
+  Added a new synchronization primitive: the sequence lock
+  (seqlock). A seqlock allows for low overhead, parallel reads. The
+  DPDK seqlock uses a spinlock to serialize multiple writing threads.
+
+  In particular, seqlocks are useful for protecting data structures
+  which are read very frequently, by threads running on many different
+  cores, and modified relatively infrequently.
+
+  One way to think about seqlocks is that they provide means to
+  perform atomic operations on data objects larger than what the
+  native atomic machine instructions allow for.
+
+  In cases where there is only a single writer, or writer-writer
+  synchronization is performed by some means external to the seqlock,
+  direct use of the underlying sequence counter may be more suitable.
+
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS queue rate limitation support.
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..40ebb5b63d 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,8 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqcount.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqcount.h b/lib/eal/include/rte_seqcount.h
new file mode 100644
index 0000000000..7fdc8d6da9
--- /dev/null
+++ b/lib/eal/include/rte_seqcount.h
@@ -0,0 +1,251 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQCOUNT_H_
+#define _RTE_SEQCOUNT_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqcount
+ *
+ * The sequence counter synchronizes a single writer with multiple,
+ * parallel readers. It is used as the basis for the RTE sequence
+ * lock.
+ *
+ * @see rte_seqlock.h
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+
+/**
+ * The RTE seqcount type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+} rte_seqcount_t;
+
+/**
+ * A static seqcount initializer.
+ */
+#define RTE_SEQCOUNT_INITIALIZER \
+	{							\
+		.sn = 0						\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the sequence counter.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ */
+__rte_experimental
+static inline void
+rte_seqcount_init(rte_seqcount_t *seqcount)
+{
+	seqcount->sn = 0;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqcount.
+ *
+ * rte_seqcount_read_begin() returns a sequence number, which is later
+ * used in rte_seqcount_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqcount_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqcount_read_begin() and before rte_seqcount_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqcount_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqcount_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqcount_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqcount_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @return
+ *   The seqcount sequence number for this critical section, to
+ *   later be passed to rte_seqcount_read_retry().
+ *
+ * @see rte_seqcount_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqcount_read_begin(const rte_seqcount_t *seqcount)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqcount_write_end().
+	 */
+	return __atomic_load_n(&seqcount->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqcount. The application must supply the sequence
+ * number produced by the corresponding rte_seqcount_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case rte_seqcount_read_retry() returns true, the just-read data
+ * was modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * In case this function returns false, the data is consistent and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqcount_read_begin() and rte_seqcount_read_retry() were atomic,
+ * as a whole.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @param begin_sn
+ *   The sequence number returned by rte_seqcount_read_begin().
+ * @return
+ *   true or false, if the just-read seqcount-protected data was
+ *   inconsistent or consistent, respectively, at the time it was
+ *   read.
+ *
+ * @see rte_seqcount_read_begin()
+ */
+
+__rte_experimental
+static inline bool
+rte_seqcount_read_retry(const rte_seqcount_t *seqcount, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqcount_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
+	/* Make sure the data loads happen before the sn load. */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqcount->sn, __ATOMIC_RELAXED);
+
+	/* A writer incremented the sequence number during this read
+	 * critical section.
+	 */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function marks the beginning of a write-side
+ * critical section, after which the caller may go on to modify (both
+ * read and write) the protected data, in an atomic or non-atomic
+ * manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqcount_write_end().
+ *
+ * Multiple, parallel writers must use some external serialization.
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_end()
+ */
+
+__rte_experimental
+static inline void
+rte_seqcount_write_begin(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqcount. After this call has been made, the
+ * protected data may no longer be modified.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqcount_write_end(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqcount_read_begin() */
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELEASE);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_SEQCOUNT_H_ */
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..8f9d2fe51b
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,252 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which is read by many cores at a
+ * high frequency, and relatively infrequently written.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * In case there is only a single writer, or writer-writer
+ * serialization is provided by other means, the sequence lock
+ * (i.e., rte_seqlock_t) may be replaced with the "raw"
+ * rte_seqcount_t type.
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_seqcount.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	rte_seqcount_t count; /**< Sequence count for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.count = RTE_SEQCOUNT_INITIALIZER,		\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+static inline void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_init(&seqlock->count);
+	rte_spinlock_init(&seqlock->lock);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * See rte_seqcount_read_retry() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ * @see rte_seqcount_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	return rte_seqcount_read_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * See rte_seqcount_read_retry() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
+ *   true if the just-read seqlock-protected data was inconsistent
+ *   at the time it was read, and thus should be discarded;
+ *   otherwise false.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	return rte_seqcount_read_retry(&seqlock->count, begin_sn);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated with
+ * @p seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	rte_seqcount_write_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_write_end(&seqlock->count);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
-- 
2.25.1



* [PATCH v9] eal: add seqlock
  2022-05-23 11:31       ` [PATCH v8] " Mattias Rönnblom
@ 2022-05-23 14:23         ` Mattias Rönnblom
  2022-05-31 11:52           ` David Marchand
                             ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Mattias Rönnblom @ 2022-05-23 14:23 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, hofors, Chengwen Feng, Mattias Rönnblom,
	Ola Liljedahl

A sequence lock (seqlock) is a synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, suitable for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. A spinlock is used to
serialize writers. In cases where there is only a single writer, or
writer-writer synchronization is done by some external means, the
"raw" sequence counter type (and accompanying rte_seqcount_*()
functions) may be used instead.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks (and the underlying sequence counters) are not
preemption safe on the writer side. A thread preemption affects
performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.
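The protocol above can be sketched in plain C11 atomics. This is an illustrative, self-contained sketch only; the struct and function names are hypothetical and this is not the DPDK API. It mirrors the fences used by the actual implementation in rte_seqcount.h below, with writer-writer spinlock serialization left out:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch of the seqlock protocol; not the DPDK API. */
struct sketch_seqlock {
	atomic_uint sn;  /* even: no write in progress; odd: write ongoing */
	uint64_t data;   /* the protected data (writer-serialized externally) */
};

/* Reader: steps 1-4; loop until a consistent snapshot is observed. */
static uint64_t
sketch_read(struct sketch_seqlock *sl)
{
	unsigned int begin_sn, end_sn;
	uint64_t copy;

	do {
		/* 1. Load sn; the acquire orders the data loads after it. */
		begin_sn = atomic_load_explicit(&sl->sn, memory_order_acquire);

		/* 2. Load the protected data (need not be atomic). */
		copy = sl->data;

		/* 3. Order the data loads before the second sn load. */
		atomic_thread_fence(memory_order_acquire);
		end_sn = atomic_load_explicit(&sl->sn, memory_order_relaxed);

		/* 4. Retry if a write was in progress or sn changed. */
	} while ((begin_sn & 1) || begin_sn != end_sn);

	return copy;
}

/* Writer: steps 2-5 (spinlock acquisition/release left out). */
static void
sketch_write(struct sketch_seqlock *sl, uint64_t value)
{
	unsigned int sn = atomic_load_explicit(&sl->sn, memory_order_relaxed);

	/* Store sn + 1 (odd), then fence so the data stores cannot be
	 * reordered before the sn store.
	 */
	atomic_store_explicit(&sl->sn, sn + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	sl->data = value;  /* need not be atomic */

	/* Store sn + 2 (even again); the release store synchronizes-with
	 * the reader's acquire load.
	 */
	atomic_store_explicit(&sl->sn, sn + 2, memory_order_release);
}
```

In a single-threaded setting, a write followed by a read returns the written value and leaves the sequence number even; under concurrency, torn data reads are detected and retried via the sequence number checks.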

The original seqlock design and implementation were done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

---

PATCH v9:
  * Include <rte_compat.h> for __rte_experimental. The failure to do
    so caused build failures on 32-bit ARM.

PATCH v8:
  * Move the sequence counter into a separate header file.
  * Move the initialization code into the header files and eliminate
    the tiny rte_seqlock.c.

PATCH v7:
  * Factor out the sequence number into a separate type rte_seqcount_t.

PATCH v6:
  * Check for failed memory allocations in unit test.
  * Fix underflow issue in test case for small RTE_LCORE_MAX values.
  * Fix test case memory leak.

PATCH v5:
  * Add sequence lock section to MAINTAINERS.
  * Add entry in the release notes.
  * Add seqlock reference in the API index.
  * Fix meson build file indentation.
  * Use "increment" to describe how a writer changes the sequence number.
  * Remove compiler barriers from seqlock test.
  * Use appropriate macros (e.g., TEST_SUCCESS) for test return values.

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to ensure that no reader can read a sn, access the data, and then
    read back the same sn only because the sn was incremented enough
    times during the read to wrap around to its original value.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 MAINTAINERS                            |   6 +
 app/test/meson.build                   |   2 +
 app/test/test_seqlock.c                | 190 +++++++++++++++++++
 doc/api/doxy-api-index.md              |   2 +
 doc/guides/rel_notes/release_22_07.rst |  18 ++
 lib/eal/include/meson.build            |   2 +
 lib/eal/include/rte_seqcount.h         | 252 ++++++++++++++++++++++++
 lib/eal/include/rte_seqlock.h          | 253 +++++++++++++++++++++++++
 8 files changed, 725 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/include/rte_seqcount.h
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 17a0559ee7..458ea7e47c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -263,6 +263,12 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/eal/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+Sequence Lock
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_seqcount.h
+F: lib/eal/include/rte_seqlock.h
+F: app/test/test_seqlock.c
+
 Pseudo-random Number Generation
 M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
 F: lib/eal/include/rte_random.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 15591ce5cf..48344e2071 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -216,6 +217,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true, true],
         ['sched_autotest', true, true],
         ['security_autotest', false, true],
+        ['seqlock_autotest', true, true],
         ['spinlock_autotest', true, true],
         ['stack_autotest', false, true],
         ['stack_lf_autotest', false, true],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..cb1c1baa82
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return TEST_SUCCESS;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = TEST_SUCCESS;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 &&
+	       rc == TEST_SUCCESS) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+			c = data->c;
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = TEST_FAILED;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[RTE_MAX_LCORE];
+	unsigned int num_lcores;
+	unsigned int num_readers;
+	struct data *data;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[RTE_MAX_LCORE];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = TEST_SUCCESS;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return TEST_SKIPPED;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	if (data == NULL) {
+		printf("Failed to allocate memory for seqlock data\n");
+		return TEST_FAILED;
+	}
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = TEST_FAILED;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = TEST_FAILED;
+	}
+
+	rte_free(data);
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 2b78d796ea..6dd219ef0d 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -78,6 +78,8 @@ The public API headers are grouped by topics:
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
+  [seqlock]            (@ref rte_seqlock.h),
+  [seqcount]           (@ref rte_seqcount.h),
   [RCU]                (@ref rte_rcu_qsbr.h)
 
 - **CPU arch**:
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index e49cacecef..9c48769171 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -60,6 +60,24 @@ New Features
   Added an API which can get the number of in-flight packets in
   vhost async data path without using lock.
 
+* **Added Sequence Lock.**
+
+  Added a new synchronization primitive: the sequence lock
+  (seqlock). A seqlock allows for low overhead, parallel reads. The
+  DPDK seqlock uses a spinlock to serialize multiple writing threads.
+
+  In particular, seqlocks are useful for protecting data structures
+  which are read very frequently, by threads running on many different
+  cores, and modified relatively infrequently.
+
+  One way to think about seqlocks is that they provide means to
+  perform atomic operations on data objects larger than what the
+  native atomic machine instructions allow for.
+
+  In cases where there is only a single writer, or writer-writer
+  synchronization is performed by some means external to the seqlock,
+  direct use of the underlying sequence counter may be more suitable.
+
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS queue rate limitation support.
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..40ebb5b63d 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,8 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqcount.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqcount.h b/lib/eal/include/rte_seqcount.h
new file mode 100644
index 0000000000..67c7ee03a4
--- /dev/null
+++ b/lib/eal/include/rte_seqcount.h
@@ -0,0 +1,252 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQCOUNT_H_
+#define _RTE_SEQCOUNT_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqcount
+ *
+ * The sequence counter synchronizes a single writer with multiple,
+ * parallel readers. It is used as the basis for the RTE sequence
+ * lock.
+ *
+ * @see rte_seqlock.h
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_compat.h>
+
+/**
+ * The RTE seqcount type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+} rte_seqcount_t;
+
+/**
+ * A static seqcount initializer.
+ */
+#define RTE_SEQCOUNT_INITIALIZER \
+	{							\
+		.sn = 0						\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the sequence counter.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ */
+__rte_experimental
+static inline void
+rte_seqcount_init(rte_seqcount_t *seqcount)
+{
+	seqcount->sn = 0;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqcount.
+ *
+ * rte_seqcount_read_begin() returns a sequence number, which is later
+ * used in rte_seqcount_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqcount_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqcount_read_begin() and before rte_seqcount_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqcount_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqcount_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqcount_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqcount_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @return
+ *   The seqcount sequence number for this critical section, to
+ *   later be passed to rte_seqcount_read_retry().
+ *
+ * @see rte_seqcount_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqcount_read_begin(const rte_seqcount_t *seqcount)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqcount_write_end().
+	 */
+	return __atomic_load_n(&seqcount->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqcount. The application must supply the sequence
+ * number produced by the corresponding rte_seqcount_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * If this function returns true, the just-read data was modified
+ * while it was being read, may be inconsistent, and thus should be
+ * discarded.
+ *
+ * If this function returns false, the data is consistent, and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqcount_read_begin() and rte_seqcount_read_retry() was atomic
+ * as a whole.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ * @param begin_sn
+ *   The sequence number returned by rte_seqcount_read_begin().
+ * @return
+ *   true if the just-read seqcount-protected data was inconsistent
+ *   at the time it was read, and thus should be discarded;
+ *   otherwise false.
+ *
+ * @see rte_seqcount_read_begin()
+ */
+
+__rte_experimental
+static inline bool
+rte_seqcount_read_retry(const rte_seqcount_t *seqcount, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqcount_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
+	/* make sure the data loads happen before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqcount->sn, __ATOMIC_RELAXED);
+
+	/* A writer incremented the sequence number during this read
+	 * critical section.
+	 */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function marks the beginning of a write-side
+ * critical section, after which the caller may go on to modify (both
+ * read and write) the protected data, in an atomic or non-atomic
+ * manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqcount_write_end().
+ *
+ * Multiple, parallel writers must use some external serialization.
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_end()
+ */
+
+__rte_experimental
+static inline void
+rte_seqcount_write_begin(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqcount. After this call has been made, the
+ * protected data may no longer be modified.
+ *
+ * @param seqcount
+ *   A pointer to the sequence counter.
+ *
+ * @see rte_seqcount_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqcount_write_end(rte_seqcount_t *seqcount)
+{
+	uint32_t sn;
+
+	sn = seqcount->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqcount_read_begin() */
+	__atomic_store_n(&seqcount->sn, sn, __ATOMIC_RELEASE);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_SEQCOUNT_H_ */
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..5eb9023e31
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which is read by many cores at a
+ * high frequency, and relatively infrequently written.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * In case there is only a single writer, or writer-writer
+ * serialization is provided by other means, the sequence lock (i.e.,
+ * rte_seqlock_t) may be replaced with the "raw" rte_seqcount_t type.
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_compat.h>
+#include <rte_seqcount.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	rte_seqcount_t count; /**< Sequence count for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers. */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.count = RTE_SEQCOUNT_INITIALIZER,		\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+static inline void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_init(&seqlock->count);
+	rte_spinlock_init(&seqlock->lock);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * See rte_seqcount_read_retry() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ * @see rte_seqcount_read_retry()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	return rte_seqcount_read_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * See rte_seqcount_read_retry() for details.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
+ *   true if the protected data was modified while it was being read
+ *   (i.e., the just-read data is inconsistent and must be discarded);
+ *   false if the data is consistent.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	return rte_seqcount_read_retry(&seqlock->count, begin_sn);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated with @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to access
+ * (i.e., both read and modify) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	rte_seqcount_write_begin(&seqlock->count);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	rte_seqcount_write_end(&seqlock->count);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
@ 2022-05-31 11:52           ` David Marchand
  2022-06-01  9:01             ` Mattias Rönnblom
  2022-05-31 22:45           ` Stephen Hemminger
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: David Marchand @ 2022-05-31 11:52 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, dev, onar.olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Morten Brørup, Stephen Hemminger,
	Mattias Rönnblom, Chengwen Feng, Ola Liljedahl

On Mon, May 23, 2022 at 4:24 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> A sequence lock (seqlock) is a synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, suitable for
> data structures shared across many cores and which are updated
> relatively infrequently.
>
> A seqlock permits multiple parallel readers. A spinlock is used to
> serialize writers. In cases where there is only a single writer, or
> writer-writer synchronization is done by some external means, the
> "raw" sequence counter type (and accompanying rte_seqcount_*()
> functions) may be used instead.
>
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
>
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger than what the native atomic
> machine instructions allow for.
>
> DPDK seqlocks (and the underlying sequence counters) are not
> preemption safe on the writer side. A thread preemption affects
> performance, not correctness.
>
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
>
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
>
> The first three steps need to be ordered using suitable memory fences.
>
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
>
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
>
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
>
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
>
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock

The SoB and other tags were sent with annotations; repeating them here
(in the hope patchwork will catch them):

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

GHA had a hiccup on Ubuntu vm installation at the time of the v9 reception.
I had run the checks on my side:
https://github.com/david-marchand/dpdk/actions/runs/2373533127

I noticed a little issue in the example in the doxygen comments that I
can fix myself:
--- a/lib/eal/include/rte_seqlock.h
+++ b/lib/eal/include/rte_seqlock.h
@@ -63,7 +63,7 @@ extern "C" {
  *                 // An alternative to an immediate retry is to abort and
  *                 // try again at some later time, assuming progress is
  *                 // possible without the data.
- *         } while (rte_seqlock_read_retry(&config->lock));
+ *         } while (rte_seqlock_read_retry(&config->lock, sn));
  * }
  *
  * // Accessor function for writing config fields.

I also have some whitespace fixes here and there.

I may shorten the release notes update to keep only the first
paragraph, as the rest is too verbose and repeated in the
documentation.

Overall, comments were addressed and the patch looks ready.
Last call before merging?


-- 
David Marchand


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
  2022-05-31 11:52           ` David Marchand
@ 2022-05-31 22:45           ` Stephen Hemminger
  2022-06-01  6:07             ` Morten Brørup
  2022-06-01  8:19             ` Mattias Rönnblom
  2022-05-31 22:49           ` Stephen Hemminger
  2022-06-07  9:25           ` David Marchand
  3 siblings, 2 replies; 20+ messages in thread
From: Stephen Hemminger @ 2022-05-31 22:45 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On Mon, 23 May 2022 16:23:46 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +/**
> + * The RTE seqcount type.
> + */
> +typedef struct {
> +	uint32_t sn; /**< A sequence number for the protected data. */
> +} rte_seqcount_t;

Don't need structure for only one element.

typedef uint32_t rte_seqcount_t;

+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;

Prefer to avoid conditional if possible (compiler will optimize it as):

        return begin_sn == end_sn;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
  2022-05-31 11:52           ` David Marchand
  2022-05-31 22:45           ` Stephen Hemminger
@ 2022-05-31 22:49           ` Stephen Hemminger
  2022-05-31 22:57             ` Honnappa Nagarahalli
  2022-06-07  9:25           ` David Marchand
  3 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2022-05-31 22:49 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On Mon, 23 May 2022 16:23:46 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

Why mix __atomic builtin with rte_atomic?
Instead:
        __atomic_thread_fence(__ATOMIC_ACQUIRE);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v9] eal: add seqlock
  2022-05-31 22:49           ` Stephen Hemminger
@ 2022-05-31 22:57             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 20+ messages in thread
From: Honnappa Nagarahalli @ 2022-05-31 22:57 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: thomas, David Marchand, dev, onar.olsen, nd, konstantin.ananyev,
	mb, hofors, Chengwen Feng, Ola Liljedahl, nd

<snip>

> 
> On Mon, 23 May 2022 16:23:46 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
> > +
> > +	/* make sure the data loads happens before the sn load */
> > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> 
> Why mix __atomic builtin with rte_atomic?
> Instead:
>         __atomic_thread_fence(__ATOMIC_ACQUIRE);
There was an issue with __atomic_thread_fence(__ATOMIC_SEQ_CST) performance on x86. Hence, it was decided to add the 'rte_atomic_thread_fence' wrapper and use it all the time [1].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v9] eal: add seqlock
  2022-05-31 22:45           ` Stephen Hemminger
@ 2022-06-01  6:07             ` Morten Brørup
  2022-06-01  8:19             ` Mattias Rönnblom
  1 sibling, 0 replies; 20+ messages in thread
From: Morten Brørup @ 2022-06-01  6:07 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, hofors,
	Chengwen Feng, Ola Liljedahl

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 1 June 2022 00.46
> 
> On Mon, 23 May 2022 16:23:46 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
> > +/**
> > + * The RTE seqcount type.
> > + */
> > +typedef struct {
> > +	uint32_t sn; /**< A sequence number for the protected data. */
> > +} rte_seqcount_t;
> 
> Don't need structure for only one element.
> 
> typedef uint32_t rte_seqcount_t;

Agree. The rte_seqcount_t type is not going to evolve in any way, so it should be safe not wrapping it into a structure.

> +	if (unlikely(begin_sn != end_sn))
> +		return true;
> +
> +	return false;
> 
> Prefer to avoid conditional if possible (compiler will optimize it as):
> 
>         return begin_sn == end_sn;

Typo: return begin_sn != end_sn;

Please keep the unlikely() hint, which could also be added to Stephen's variant.

Considering the comments in the source code related to this comparison, Mattias' version seems more readable. And I suppose the compiler is able to generate optimized code for both.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-31 22:45           ` Stephen Hemminger
  2022-06-01  6:07             ` Morten Brørup
@ 2022-06-01  8:19             ` Mattias Rönnblom
  2022-06-01 16:15               ` Stephen Hemminger
  1 sibling, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-06-01  8:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Monjalon, David Marchand, dev, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On 2022-06-01 00:45, Stephen Hemminger wrote:
> On Mon, 23 May 2022 16:23:46 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +/**
>> + * The RTE seqcount type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +} rte_seqcount_t;
> 
> Don't need structure for only one element.
> 

The struct adds a degree of type safety, with no run-time cost.

> typedef uint32_t rte_seqcount_t;
> 
> +	if (unlikely(begin_sn != end_sn))
> +		return true;
> +
> +	return false;
> 
> Prefer to avoid conditional if possible (compiler will optimize it as):
> 
>          return begin_sn == end_sn;

Is this a readability argument, or a performance one?

The compiler might use the unlikely hint to do something useful, like 
avoiding a branch in the common case.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-31 11:52           ` David Marchand
@ 2022-06-01  9:01             ` Mattias Rönnblom
  2022-06-01  9:10               ` Morten Brørup
  0 siblings, 1 reply; 20+ messages in thread
From: Mattias Rönnblom @ 2022-06-01  9:01 UTC (permalink / raw)
  To: David Marchand
  Cc: Thomas Monjalon, dev, Onar Olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Morten Brørup, Stephen Hemminger,
	Mattias Rönnblom, Chengwen Feng, Ola Liljedahl

On 2022-05-31 13:52, David Marchand wrote:
> On Mon, May 23, 2022 at 4:24 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> A sequence lock (seqlock) is a synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, suitable for
>> data structures shared across many cores and which are updated
>> relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. A spinlock is used to
>> serialize writers. In cases where there is only a single writer, or
>> writer-writer synchronization is done by some external means, the
>> "raw" sequence counter type (and accompanying rte_seqcount_*()
>> functions) may be used instead.
>>
>> To avoid resource reclamation and other issues, the data protected by
>> a seqlock is best off being self-contained (i.e., no pointers [except
>> to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform
>> atomic operations on data objects larger than what the native atomic
>> machine instructions allow for.
>>
>> DPDK seqlocks (and the underlying sequence counters) are not
>> preemption safe on the writer side. A thread preemption affects
>> performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>    1. Load the sequence number (sn).
>>    2. Load, in arbitrary order, the seqlock-protected data.
>>    3. Load the sn again.
>>    4. Check if the first and second sn are equal, and even numbered.
>>       If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>    1. Take the spinlock, to serialize writer access.
>>    2. Load the sn.
>>    3. Store the original sn + 1 as the new sn.
>>    4. Perform load and stores to the seqlock-protected data.
>>    5. Store the original sn + 2 as the new sn.
>>    6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the
>> data stores, and the second sn store appear to the reader in the
>> mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores
>> need not be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
> 
> The SoB and other tags were with annotations, just repeating them here
> (in the hope patchwork will catch them):
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 
> GHA had a hiccup on Ubuntu vm installation at the time of the v9 reception.
> I had run the checks on my side:
> https://github.com/david-marchand/dpdk/actions/runs/2373533127
> 
> I noticed a little issue in the example in the doxygen comments that I
> can fix myself:
> --- a/lib/eal/include/rte_seqlock.h
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -63,7 +63,7 @@ extern "C" {
>    *                 // An alternative to an immediate retry is to abort and
>    *                 // try again at some later time, assuming progress is
>    *                 // possible without the data.
> - *         } while (rte_seqlock_read_retry(&config->lock));
> + *         } while (rte_seqlock_read_retry(&config->lock, sn));
>    * }
>    *
>    * // Accessor function for writing config fields.
> 
> I also have some whitespace fixes here and there.
> 
> I may shorten the release notes update to keep only the first
> paragraph, as the rest is too verbose and repeated in the
> documentation.

Makes sense.

> 
> Overall, comments were addressed and the patch looks ready.
> Last call before merging?
> 
> 

I don't have anything to add. Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v9] eal: add seqlock
  2022-06-01  9:01             ` Mattias Rönnblom
@ 2022-06-01  9:10               ` Morten Brørup
  0 siblings, 0 replies; 20+ messages in thread
From: Morten Brørup @ 2022-06-01  9:10 UTC (permalink / raw)
  To: Mattias Rönnblom, David Marchand
  Cc: Thomas Monjalon, dev, Onar Olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Stephen Hemminger, Mattias Rönnblom,
	Chengwen Feng, Ola Liljedahl

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 1 June 2022 11.02
> 
> On 2022-05-31 13:52, David Marchand wrote:
> > On Mon, May 23, 2022 at 4:24 PM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >>
> >> A sequence lock (seqlock) is a synchronization primitive which
> allows
> >> for data-race free, low-overhead, high-frequency reads, suitable for
> >> data structures shared across many cores and which are updated
> >> relatively infrequently.
> >>
> >> A seqlock permits multiple parallel readers. A spinlock is used to
> >> serialize writers. In cases where there is only a single writer, or
> >> writer-writer synchronization is done by some external means, the
> >> "raw" sequence counter type (and accompanying rte_seqcount_*()
> >> functions) may be used instead.
> >>
> >> To avoid resource reclamation and other issues, the data protected
> by
> >> a seqlock is best off being self-contained (i.e., no pointers
> [except
> >> to constant data]).
> >>
> >
> > Overall, comments were addressed and the patch looks ready.
> > Last call before merging?
> >
> >
> 
> I don't have anything to add. Thanks.

Please don't hold back merging on my behalf; you may disregard my previous email regarding the v9 patch.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-06-01  8:19             ` Mattias Rönnblom
@ 2022-06-01 16:15               ` Stephen Hemminger
  2022-06-01 19:33                 ` Mattias Rönnblom
  0 siblings, 1 reply; 20+ messages in thread
From: Stephen Hemminger @ 2022-06-01 16:15 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On Wed, 1 Jun 2022 08:19:54 +0000
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> On 2022-06-01 00:45, Stephen Hemminger wrote:
> > On Mon, 23 May 2022 16:23:46 +0200
> > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >   
> >> +/**
> >> + * The RTE seqcount type.
> >> + */
> >> +typedef struct {
> >> +	uint32_t sn; /**< A sequence number for the protected data. */
> >> +} rte_seqcount_t;  
> > 
> > Don't need structure for only one element.
> >   
> 
> The struct adds a degree of type safety, with no run-time cost.

Makes sense.

> 
> > typedef uint32_t rte_seqcount_t;
> > 
> > +	if (unlikely(begin_sn != end_sn))
> > +		return true;
> > +
> > +	return false;
> > 
> > Prefer to avoid conditional if possible (compiler will optimize it as):
> > 
> >          return begin_sn == end_sn;  
> 
> Is this a readability argument, or a performance one?
> 
> The compiler might use the unlikely hint to do something useful, like 
> avoiding a branch in the common case.

It is a matter of taste. I always prefer writing the smallest (within reason)
amount of code as possible. And my preference is to do things with declarative
and data statements rather than conditionals.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-06-01 16:15               ` Stephen Hemminger
@ 2022-06-01 19:33                 ` Mattias Rönnblom
  0 siblings, 0 replies; 20+ messages in thread
From: Mattias Rönnblom @ 2022-06-01 19:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Thomas Monjalon, David Marchand, dev, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On 2022-06-01 18:15, Stephen Hemminger wrote:
> On Wed, 1 Jun 2022 08:19:54 +0000
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> On 2022-06-01 00:45, Stephen Hemminger wrote:
>>> On Mon, 23 May 2022 16:23:46 +0200
>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>>>    
>>>> +/**
>>>> + * The RTE seqcount type.
>>>> + */
>>>> +typedef struct {
>>>> +	uint32_t sn; /**< A sequence number for the protected data. */
>>>> +} rte_seqcount_t;
>>>
>>> Don't need structure for only one element.
>>>    
>>
>> The struct adds a degree of type safety, with no run-time cost.
> 
> Makes sense.
> 
>>
>>> typedef uint32_t rte_seqcount_t;
>>>
>>> +	if (unlikely(begin_sn != end_sn))
>>> +		return true;
>>> +
>>> +	return false;
>>>
>>> Prefer to avoid conditional if possible (compiler will optimize it as):
>>>
>>>           return begin_sn == end_sn;
>>
>> Is this a readability argument, or a performance one?
>>
>> The compiler might use the unlikely hint to do something useful, like
>> avoiding a branch in the common case.
> 
> It is a matter of taste. I always prefer writing the smallest (within reason)
> amount of code as possible. And my preference is to do things with declarative
> and data statements rather than conditionals.
I agree. A one-liner would have been a better choice. (A correct one, 
with the hint.)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v9] eal: add seqlock
  2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
                             ` (2 preceding siblings ...)
  2022-05-31 22:49           ` Stephen Hemminger
@ 2022-06-07  9:25           ` David Marchand
  3 siblings, 0 replies; 20+ messages in thread
From: David Marchand @ 2022-06-07  9:25 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, dev, onar.olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Morten Brørup, Stephen Hemminger,
	Mattias Rönnblom, Chengwen Feng, Ola Liljedahl

On Mon, May 23, 2022 at 4:24 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> A sequence lock (seqlock) is a synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, suitable for
> data structures shared across many cores and which are updated
> relatively infrequently.
>
> A seqlock permits multiple parallel readers. A spinlock is used to
> serialize writers. In cases where there is only a single writer, or
> writer-writer synchronization is done by some external means, the
> "raw" sequence counter type (and accompanying rte_seqcount_*()
> functions) may be used instead.
>
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
>
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger than what the native atomic
> machine instructions allow for.
>
> DPDK seqlocks (and the underlying sequence counters) are not
> preemption safe on the writer side. A thread preemption affects
> performance, not correctness.
>
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
>
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
>
> The first three steps need to be ordered using suitable memory fences.
>
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
>
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
>
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
>
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
>
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

As mentioned before, I did some small style fixes, and one last-minute
update following the discussion between Stephen and Mattias.
This patch passes my checks and the CI was okay.

Thanks Mattias, and reviewers.
Applied for -rc1.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-06-07  9:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220513103820.3e34fcb9@hermes.local>
2022-05-15 12:24 ` [PATCH v7] eal: add seqlock Mattias Rönnblom
2022-05-15 12:39   ` Mattias Rönnblom
2022-05-15 15:19     ` Morten Brørup
2022-05-15 17:54       ` Mattias Rönnblom
2022-05-16  8:30         ` Morten Brørup
2022-05-15 15:23     ` Morten Brørup
2022-05-20  6:02     ` Mattias Rönnblom
2022-05-23 11:31       ` [PATCH v8] " Mattias Rönnblom
2022-05-23 14:23         ` [PATCH v9] " Mattias Rönnblom
2022-05-31 11:52           ` David Marchand
2022-06-01  9:01             ` Mattias Rönnblom
2022-06-01  9:10               ` Morten Brørup
2022-05-31 22:45           ` Stephen Hemminger
2022-06-01  6:07             ` Morten Brørup
2022-06-01  8:19             ` Mattias Rönnblom
2022-06-01 16:15               ` Stephen Hemminger
2022-06-01 19:33                 ` Mattias Rönnblom
2022-05-31 22:49           ` Stephen Hemminger
2022-05-31 22:57             ` Honnappa Nagarahalli
2022-06-07  9:25           ` David Marchand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).