DPDK patches and discussions
 help / color / mirror / Atom feed
* DPDK seqlock
@ 2022-03-22 16:10 Mattias Rönnblom
  2022-03-22 16:46 ` Ananyev, Konstantin
  2022-03-23 12:04 ` DPDK seqlock Morten Brørup
  0 siblings, 2 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-22 16:10 UTC (permalink / raw)
  To: dev

Hi.

Would it make sense to have a seqlock implementation in DPDK?

I think so, since it's a very useful synchronization primitive in data 
plane applications.

Regards,
     Mattias


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: DPDK seqlock
  2022-03-22 16:10 DPDK seqlock Mattias Rönnblom
@ 2022-03-22 16:46 ` Ananyev, Konstantin
  2022-03-24  4:52   ` Honnappa Nagarahalli
  2022-03-23 12:04 ` DPDK seqlock Morten Brørup
  1 sibling, 1 reply; 104+ messages in thread
From: Ananyev, Konstantin @ 2022-03-22 16:46 UTC (permalink / raw)
  To: mattias.ronnblom, dev



Hi Mattias,

> 
> Would it make sense to have a seqlock implementation in DPDK?
> 
> I think so, since it's a very useful synchronization primitive in data
> plane applications.
> 

Agree, it might be useful.
As I remember rte_hash '_lf' functions do use something similar to seqlock,
but in hand-made manner.
Probably some other entities within DPDK itself or related projects 
will benefit from it too...
   
Konstantin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: DPDK seqlock
  2022-03-22 16:10 DPDK seqlock Mattias Rönnblom
  2022-03-22 16:46 ` Ananyev, Konstantin
@ 2022-03-23 12:04 ` Morten Brørup
  1 sibling, 0 replies; 104+ messages in thread
From: Morten Brørup @ 2022-03-23 12:04 UTC (permalink / raw)
  To: Mattias Rönnblom, dev

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 22 March 2022 17.10
> 
> Hi.
> 
> Would it make sense to have a seqlock implementation in DPDK?

Certainly!

> 
> I think so, since it's a very useful synchronization primitive in data
> plane applications.

Yes, and having it in DPDK saves application developers from writing their own (with the risks coming with that).

> 
> Regards,
>      Mattias


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: DPDK seqlock
  2022-03-22 16:46 ` Ananyev, Konstantin
@ 2022-03-24  4:52   ` Honnappa Nagarahalli
  2022-03-24  5:06     ` Stephen Hemminger
  2022-03-24 11:34     ` Mattias Rönnblom
  0 siblings, 2 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-03-24  4:52 UTC (permalink / raw)
  To: Ananyev, Konstantin, mattias.ronnblom, dev; +Cc: nd, nd

<snip>

> 
> Hi Mattias,
> 
> >
> > Would it make sense to have a seqlock implementation in DPDK?
I do not have any issues with adding the seqlock to DPDK.

However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries?

> >
> > I think so, since it's a very useful synchronization primitive in data
> > plane applications.
> >
> 
> Agree, it might be useful.
> As I remember rte_hash '_lf' functions do use something similar to seqlock, but
> in hand-made manner.
> Probably some other entities within DPDK itself or related projects will benefit
> from it too...
> 
> Konstantin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: DPDK seqlock
  2022-03-24  4:52   ` Honnappa Nagarahalli
@ 2022-03-24  5:06     ` Stephen Hemminger
  2022-03-24 11:34     ` Mattias Rönnblom
  1 sibling, 0 replies; 104+ messages in thread
From: Stephen Hemminger @ 2022-03-24  5:06 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: Ananyev, Konstantin, mattias.ronnblom, dev, nd

On Thu, 24 Mar 2022 04:52:07 +0000
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:

> <snip>
> 
> > 
> > Hi Mattias,
> >   
> > >
> > > Would it make sense to have a seqlock implementation in DPDK?  
> I do not have any issues with adding the seqlock to DPDK.
> 
> However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries?
> 
> > >
> > > I think so, since it's a very useful synchronization primitive in data
> > > plane applications.
> > >  
> > 
> > Agree, it might be useful.
> > As I remember rte_hash '_lf' functions do use something similar to seqlock, but
> > in hand-made manner.
> > Probably some other entities within DPDK itself or related projects will benefit
> > from it too...
> > 
> > Konstantin

As inventor of seqlock, it is really just a kind of reader/writer spinlock
where spinning trys to do useful work.  It useful for cases where the data
being accessed is too large for __atomic primitives. 



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: DPDK seqlock
  2022-03-24  4:52   ` Honnappa Nagarahalli
  2022-03-24  5:06     ` Stephen Hemminger
@ 2022-03-24 11:34     ` Mattias Rönnblom
  2022-03-25 20:24       ` [RFC] eal: add seqlock Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-24 11:34 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ananyev, Konstantin, dev; +Cc: nd

On 2022-03-24 05:52, Honnappa Nagarahalli wrote:
> <snip>
>
>> Hi Mattias,
>>
>>> Would it make sense to have a seqlock implementation in DPDK?
> I do not have any issues with adding the seqlock to DPDK.
>
> However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries?

Yes, it's not preemption-safe, just like for example a 
spinlock-protected data structure. If the writer is interrupted after 
having stored the first counter update, but before storing the second, 
all subsequent read attempts will fail. The reading workers would have 
decide to either give up reading the data structure being protected, or 
keep retrying indefinitely.

This issue is common across all non-preemption-safe data structures 
(default rings, spinlocks etc), and can be avoided by (surprise!) 
avoiding preemption by running the control plane thread on RT priority 
or on a dedicated core, or to use a preemption safe way to tell one of 
the worker lcore threads to do the actual update.

A seqlock is much more efficient on the reader side for high-frequency 
accesses from many cores than a regular RW lock (i.e., one implemented 
by two spinlocks or mutexes). A spinlock being lock/unlocked on a 
per-packet basis by every core is a total performance killer.

>>> I think so, since it's a very useful synchronization primitive in data
>>> plane applications.
>>>
>> Agree, it might be useful.
>> As I remember rte_hash '_lf' functions do use something similar to seqlock, but
>> in hand-made manner.
>> Probably some other entities within DPDK itself or related projects will benefit
>> from it too...
>>
>> Konstantin


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC] eal: add seqlock
  2022-03-24 11:34     ` Mattias Rönnblom
@ 2022-03-25 20:24       ` Mattias Rönnblom
  2022-03-25 21:10         ` Stephen Hemminger
  2022-03-27 14:49         ` Ananyev, Konstantin
  0 siblings, 2 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-25 20:24 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated with
relatively low frequency.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

This RFC version lacks API documentation.

Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 197 ++++++++++++++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 +++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h |  84 +++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 300 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..a727e16caf
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,197 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_start(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_begin(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arrange in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_end(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_start(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint64_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			/* See writer_start() for an explaination why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2)
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS + 1)
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int writer_lcore_ids[NUM_WRITERS] = { 0 };
+	unsigned int reader_lcore_ids[MAX_READERS];
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT)
+		return -1;
+
+	num_readers = num_lcores - NUM_WRITERS - 1;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i < NUM_WRITERS) {
+			rte_eal_remote_launch(writer_start, data, lcore_id);
+			writer_lcore_ids[i] = lcore_id;
+		} else {
+			unsigned int reader_idx = i - NUM_WRITERS;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_start, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	for (i = 0; i < NUM_WRITERS; i++)
+		if (rte_eal_wait_lcore(writer_lcore_ids[i]) != 0)
+			rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..b975ca848a
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+struct rte_seqlock {
+	uint64_t sn;
+	rte_spinlock_t lock;
+};
+
+typedef struct rte_seqlock rte_seqlock_t;
+
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+__rte_experimental
+static inline uint64_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Syncronizes-with the
+	 * store release in rte_seqlock_end().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
+{
+	uint64_t end_sn;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	return unlikely(begin_sn & 1 || begin_sn != end_sn);
+}
+
+__rte_experimental
+static inline void
+rte_seqlock_write_begin(rte_seqlock_t *seqlock)
+{
+	uint64_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+__rte_experimental
+static inline void
+rte_seqlock_write_end(rte_seqlock_t *seqlock)
+{
+	uint64_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC] eal: add seqlock
  2022-03-25 20:24       ` [RFC] eal: add seqlock Mattias Rönnblom
@ 2022-03-25 21:10         ` Stephen Hemminger
  2022-03-26 14:57           ` Mattias Rönnblom
  2022-03-27 14:49         ` Ananyev, Konstantin
  1 sibling, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-03-25 21:10 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl

On Fri, 25 Mar 2022 21:24:28 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
> new file mode 100644
> index 0000000000..b975ca848a
> --- /dev/null
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#ifndef _RTE_SEQLOCK_H_
> +#define _RTE_SEQLOCK_H_
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_spinlock.h>
> +
> +struct rte_seqlock {
> +	uint64_t sn;
> +	rte_spinlock_t lock;
> +};
> +
> +typedef struct rte_seqlock rte_seqlock_t;
> +


Add a reference to Wikipedia and/or Linux since not every DPDK
user maybe familar with this.

> +
> +	sn = seqlock->sn + 1;
> +
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);

Could this just be __atomic_fetch_add() with __ATOMIC_RELEASE?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC] eal: add seqlock
  2022-03-25 21:10         ` Stephen Hemminger
@ 2022-03-26 14:57           ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-26 14:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl

On 2022-03-25 22:10, Stephen Hemminger wrote:
> On Fri, 25 Mar 2022 21:24:28 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
>> new file mode 100644
>> index 0000000000..b975ca848a
>> --- /dev/null
>> +++ b/lib/eal/include/rte_seqlock.h
>> @@ -0,0 +1,84 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#ifndef _RTE_SEQLOCK_H_
>> +#define _RTE_SEQLOCK_H_
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +struct rte_seqlock {
>> +	uint64_t sn;
>> +	rte_spinlock_t lock;
>> +};
>> +
>> +typedef struct rte_seqlock rte_seqlock_t;
>> +
>
> Add a reference to Wikipedia and/or Linux since not every DPDK
> user maybe familar with this.

OK, will do.

>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> Could this just be __atomic_fetch_add() with __ATOMIC_RELEASE?

If I understood C11 correctly, an __atomic_fetch_add() with 
__ATOMIC_RELEASE only prevents stores that precedes it (in program 
order) to be move ahead of it. Thus, stores that follows it may be 
reordered across the __atomic_fetch_add(), and seen by a reader before 
the sn change.

Also, __atomic_fetch_add() would generate an atomic add machine 
instruction, which, at least according to my experience (on x86_64), is 
slower than a mov+add+mov, which is what the above code will generate 
(plus prevent certain compiler optimizations). That's with TSO. What 
would happen on weakly ordered machines, I don't know in detail.





^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [RFC] eal: add seqlock
  2022-03-25 20:24       ` [RFC] eal: add seqlock Mattias Rönnblom
  2022-03-25 21:10         ` Stephen Hemminger
@ 2022-03-27 14:49         ` Ananyev, Konstantin
  2022-03-27 17:42           ` Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Ananyev, Konstantin @ 2022-03-27 14:49 UTC (permalink / raw)
  To: mattias.ronnblom, dev
  Cc: Thomas Monjalon, David Marchand, Olsen, Onar,
	Honnappa.Nagarahalli, nd, mb, stephen, mattias.ronnblom,
	Ola Liljedahl

> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
> index 9700494816..48df5f1a21 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -36,6 +36,7 @@ headers += files(
>          'rte_per_lcore.h',
>          'rte_random.h',
>          'rte_reciprocal.h',
> +        'rte_seqlock.h',
>          'rte_service.h',
>          'rte_service_component.h',
>          'rte_string_fns.h',
> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
> new file mode 100644
> index 0000000000..b975ca848a
> --- /dev/null
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -0,0 +1,84 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#ifndef _RTE_SEQLOCK_H_
> +#define _RTE_SEQLOCK_H_
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_spinlock.h>
> +
> +struct rte_seqlock {
> +	uint64_t sn;
> +	rte_spinlock_t lock;
> +};
> +
> +typedef struct rte_seqlock rte_seqlock_t;
> +
> +__rte_experimental
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock);

Probably worth to have static initializer too.


> +
> +__rte_experimental
> +static inline uint64_t
> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
> +{
> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
> +	 * from happening before the sn load. Syncronizes-with the
> +	 * store release in rte_seqlock_end().
> +	 */
> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
> +}
> +
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
> +{
> +	uint64_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

That's sort of 'read_end' correct?
If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
and
end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
on the line below? 

> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
> +}
> +
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
> +{
> +	uint64_t sn;
> +
> +	/* to synchronize with other writers */
> +	rte_spinlock_lock(&seqlock->lock);
> +
> +	sn = seqlock->sn + 1;
> +
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);

I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'.

> +}
> +
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
> +{
> +	uint64_t sn;
> +
> +	sn = seqlock->sn + 1;
> +
> +	/* synchronizes-with the load acquire in rte_seqlock_begin() */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);
> +}
> +


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC] eal: add seqlock
  2022-03-27 14:49         ` Ananyev, Konstantin
@ 2022-03-27 17:42           ` Mattias Rönnblom
  2022-03-28 10:53             ` Ananyev, Konstantin
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-27 17:42 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, mb, stephen, Ola Liljedahl

On 2022-03-27 16:49, Ananyev, Konstantin wrote:
>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
>> index 9700494816..48df5f1a21 100644
>> --- a/lib/eal/include/meson.build
>> +++ b/lib/eal/include/meson.build
>> @@ -36,6 +36,7 @@ headers += files(
>>           'rte_per_lcore.h',
>>           'rte_random.h',
>>           'rte_reciprocal.h',
>> +        'rte_seqlock.h',
>>           'rte_service.h',
>>           'rte_service_component.h',
>>           'rte_string_fns.h',
>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
>> new file mode 100644
>> index 0000000000..b975ca848a
>> --- /dev/null
>> +++ b/lib/eal/include/rte_seqlock.h
>> @@ -0,0 +1,84 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#ifndef _RTE_SEQLOCK_H_
>> +#define _RTE_SEQLOCK_H_
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +struct rte_seqlock {
>> +	uint64_t sn;
>> +	rte_spinlock_t lock;
>> +};
>> +
>> +typedef struct rte_seqlock rte_seqlock_t;
>> +
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
> Probably worth to have static initializer too.
>

I will add that in the next version, thanks.

>> +
>> +__rte_experimental
>> +static inline uint64_t
>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
>> +{
>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>> +	 * from happening before the sn load. Syncronizes-with the
>> +	 * store release in rte_seqlock_end().
>> +	 */
>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>> +}
>> +
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
>> +{
>> +	uint64_t end_sn;
>> +
>> +	/* make sure the data loads happens before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> That's sort of 'read_end' correct?
> If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
> and
> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
> on the line below?

A release fence prevents reordering of stores. The reader doesn't do any 
stores, so I don't understand why you would use a release fence here. 
Could you elaborate?

>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
>> +}
>> +
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
>> +{
>> +	uint64_t sn;
>> +
>> +	/* to synchronize with other writers */
>> +	rte_spinlock_lock(&seqlock->lock);
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'.

Please elaborate on why.

>> +}
>> +
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>> +{
>> +	uint64_t sn;
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_begin() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
>> +}
>> +


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [RFC] eal: add seqlock
  2022-03-27 17:42           ` Mattias Rönnblom
@ 2022-03-28 10:53             ` Ananyev, Konstantin
  2022-03-28 14:06               ` Ola Liljedahl
  0 siblings, 1 reply; 104+ messages in thread
From: Ananyev, Konstantin @ 2022-03-28 10:53 UTC (permalink / raw)
  To: mattias.ronnblom, dev
  Cc: Thomas Monjalon, David Marchand, Olsen, Onar,
	Honnappa.Nagarahalli, nd, mb, stephen, Ola Liljedahl


> >> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
> >> index 9700494816..48df5f1a21 100644
> >> --- a/lib/eal/include/meson.build
> >> +++ b/lib/eal/include/meson.build
> >> @@ -36,6 +36,7 @@ headers += files(
> >>           'rte_per_lcore.h',
> >>           'rte_random.h',
> >>           'rte_reciprocal.h',
> >> +        'rte_seqlock.h',
> >>           'rte_service.h',
> >>           'rte_service_component.h',
> >>           'rte_string_fns.h',
> >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
> >> new file mode 100644
> >> index 0000000000..b975ca848a
> >> --- /dev/null
> >> +++ b/lib/eal/include/rte_seqlock.h
> >> @@ -0,0 +1,84 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2022 Ericsson AB
> >> + */
> >> +
> >> +#ifndef _RTE_SEQLOCK_H_
> >> +#define _RTE_SEQLOCK_H_
> >> +
> >> +#include <stdbool.h>
> >> +#include <stdint.h>
> >> +
> >> +#include <rte_atomic.h>
> >> +#include <rte_branch_prediction.h>
> >> +#include <rte_spinlock.h>
> >> +
> >> +struct rte_seqlock {
> >> +	uint64_t sn;
> >> +	rte_spinlock_t lock;
> >> +};
> >> +
> >> +typedef struct rte_seqlock rte_seqlock_t;
> >> +
> >> +__rte_experimental
> >> +void
> >> +rte_seqlock_init(rte_seqlock_t *seqlock);
> > Probably worth to have static initializer too.
> >
> 
> I will add that in the next version, thanks.
> 
> >> +
> >> +__rte_experimental
> >> +static inline uint64_t
> >> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
> >> +{
> >> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
> >> +	 * from happening before the sn load. Syncronizes-with the
> >> +	 * store release in rte_seqlock_end().
> >> +	 */
> >> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
> >> +}
> >> +
> >> +__rte_experimental
> >> +static inline bool
> >> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
> >> +{
> >> +	uint64_t end_sn;
> >> +
> >> +	/* make sure the data loads happens before the sn load */
> >> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > That's sort of 'read_end' correct?
> > If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
> > and
> > end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
> > on the line below?
> 
> A release fence prevents reordering of stores. The reader doesn't do any
> stores, so I don't understand why you would use a release fence here.
> Could you elaborate?

From my understanding:  
rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
serves as a hoist barrier here, so it would only prevent later instructions
to be executed before that point.
But it wouldn't prevent earlier instructions to be executed after that point.
While we do need to guarantee that cpu will finish all previous reads before
progressing further. 

Suppose we have something like that:

struct {
	uint64_t shared;
	rte_seqlock_t lock;
} data;

...
sn = ...
uint64_t x = data.shared; 
/* inside rte_seqlock_read_retry(): */
...
rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED);

Here we need to make sure that read of data.shared will always happen
before reading of data.lock.sn. 
It is not a problem on IA (as reads are not reordered), but on machines with 
relaxed memory ordering (ARM, etc.)  it can happen.
So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE).

Honnappa and other ARM & atomics experts, please correct me if I am wrong here.    

> >> +
> >> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> >> +
> >> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
> >> +}
> >> +
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
> >> +{
> >> +	uint64_t sn;
> >> +
> >> +	/* to synchronize with other writers */
> >> +	rte_spinlock_lock(&seqlock->lock);
> >> +
> >> +	sn = seqlock->sn + 1;
> >> +
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> >> +
> >> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> >> +	 * from happening before the sn store.
> >> +	 */
> >> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> > I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'.
> 
> Please elaborate on why.

As you said in the comments above, we need to prevent later stores
to be executed before that point. So we do need a hoist barrier here.
AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required.

> 
> >> +}
> >> +
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
> >> +{
> >> +	uint64_t sn;
> >> +
> >> +	sn = seqlock->sn + 1;
> >> +
> >> +	/* synchronizes-with the load acquire in rte_seqlock_begin() */
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >> +
> >> +	rte_spinlock_unlock(&seqlock->lock);
> >> +}
> >> +


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC] eal: add seqlock
  2022-03-28 10:53             ` Ananyev, Konstantin
@ 2022-03-28 14:06               ` Ola Liljedahl
  2022-03-29  8:32                 ` Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Ola Liljedahl @ 2022-03-28 14:06 UTC (permalink / raw)
  To: Ananyev, Konstantin, mattias.ronnblom, dev
  Cc: Thomas Monjalon, David Marchand, Olsen, Onar,
	Honnappa.Nagarahalli, nd, mb, stephen



On 3/28/22 12:53, Ananyev, Konstantin wrote:
> 
>>>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
>>>> index 9700494816..48df5f1a21 100644
>>>> --- a/lib/eal/include/meson.build
>>>> +++ b/lib/eal/include/meson.build
>>>> @@ -36,6 +36,7 @@ headers += files(
>>>>            'rte_per_lcore.h',
>>>>            'rte_random.h',
>>>>            'rte_reciprocal.h',
>>>> +        'rte_seqlock.h',
>>>>            'rte_service.h',
>>>>            'rte_service_component.h',
>>>>            'rte_string_fns.h',
>>>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
>>>> new file mode 100644
>>>> index 0000000000..b975ca848a
>>>> --- /dev/null
>>>> +++ b/lib/eal/include/rte_seqlock.h
>>>> @@ -0,0 +1,84 @@
>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>> + * Copyright(c) 2022 Ericsson AB
>>>> + */
>>>> +
>>>> +#ifndef _RTE_SEQLOCK_H_
>>>> +#define _RTE_SEQLOCK_H_
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdint.h>
>>>> +
>>>> +#include <rte_atomic.h>
>>>> +#include <rte_branch_prediction.h>
>>>> +#include <rte_spinlock.h>
>>>> +
>>>> +struct rte_seqlock {
>>>> +	uint64_t sn;
>>>> +	rte_spinlock_t lock;
>>>> +};
>>>> +
>>>> +typedef struct rte_seqlock rte_seqlock_t;
>>>> +
>>>> +__rte_experimental
>>>> +void
>>>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>>> Probably worth to have static initializer too.
>>>
>>
>> I will add that in the next version, thanks.
>>
>>>> +
>>>> +__rte_experimental
>>>> +static inline uint64_t
>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
>>>> +{
>>>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>>>> +	 * from happening before the sn load. Syncronizes-with the
>>>> +	 * store release in rte_seqlock_end().
>>>> +	 */
>>>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline bool
>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn)
>>>> +{
>>>> +	uint64_t end_sn;
>>>> +
>>>> +	/* make sure the data loads happens before the sn load */
>>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> That's sort of 'read_end' correct?
>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
>>> and
>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
>>> on the line below?
>>
>> A release fence prevents reordering of stores. The reader doesn't do any
>> stores, so I don't understand why you would use a release fence here.
>> Could you elaborate?
> 
>  From my understanding:
> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> serves as a hoist barrier here, so it would only prevent later instructions
> to be executed before that point.
> But it wouldn't prevent earlier instructions to be executed after that point.
> While we do need to guarantee that cpu will finish all previous reads before
> progressing further.
> 
> Suppose we have something like that:
> 
> struct {
> 	uint64_t shared;
> 	rte_seqlock_t lock;
> } data;
> 
> ...
> sn = ...
> uint64_t x = data.shared;
> /* inside rte_seqlock_read_retry(): */
> ...
> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED);
> 
> Here we need to make sure that read of data.shared will always happen
> before reading of data.lock.sn.
> It is not a problem on IA (as reads are not reordered), but on machines with
> relaxed memory ordering (ARM, etc.)  it can happen.
> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE)
We can't use store-release since there is no write on the reader-side.
And fence-release orders against later stores, not later loads.

> 
> Honnappa and other ARM & atomics experts, please correct me if I am wrong here.
The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to 
digest. If we trust Preshing, he has a more accessible description here: 
https://preshing.com/20130922/acquire-and-release-fences/
"An acquire fence prevents the memory reordering of any read which 
precedes it in program order with any read or write which follows it in 
program order."
and here: 
https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/ 
(for C++ but the definition seems to be identical to that of C11).
Essentially a LoadLoad+LoadStore barrier which is what we want to achieve.

GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This 
waits for all loads preceding (in program order) the memory barrier to 
be observed before any memory accesses after (in program order) the 
memory barrier.

I think the key to understanding atomic thread fences is that they are 
not associated with a specific memory access (unlike load-acquire and 
store-release) so they can't order earlier or later memory accesses 
against some specific memory access. Instead the fence orders any/all 
earlier loads and/or stores against any/all later loads or stores 
(depending on acquire or release).

> 
>>>> +
>>>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>>>> +
>>>> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
>>>> +{
>>>> +	uint64_t sn;
>>>> +
>>>> +	/* to synchronize with other writers */
>>>> +	rte_spinlock_lock(&seqlock->lock);
>>>> +
>>>> +	sn = seqlock->sn + 1;
>>>> +
>>>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>>>> +
>>>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>>>> +	 * from happening before the sn store.
>>>> +	 */
>>>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'.
>>
>> Please elaborate on why.
> 
> As you said in the comments above, we need to prevent later stores
> to be executed before that point. So we do need a hoist barrier here.
> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required.
An acquire fence wouldn't order an earlier store (the write to 
seqlock->sn) from being reordered with some later store (e.g. writes to 
the protected data), thus it would allow readers to see updated data 
(possibly torn) with a pre-update sequence number. We need a StoreStore 
barrier for ordering the SN store and data stores => fence(release).

Acquire and releases fences can (also) be used to create 
synchronize-with relationships (this is how the C standard defines 
them). Preshing has a good example on this. Basically
Thread 1:
data = 242;
atomic_thread_fence(atomic_release);
atomic_store_n(&guard, 1, atomic_relaxed);

Thread 2:
while (atomic_load_n(&guard, atomic_relaxed) != 1) ;
atomic_thread_fence(atomic_acquire);
do_something(data);

These are obvious analogues to store-release and load-acquire, thus the 
acquire & release names of the fences.

- Ola

> 
>>
>>>> +}
>>>> +
>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>>>> +{
>>>> +	uint64_t sn;
>>>> +
>>>> +	sn = seqlock->sn + 1;
>>>> +
>>>> +	/* synchronizes-with the load acquire in rte_seqlock_begin() */
>>>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>>>> +
>>>> +	rte_spinlock_unlock(&seqlock->lock);
>>>> +}
>>>> +
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC] eal: add seqlock
  2022-03-28 14:06               ` Ola Liljedahl
@ 2022-03-29  8:32                 ` Mattias Rönnblom
  2022-03-29 13:20                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-29  8:32 UTC (permalink / raw)
  To: Ola Liljedahl, Ananyev, Konstantin, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, mb, stephen

On 2022-03-28 16:06, Ola Liljedahl wrote:
>
>
> On 3/28/22 12:53, Ananyev, Konstantin wrote:
>>
>>>>> diff --git a/lib/eal/include/meson.build 
>>>>> b/lib/eal/include/meson.build
>>>>> index 9700494816..48df5f1a21 100644
>>>>> --- a/lib/eal/include/meson.build
>>>>> +++ b/lib/eal/include/meson.build
>>>>> @@ -36,6 +36,7 @@ headers += files(
>>>>>            'rte_per_lcore.h',
>>>>>            'rte_random.h',
>>>>>            'rte_reciprocal.h',
>>>>> +        'rte_seqlock.h',
>>>>>            'rte_service.h',
>>>>>            'rte_service_component.h',
>>>>>            'rte_string_fns.h',
>>>>> diff --git a/lib/eal/include/rte_seqlock.h 
>>>>> b/lib/eal/include/rte_seqlock.h
>>>>> new file mode 100644
>>>>> index 0000000000..b975ca848a
>>>>> --- /dev/null
>>>>> +++ b/lib/eal/include/rte_seqlock.h
>>>>> @@ -0,0 +1,84 @@
>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>> + * Copyright(c) 2022 Ericsson AB
>>>>> + */
>>>>> +
>>>>> +#ifndef _RTE_SEQLOCK_H_
>>>>> +#define _RTE_SEQLOCK_H_
>>>>> +
>>>>> +#include <stdbool.h>
>>>>> +#include <stdint.h>
>>>>> +
>>>>> +#include <rte_atomic.h>
>>>>> +#include <rte_branch_prediction.h>
>>>>> +#include <rte_spinlock.h>
>>>>> +
>>>>> +struct rte_seqlock {
>>>>> +    uint64_t sn;
>>>>> +    rte_spinlock_t lock;
>>>>> +};
>>>>> +
>>>>> +typedef struct rte_seqlock rte_seqlock_t;
>>>>> +
>>>>> +__rte_experimental
>>>>> +void
>>>>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>>>> Probably worth to have static initializer too.
>>>>
>>>
>>> I will add that in the next version, thanks.
>>>
>>>>> +
>>>>> +__rte_experimental
>>>>> +static inline uint64_t
>>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
>>>>> +{
>>>>> +    /* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>>>>> +     * from happening before the sn load. Syncronizes-with the
>>>>> +     * store release in rte_seqlock_end().
>>>>> +     */
>>>>> +    return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>>>>> +}
>>>>> +
>>>>> +__rte_experimental
>>>>> +static inline bool
>>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t 
>>>>> begin_sn)
>>>>> +{
>>>>> +    uint64_t end_sn;
>>>>> +
>>>>> +    /* make sure the data loads happens before the sn load */
>>>>> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>>> That's sort of 'read_end' correct?
>>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
>>>> and
>>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
>>>> on the line below?
>>>
>>> A release fence prevents reordering of stores. The reader doesn't do 
>>> any
>>> stores, so I don't understand why you would use a release fence here.
>>> Could you elaborate?
>>
>>  From my understanding:
>> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> serves as a hoist barrier here, so it would only prevent later 
>> instructions
>> to be executed before that point.
>> But it wouldn't prevent earlier instructions to be executed after 
>> that point.
>> While we do need to guarantee that cpu will finish all previous reads 
>> before
>> progressing further.
>>
>> Suppose we have something like that:
>>
>> struct {
>>     uint64_t shared;
>>     rte_seqlock_t lock;
>> } data;
>>
>> ...
>> sn = ...
>> uint64_t x = data.shared;
>> /* inside rte_seqlock_read_retry(): */
>> ...
>> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED);
>>
>> Here we need to make sure that read of data.shared will always happen
>> before reading of data.lock.sn.
>> It is not a problem on IA (as reads are not reordered), but on 
>> machines with
>> relaxed memory ordering (ARM, etc.)  it can happen.
>> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE)
> We can't use store-release since there is no write on the reader-side.
> And fence-release orders against later stores, not later loads.
>
>>
>> Honnappa and other ARM & atomics experts, please correct me if I am 
>> wrong here.
> The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to 
> digest. If we trust Preshing, he has a more accessible description 
> here: 
> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-f4f5b1eec2980283&q=1&e=3479ebfa-e18d-4bf8-88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20130922%2Facquire-and-release-fences%2F
> "An acquire fence prevents the memory reordering of any read which 
> precedes it in program order with any read or write which follows it 
> in program order."
> and here: 
> https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-64b0eba450be934b&q=1&e=3479ebfa-e18d-4bf8-88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20131125%2Facquire-and-release-fences-dont-work-the-way-youd-expect%2F 
> (for C++ but the definition seems to be identical to that of C11).
> Essentially a LoadLoad+LoadStore barrier which is what we want to 
> achieve.
>
> GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This 
> waits for all loads preceding (in program order) the memory barrier to 
> be observed before any memory accesses after (in program order) the 
> memory barrier.
>
> I think the key to understanding atomic thread fences is that they are 
> not associated with a specific memory access (unlike load-acquire and 
> store-release) so they can't order earlier or later memory accesses 
> against some specific memory access. Instead the fence orders any/all 
> earlier loads and/or stores against any/all later loads or stores 
> (depending on acquire or release).
>
>>
>>>>> +
>>>>> +    end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>>>>> +
>>>>> +    return unlikely(begin_sn & 1 || begin_sn != end_sn);
>>>>> +}
>>>>> +
>>>>> +__rte_experimental
>>>>> +static inline void
>>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
>>>>> +{
>>>>> +    uint64_t sn;
>>>>> +
>>>>> +    /* to synchronize with other writers */
>>>>> +    rte_spinlock_lock(&seqlock->lock);
>>>>> +
>>>>> +    sn = seqlock->sn + 1;
>>>>> +
>>>>> +    __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>>>>> +
>>>>> +    /* __ATOMIC_RELEASE to prevent stores after (in program order)
>>>>> +     * from happening before the sn store.
>>>>> +     */
>>>>> +    rte_atomic_thread_fence(__ATOMIC_RELEASE);
>>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of 
>>>> '__ATOMIC_RELEASE'.
>>>
>>> Please elaborate on why.
>>
>> As you said in the comments above, we need to prevent later stores
>> to be executed before that point. So we do need a hoist barrier here.
>> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required.
> An acquire fence wouldn't order an earlier store (the write to 
> seqlock->sn) from being reordered with some later store (e.g. writes 
> to the protected data), thus it would allow readers to see updated 
> data (possibly torn) with a pre-update sequence number. We need a 
> StoreStore barrier for ordering the SN store and data stores => 
> fence(release).
>
> Acquire and releases fences can (also) be used to create 
> synchronize-with relationships (this is how the C standard defines 
> them). Preshing has a good example on this. Basically
> Thread 1:
> data = 242;
> atomic_thread_fence(atomic_release);
> atomic_store_n(&guard, 1, atomic_relaxed);
>
> Thread 2:
> while (atomic_load_n(&guard, atomic_relaxed) != 1) ;
> atomic_thread_fence(atomic_acquire);
> do_something(data);
>
> These are obvious analogues to store-release and load-acquire, thus 
> the acquire & release names of the fences.
>
> - Ola
>
>>
>>>
>>>>> +}
>>>>> +
>>>>> +__rte_experimental
>>>>> +static inline void
>>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>>>>> +{
>>>>> +    uint64_t sn;
>>>>> +
>>>>> +    sn = seqlock->sn + 1;
>>>>> +
>>>>> +    /* synchronizes-with the load acquire in rte_seqlock_begin() */
>>>>> +    __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>>>>> +
>>>>> +    rte_spinlock_unlock(&seqlock->lock);
>>>>> +}
>>>>> +
>>

I have nothing to add, but Ola's mail seems to have been blocked from 
the dev list, so I'm posting this again.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [RFC] eal: add seqlock
  2022-03-29  8:32                 ` Mattias Rönnblom
@ 2022-03-29 13:20                   ` Ananyev, Konstantin
  2022-03-30 10:07                     ` [PATCH] " Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Ananyev, Konstantin @ 2022-03-29 13:20 UTC (permalink / raw)
  To: mattias.ronnblom, Ola Liljedahl, dev
  Cc: Thomas Monjalon, David Marchand, Olsen, Onar,
	Honnappa.Nagarahalli, nd, mb, stephen


> >>>>> diff --git a/lib/eal/include/meson.build
> >>>>> b/lib/eal/include/meson.build
> >>>>> index 9700494816..48df5f1a21 100644
> >>>>> --- a/lib/eal/include/meson.build
> >>>>> +++ b/lib/eal/include/meson.build
> >>>>> @@ -36,6 +36,7 @@ headers += files(
> >>>>>            'rte_per_lcore.h',
> >>>>>            'rte_random.h',
> >>>>>            'rte_reciprocal.h',
> >>>>> +        'rte_seqlock.h',
> >>>>>            'rte_service.h',
> >>>>>            'rte_service_component.h',
> >>>>>            'rte_string_fns.h',
> >>>>> diff --git a/lib/eal/include/rte_seqlock.h
> >>>>> b/lib/eal/include/rte_seqlock.h
> >>>>> new file mode 100644
> >>>>> index 0000000000..b975ca848a
> >>>>> --- /dev/null
> >>>>> +++ b/lib/eal/include/rte_seqlock.h
> >>>>> @@ -0,0 +1,84 @@
> >>>>> +/* SPDX-License-Identifier: BSD-3-Clause
> >>>>> + * Copyright(c) 2022 Ericsson AB
> >>>>> + */
> >>>>> +
> >>>>> +#ifndef _RTE_SEQLOCK_H_
> >>>>> +#define _RTE_SEQLOCK_H_
> >>>>> +
> >>>>> +#include <stdbool.h>
> >>>>> +#include <stdint.h>
> >>>>> +
> >>>>> +#include <rte_atomic.h>
> >>>>> +#include <rte_branch_prediction.h>
> >>>>> +#include <rte_spinlock.h>
> >>>>> +
> >>>>> +struct rte_seqlock {
> >>>>> +    uint64_t sn;
> >>>>> +    rte_spinlock_t lock;
> >>>>> +};
> >>>>> +
> >>>>> +typedef struct rte_seqlock rte_seqlock_t;
> >>>>> +
> >>>>> +__rte_experimental
> >>>>> +void
> >>>>> +rte_seqlock_init(rte_seqlock_t *seqlock);
> >>>> Probably worth to have static initializer too.
> >>>>
> >>>
> >>> I will add that in the next version, thanks.
> >>>
> >>>>> +
> >>>>> +__rte_experimental
> >>>>> +static inline uint64_t
> >>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
> >>>>> +{
> >>>>> +    /* __ATOMIC_ACQUIRE to prevent loads after (in program order)
> >>>>> +     * from happening before the sn load. Syncronizes-with the
> >>>>> +     * store release in rte_seqlock_end().
> >>>>> +     */
> >>>>> +    return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
> >>>>> +}
> >>>>> +
> >>>>> +__rte_experimental
> >>>>> +static inline bool
> >>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t
> >>>>> begin_sn)
> >>>>> +{
> >>>>> +    uint64_t end_sn;
> >>>>> +
> >>>>> +    /* make sure the data loads happens before the sn load */
> >>>>> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >>>> That's sort of 'read_end' correct?
> >>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here,
> >>>> and
> >>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE)
> >>>> on the line below?
> >>>
> >>> A release fence prevents reordering of stores. The reader doesn't do
> >>> any
> >>> stores, so I don't understand why you would use a release fence here.
> >>> Could you elaborate?
> >>
> >>  From my understanding:
> >> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >> serves as a hoist barrier here, so it would only prevent later
> >> instructions
> >> to be executed before that point.
> >> But it wouldn't prevent earlier instructions to be executed after
> >> that point.
> >> While we do need to guarantee that cpu will finish all previous reads
> >> before
> >> progressing further.
> >>
> >> Suppose we have something like that:
> >>
> >> struct {
> >>     uint64_t shared;
> >>     rte_seqlock_t lock;
> >> } data;
> >>
> >> ...
> >> sn = ...
> >> uint64_t x = data.shared;
> >> /* inside rte_seqlock_read_retry(): */
> >> ...
> >> rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED);
> >>
> >> Here we need to make sure that read of data.shared will always happen
> >> before reading of data.lock.sn.
> >> It is not a problem on IA (as reads are not reordered), but on
> >> machines with
> >> relaxed memory ordering (ARM, etc.)  it can happen.
> >> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE)
> > We can't use store-release since there is no write on the reader-side.
> > And fence-release orders against later stores, not later loads.
> >
> >>
> >> Honnappa and other ARM & atomics experts, please correct me if I am
> >> wrong here.
> > The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to
> > digest. If we trust Preshing, he has a more accessible description
> > here:
> > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-f4f5b1eec2980283&q=1&e=3479ebfa-e18d-4bf8-
> 88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20130922%2Facquire-and-release-fences%2F
> > "An acquire fence prevents the memory reordering of any read which
> > precedes it in program order with any read or write which follows it
> > in program order."
> > and here:
> > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-64b0eba450be934b&q=1&e=3479ebfa-e18d-4bf8-
> 88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20131125%2Facquire-and-release-fences-dont-work-the-way-youd-expect%2F
> > (for C++ but the definition seems to be identical to that of C11).
> > Essentially a LoadLoad+LoadStore barrier which is what we want to
> > achieve.
> >
> > GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This
> > waits for all loads preceding (in program order) the memory barrier to
> > be observed before any memory accesses after (in program order) the
> > memory barrier.
> >
> > I think the key to understanding atomic thread fences is that they are
> > not associated with a specific memory access (unlike load-acquire and
> > store-release) so they can't order earlier or later memory accesses
> > against some specific memory access. Instead the fence orders any/all
> > earlier loads and/or stores against any/all later loads or stores
> > (depending on acquire or release).
> >
> >>
> >>>>> +
> >>>>> +    end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> >>>>> +
> >>>>> +    return unlikely(begin_sn & 1 || begin_sn != end_sn);
> >>>>> +}
> >>>>> +
> >>>>> +__rte_experimental
> >>>>> +static inline void
> >>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock)
> >>>>> +{
> >>>>> +    uint64_t sn;
> >>>>> +
> >>>>> +    /* to synchronize with other writers */
> >>>>> +    rte_spinlock_lock(&seqlock->lock);
> >>>>> +
> >>>>> +    sn = seqlock->sn + 1;
> >>>>> +
> >>>>> +    __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> >>>>> +
> >>>>> +    /* __ATOMIC_RELEASE to prevent stores after (in program order)
> >>>>> +     * from happening before the sn store.
> >>>>> +     */
> >>>>> +    rte_atomic_thread_fence(__ATOMIC_RELEASE);
> >>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of
> >>>> '__ATOMIC_RELEASE'.
> >>>
> >>> Please elaborate on why.
> >>
> >> As you said in the comments above, we need to prevent later stores
> >> to be executed before that point. So we do need a hoist barrier here.
> >> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required.
> > An acquire fence wouldn't order an earlier store (the write to
> > seqlock->sn) from being reordered with some later store (e.g. writes
> > to the protected data), thus it would allow readers to see updated
> > data (possibly torn) with a pre-update sequence number. We need a
> > StoreStore barrier for ordering the SN store and data stores =>
> > fence(release).
> >
> > Acquire and releases fences can (also) be used to create
> > synchronize-with relationships (this is how the C standard defines
> > them). Preshing has a good example on this. Basically
> > Thread 1:
> > data = 242;
> > atomic_thread_fence(atomic_release);
> > atomic_store_n(&guard, 1, atomic_relaxed);
> >
> > Thread 2:
> > while (atomic_load_n(&guard, atomic_relaxed) != 1) ;
> > atomic_thread_fence(atomic_acquire);
> > do_something(data);
> >
> > These are obvious analogues to store-release and load-acquire, thus
> > the acquire & release names of the fences.
> >
> > - Ola
> >
> >>
> >>>
> >>>>> +}
> >>>>> +
> >>>>> +__rte_experimental
> >>>>> +static inline void
> >>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
> >>>>> +{
> >>>>> +    uint64_t sn;
> >>>>> +
> >>>>> +    sn = seqlock->sn + 1;
> >>>>> +
> >>>>> +    /* synchronizes-with the load acquire in rte_seqlock_begin() */
> >>>>> +    __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >>>>> +
> >>>>> +    rte_spinlock_unlock(&seqlock->lock);
> >>>>> +}
> >>>>> +
> >>
> 
> I have nothing to add, but Ola's mail seems to have been blocked from
> the dev list, so I'm posting this again.

Ok, thanks Ola for detailed explanation.
Have to admit then that my understanding of atomic_fence() behaviour was incorrect.
Please disregard my comments above about rte_seqlock_read_retry()
and  rte_seqlock_write_begin().

Konstantin  



^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH] eal: add seqlock
  2022-03-29 13:20                   ` Ananyev, Konstantin
@ 2022-03-30 10:07                     ` Mattias Rönnblom
  2022-03-30 10:50                       ` Morten Brørup
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-30 10:07 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated with
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been updated to many times during the
    read, so it has wrapped.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 200 ++++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 282 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 501 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..8d094a3c32
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_start(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_begin(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arrange in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_end(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_start(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint32_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			/* See writer_start() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2)
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS + 1)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int writer_lcore_ids[NUM_WRITERS] = { 0 };
+	unsigned int reader_lcore_ids[MAX_READERS];
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT)
+		return -1;
+
+	num_readers = num_lcores - NUM_WRITERS - 1;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i < NUM_WRITERS) {
+			rte_eal_remote_launch(writer_start, data, lcore_id);
+			writer_lcore_ids[i] = lcore_id;
+		} else {
+			unsigned int reader_idx = i - NUM_WRITERS;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_start, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	for (i = 0; i < NUM_WRITERS; i++)
+		if (rte_eal_wait_lcore(writer_lcore_ids[i]) != 0)
+			rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..03a7da57e9
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access the lock-protected data. The RTE
+ * seqlock permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is again
+ * allowed to execute. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclaimation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         // Temporary variables, just to improve readability.
+ *         int tentative_x;
+ *         char tentative_y[MAX_Y_LEN];
+ *
+ *         do {
+ *                 rte_seqlock_read(&config->lock);
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 tentative_x = config->param_x;
+ *                 strcpy(tentative_y, config->param_y);
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ *         // An application could skip retrying, and try again later, if
+ *         // it can make progress without the data.
+ *
+ *         *param_x = tentative_x;
+ *         strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_begin(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_end(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A generation number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
+
+/**
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread may read and copy the protected data. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative). A copy is required since the
+ * application only may read the data in the read-side criticial
+ * section (i.e., after rte_seqlock_read_begin() and before
+ * rte_seqlock_read_retry()), but must not act upon the retrieved data
+ * while in the critical section, since it does not yet know if it is
+ * consistent.
+ *
+ * The data may be accessed with both atomic and/or non-atomic loads.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() must be called, marking the end
+ * of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. If rte_seqlock_read_retry()
+ * returns false, the data was read atomically and the copied data is
+ * consistent.
+ *
+ * If rte_seqlock_read_retry() returns false, the application has the
+ * option to immediately restart the whole procedure (e.g., calling
+ * rte_seqlock_read_being() again), or do the same at some later time.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_end().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number returned from the corresponding rte_seqlock_read_begin()
+ * call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case this function returns false, the just-read data was
+ * consistent and the set of atomic and non-atomic load operations
+ * performed between rte_seqlock_read_begin() and
+ * rte_seqlock_read_retry() were atomic, as a whole.
+ *
+ * In case rte_seqlock_read_retry() returns true, the data was
+ * modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number that was returned by
+ *   rte_seqlock_read_begin() for this critical section.
+ * @return
+ *   true or false, if the just-read seqlock-protected data is inconsistent
+ *   or consistent, respectively.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	return unlikely(begin_sn & 1 || begin_sn != end_sn);
+}
+
+/**
+ * Begin write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * the protected data, in an atomic or non-atomic manner.
+ *
+ * After the nessesary updates have been performed, the application
+ * calls rte_seqlock_write_end().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_end()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_begin(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * End write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_end(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH] eal: add seqlock
  2022-03-30 10:07                     ` [PATCH] " Mattias Rönnblom
@ 2022-03-30 10:50                       ` Morten Brørup
  2022-03-30 11:24                         ` Tyler Retzlaff
                                           ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Morten Brørup @ 2022-03-30 10:50 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen,
	Ola Liljedahl

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 30 March 2022 12.07

> +
> +/**
> + * The RTE seqlock type.
> + */
> +typedef struct {
> +	uint32_t sn; /**< A generation number for the protected data. */
> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
> +} rte_seqlock_t;
> +

You refer to 'sn' as the sequence number everywhere else, so please document is as such:
"/**< Sequence number for the protected data. */"

Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts!

Acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH] eal: add seqlock
  2022-03-30 10:50                       ` Morten Brørup
@ 2022-03-30 11:24                         ` Tyler Retzlaff
  2022-03-30 11:25                         ` Mattias Rönnblom
  2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
  2 siblings, 0 replies; 104+ messages in thread
From: Tyler Retzlaff @ 2022-03-30 11:24 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand,
	onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	stephen, Ola Liljedahl

On Wed, Mar 30, 2022 at 12:50:42PM +0200, Morten Brørup wrote:
> > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> > Sent: Wednesday, 30 March 2022 12.07
> 
> > +
> > +/**
> > + * The RTE seqlock type.
> > + */
> > +typedef struct {
> > +	uint32_t sn; /**< A generation number for the protected data. */
> > +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
> > +} rte_seqlock_t;
> > +
> 
> You refer to 'sn' as the sequence number everywhere else, so please document is as such:
> "/**< Sequence number for the protected data. */"
> 
> Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts!

i don't think there is value added by cv-volatile qualification.
if we want correct/portable behavior for all targets then we should
just access with appropriate atomics builtins/intrinsics they will
be qualifying volatile and generating correct barriers when
necessary.

> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH] eal: add seqlock
  2022-03-30 10:50                       ` Morten Brørup
  2022-03-30 11:24                         ` Tyler Retzlaff
@ 2022-03-30 11:25                         ` Mattias Rönnblom
  2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
  2 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-30 11:25 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen,
	Ola Liljedahl

On 2022-03-30 12:50, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Wednesday, 30 March 2022 12.07
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A generation number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
>> +} rte_seqlock_t;
>> +
> You refer to 'sn' as the sequence number everywhere else, so please document is as such:
> "/**< Sequence number for the protected data. */"

Will do.

>
> Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts!

It might make a difference, but not for the better. There are almost no 
valid uses of volatile for core-to-core/thread-to-thread 
synchronization, in C11.

> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>

Thanks for your comments.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v2] eal: add seqlock
  2022-03-30 10:50                       ` Morten Brørup
  2022-03-30 11:24                         ` Tyler Retzlaff
  2022-03-30 11:25                         ` Mattias Rönnblom
@ 2022-03-30 14:26                         ` Mattias Rönnblom
  2022-03-31  7:46                           ` Mattias Rönnblom
                                             ` (2 more replies)
  2 siblings, 3 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-30 14:26 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated with
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been updated to many times during the
    read, so it has wrapped.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 202 ++++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 282 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 503 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..ba1755d9ad
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_begin(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arrange in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_end(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint32_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			/* See writer_run() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* master lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return 0;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..12cc3cdcb2
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access the lock-protected data. The RTE
+ * seqlock permits multiple writers as well. A spinlock is used for
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is again
+ * allowed to execute. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         // Temporary variables, just to improve readability.
+ *         int tentative_x;
+ *         char tentative_y[MAX_Y_LEN];
+ *
+ *         do {
+ *                 rte_seqlock_read(&config->lock);
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 tentative_x = config->param_x;
+ *                 strcpy(tentative_y, config->param_y);
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ *         // An application could skip retrying, and try again later, if
+ *         // it can make progress without the data.
+ *
+ *         *param_x = tentative_x;
+ *         strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_begin(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_end(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
+
+/**
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread may read and copy the protected data. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative). A copy is required since the
+ * application only may read the data in the read-side critical
+ * section (i.e., after rte_seqlock_read_begin() and before
+ * rte_seqlock_read_retry()), but must not act upon the retrieved data
+ * while in the critical section, since it does not yet know if it is
+ * consistent.
+ *
+ * The data may be accessed with both atomic and/or non-atomic loads.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() must be called, marking the end
+ * of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. If rte_seqlock_read_retry()
+ * returns false, the data was read atomically and the copied data is
+ * consistent.
+ *
+ * If rte_seqlock_read_retry() returns false, the application has the
+ * option to immediately restart the whole procedure (e.g., calling
+ * rte_seqlock_read_being() again), or do the same at some later time.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_end().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number returned from the corresponding rte_seqlock_read_begin()
+ * call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case this function returns false, the just-read data was
+ * consistent and the set of atomic and non-atomic load operations
+ * performed between rte_seqlock_read_begin() and
+ * rte_seqlock_read_retry() were atomic, as a whole.
+ *
+ * In case rte_seqlock_read_retry() returns true, the data was
+ * modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number that was returned by
+ *   rte_seqlock_read_begin() for this critical section.
+ * @return
+ *   true or false, if the just-read seqlock-protected data is inconsistent
+ *   or consistent, respectively.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	return unlikely(begin_sn & 1 || begin_sn != end_sn);
+}
+
+/**
+ * Begin write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * the protected data, in an atomic or non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_end().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_end()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_begin(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * End write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_begin()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_end(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
@ 2022-03-31  7:46                           ` Mattias Rönnblom
  2022-03-31  9:04                             ` Ola Liljedahl
  2022-04-02  0:50                           ` Stephen Hemminger
  2022-04-05 20:16                           ` Stephen Hemminger
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-31  7:46 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Ola Liljedahl

On 2022-03-30 16:26, Mattias Rönnblom wrote:
> A sequence lock (seqlock) is synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, especially for
> data structures shared across many cores and which are updated with
> relatively infrequently.
>
>

<snip>

Some questions I have:

Is a variant of the seqlock without the spinlock required? The reason I 
left such out was that I thought that in most cases where only a single 
writer is used (or serialization is external to the seqlock), the 
spinlock overhead is negligible, since updates are relatively infrequent.

Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or 
some third alternative? I wanted to make clear it's not just a "release 
the lock" function. You could use 
the|||__attribute__((warn_unused_result)) annotation to make clear the 
return value cannot be ignored, although I'm not sure DPDK ever use that 
attribute.


|

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31  7:46                           ` Mattias Rönnblom
@ 2022-03-31  9:04                             ` Ola Liljedahl
  2022-03-31  9:25                               ` Morten Brørup
  2022-03-31 13:38                               ` Mattias Rönnblom
  0 siblings, 2 replies; 104+ messages in thread
From: Ola Liljedahl @ 2022-03-31  9:04 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen


On 3/31/22 09:46, Mattias Rönnblom wrote:
> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>> A sequence lock (seqlock) is synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, especially for
>> data structures shared across many cores and which are updated with
>> relatively infrequently.
>>
>>
> 
> <snip>
> 
> Some questions I have:
> 
> Is a variant of the seqlock without the spinlock required? The reason I
> left such out was that I thought that in most cases where only a single
> writer is used (or serialization is external to the seqlock), the
> spinlock overhead is negligible, since updates are relatively infrequent.
You can combine the spinlock and the sequence number. Odd sequence 
number means the seqlock is busy. That would replace a non-atomic RMW of 
the sequence number with an atomic RMW CAS and avoid the spin lock 
atomic RMW operation. Not sure how much it helps.

> 
> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
> some third alternative? I wanted to make clear it's not just a "release
> the lock" function. You could use
> the|||__attribute__((warn_unused_result)) annotation to make clear the
> return value cannot be ignored, although I'm not sure DPDK ever use that
> attribute.
We have to decide how to use the seqlock API from the application 
perspective.
Your current proposal:
do {
     sn = rte_seqlock_read_begin(&seqlock)
     //read protected data
} while (rte_seqlock_read_retry(&seqlock, sn));

or perhaps
sn = rte_seqlock_read_lock(&seqlock);
do {
     //read protected data
} while (!rte_seqlock_read_tryunlock(&seqlock, &sn));

Tryunlock should signal to the user that the unlock operation might not 
succeed and something needs to be repeated.

-- Ola

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v2] eal: add seqlock
  2022-03-31  9:04                             ` Ola Liljedahl
@ 2022-03-31  9:25                               ` Morten Brørup
  2022-03-31  9:38                                 ` Ola Liljedahl
  2022-03-31 13:51                                 ` [PATCH v2] " Mattias Rönnblom
  2022-03-31 13:38                               ` Mattias Rönnblom
  1 sibling, 2 replies; 104+ messages in thread
From: Morten Brørup @ 2022-03-31  9:25 UTC (permalink / raw)
  To: Ola Liljedahl, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 11.05
> 
> On 3/31/22 09:46, Mattias Rönnblom wrote:
> > On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >> A sequence lock (seqlock) is synchronization primitive which allows
> >> for data-race free, low-overhead, high-frequency reads, especially
> for
> >> data structures shared across many cores and which are updated with
> >> relatively infrequently.
> >>
> >>
> >
> > <snip>
> >
> > Some questions I have:
> >
> > Is a variant of the seqlock without the spinlock required? The reason
> I
> > left such out was that I thought that in most cases where only a
> single
> > writer is used (or serialization is external to the seqlock), the
> > spinlock overhead is negligible, since updates are relatively
> infrequent.

Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.

> You can combine the spinlock and the sequence number. Odd sequence
> number means the seqlock is busy. That would replace a non-atomic RMW
> of
> the sequence number with an atomic RMW CAS and avoid the spin lock
> atomic RMW operation. Not sure how much it helps.
> 
> >
> > Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
> or
> > some third alternative? I wanted to make clear it's not just a
> "release
> > the lock" function. You could use
> > the|||__attribute__((warn_unused_result)) annotation to make clear
> the
> > return value cannot be ignored, although I'm not sure DPDK ever use
> that
> > attribute.

I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!

> We have to decide how to use the seqlock API from the application
> perspective.
> Your current proposal:
> do {
>      sn = rte_seqlock_read_begin(&seqlock)
>      //read protected data
> } while (rte_seqlock_read_retry(&seqlock, sn));
> 
> or perhaps
> sn = rte_seqlock_read_lock(&seqlock);
> do {
>      //read protected data
> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> 
> Tryunlock should signal to the user that the unlock operation might not
> succeed and something needs to be repeated.

Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.

I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.

Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.

Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31  9:25                               ` Morten Brørup
@ 2022-03-31  9:38                                 ` Ola Liljedahl
  2022-03-31 10:03                                   ` Morten Brørup
  2022-03-31 13:51                                 ` [PATCH v2] " Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Ola Liljedahl @ 2022-03-31  9:38 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen



On 3/31/22 11:25, Morten Brørup wrote:
>> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
>> Sent: Thursday, 31 March 2022 11.05
>>
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>> A sequence lock (seqlock) is synchronization primitive which allows
>>>> for data-race free, low-overhead, high-frequency reads, especially
>> for
>>>> data structures shared across many cores and which are updated with
>>>> relatively infrequently.
>>>>
>>>>
>>>
>>> <snip>
>>>
>>> Some questions I have:
>>>
>>> Is a variant of the seqlock without the spinlock required? The reason
>> I
>>> left such out was that I thought that in most cases where only a
>> single
>>> writer is used (or serialization is external to the seqlock), the
>>> spinlock overhead is negligible, since updates are relatively
>> infrequent.
> 
> Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.
> 
>> You can combine the spinlock and the sequence number. Odd sequence
>> number means the seqlock is busy. That would replace a non-atomic RMW
>> of
>> the sequence number with an atomic RMW CAS and avoid the spin lock
>> atomic RMW operation. Not sure how much it helps.
>>
>>>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
>> or
>>> some third alternative? I wanted to make clear it's not just a
>> "release
>>> the lock" function. You could use
>>> the|||__attribute__((warn_unused_result)) annotation to make clear
>> the
>>> return value cannot be ignored, although I'm not sure DPDK ever use
>> that
>>> attribute.
> 
> I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!
> 
>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
> 
> Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.
> 
> I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.
I was following the naming convention of rte_rwlock. Isn't the seqlock 
just a more scalable implementation of a reader/writer lock?


> 
> Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.
> 
> Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.
That's why my rte_seqlock_read_tryunlock() function takes the sequence 
number as a parameter passed by reference. Then the sequence number can 
be updated if necessary. I didn't want to force a new call to 
rte_seqlock_read_lock() because there should be a one-to-one match 
between rte_seqlock_read_lock() and a successful call to 
rte_seqlock_read_tryunlock().

- Ola
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v2] eal: add seqlock
  2022-03-31  9:38                                 ` Ola Liljedahl
@ 2022-03-31 10:03                                   ` Morten Brørup
  2022-03-31 11:44                                     ` Ola Liljedahl
  0 siblings, 1 reply; 104+ messages in thread
From: Morten Brørup @ 2022-03-31 10:03 UTC (permalink / raw)
  To: Ola Liljedahl, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 11.39
> 
> On 3/31/22 11:25, Morten Brørup wrote:
> >> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> >> Sent: Thursday, 31 March 2022 11.05
> >>
> >> On 3/31/22 09:46, Mattias Rönnblom wrote:
> >>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >>>> A sequence lock (seqlock) is synchronization primitive which
> >>>> allows
> >>>> for data-race free, low-overhead, high-frequency reads, especially
> >>>> for
> >>>> data structures shared across many cores and which are updated
> >>>> with relatively infrequently.
> >>>>
> >>>>
> >>>
> >>> <snip>
> >>>
> >>> Some questions I have:
> >>>
> >>> Is a variant of the seqlock without the spinlock required? The
> >>> reason I
> >>> left such out was that I thought that in most cases where only a
> >>> single
> >>> writer is used (or serialization is external to the seqlock), the
> >>> spinlock overhead is negligible, since updates are relatively
> >>> infrequent.
> >
> > Mattias, when you suggested adding the seqlock, I considered this
> > too, and came to the same conclusion as you.
> >
> >> You can combine the spinlock and the sequence number. Odd sequence
> >> number means the seqlock is busy. That would replace a non-atomic
> >> RMW of
> >> the sequence number with an atomic RMW CAS and avoid the spin lock
> >> atomic RMW operation. Not sure how much it helps.
> >>
> >>>
> >>> Should the rte_seqlock_read_retry() be called
> >>> rte_seqlock_read_end(), or
> >>> some third alternative? I wanted to make clear it's not just a
> >>> "release the lock" function. You could use
> >>> the|||__attribute__((warn_unused_result)) annotation to make clear
> >>> the
> >>> return value cannot be ignored, although I'm not sure DPDK ever use
> >>> that attribute.
> >
> > I strongly support adding __attribute__((warn_unused_result)) to the
> > function. There's a first time for everything, and this attribute is
> > very relevant here!
> >
> >> We have to decide how to use the seqlock API from the application
> >> perspective.
> >> Your current proposal:
> >> do {
> >>       sn = rte_seqlock_read_begin(&seqlock)
> >>       //read protected data
> >> } while (rte_seqlock_read_retry(&seqlock, sn));
> >>
> >> or perhaps
> >> sn = rte_seqlock_read_lock(&seqlock);
> >> do {
> >>       //read protected data
> >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> >>
> >> Tryunlock should signal to the user that the unlock operation might
> >> not succeed and something needs to be repeated.
> >
> > Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()?
> > As Ola mentions, this also inverses the boolean result value. If you
> > consider this, please check that the resulting assembly output remains
> > efficient.
> >
> > I think lock()/unlock() should be avoided in the read operation
> > names, because no lock is taken during read. I like the critical region
> > begin()/end() names.
> I was following the naming convention of rte_rwlock. Isn't the seqlock
> just a more scalable implementation of a reader/writer lock?

I see your point. However, no lock is taken, so using lock()/unlock() is somewhat misleading.

I have no strong opinion about this, so I'll leave it up to Mattias.

> > Regarding naming, you should also consider renaming
> > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(),
> > following the naming convention of the other locks. This could prepare
> > for future extensions, such as rte_seqlock_write_trylock(). Just a
> > thought; I don't feel strongly about this.
> >
> > Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop,
> > because retries can be triggered by a write operation happening between
> > the read_begin() and read_tryend(), and then the new sn must be used by
> > the read operation.
> That's why my rte_seqlock_read_tryunlock() function takes the sequence
> number as a parameter passed by reference. Then the sequence number can
> be updated if necessary. I didn't want to force a new call to
> rte_seqlock_read_lock() because there should be a one-to-one match
> between rte_seqlock_read_lock() and a successful call to
> rte_seqlock_read_tryunlock().

Uhh... I missed that point.

In that case, consider passing sn as output parameter to read_begin() by reference too, like the Linux kernel's spin_lock_irqsave() takes the flags as output parameter. I don't have a strong opinion here; just mentioning the possibility.

Performance wise, the resulting assembly output is probably the same, regardless if sn is the return value or an output parameter passed by reference.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31 10:03                                   ` Morten Brørup
@ 2022-03-31 11:44                                     ` Ola Liljedahl
  2022-03-31 11:50                                       ` Morten Brørup
  2022-03-31 14:02                                       ` Mattias Rönnblom
  0 siblings, 2 replies; 104+ messages in thread
From: Ola Liljedahl @ 2022-03-31 11:44 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

<snip>
>>> I think lock()/unlock() should be avoided in the read operation
>>> names, because no lock is taken during read. I like the critical region
>>> begin()/end() names.
>> I was following the naming convention of rte_rwlock. Isn't the seqlock
>> just a more scalable implementation of a reader/writer lock?
> 
> I see your point. However, no lock is taken, so using lock()/unlock() is somewhat misleading.
Conceptually, a reader lock is acquired and should be released. Now 
there wouldn't be any negative effects of skipping the unlock operation 
but then you wouldn't know if the data was properly read so you would 
have to ignore any read data as well. Why even call 
rte_seqlock_read_lock() in such a situation?

In the only meaningful case, the lock is acquired, the protected data is 
read and the lock is released. The only difference compared to a more 
vanilla lock implementation is that the release operation may fail and 
the operation must restart.

<snip>

- Ola

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v2] eal: add seqlock
  2022-03-31 11:44                                     ` Ola Liljedahl
@ 2022-03-31 11:50                                       ` Morten Brørup
  2022-03-31 14:02                                       ` Mattias Rönnblom
  1 sibling, 0 replies; 104+ messages in thread
From: Morten Brørup @ 2022-03-31 11:50 UTC (permalink / raw)
  To: Ola Liljedahl, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
> Sent: Thursday, 31 March 2022 13.44
> 
> <snip>
> >>> I think lock()/unlock() should be avoided in the read operation
> >>> names, because no lock is taken during read. I like the critical
> region
> >>> begin()/end() names.
> >> I was following the naming convention of rte_rwlock. Isn't the
> seqlock
> >> just a more scalable implementation of a reader/writer lock?
> >
> > I see your point. However, no lock is taken, so using lock()/unlock()
> is somewhat misleading.
> Conceptually, a reader lock is acquired and should be released. Now
> there wouldn't be any negative effects of skipping the unlock operation
> but then you wouldn't know if the data was properly read so you would
> have to ignore any read data as well. Why even call
> rte_seqlock_read_lock() in such a situation?
> 
> In the only meaningful case, the lock is acquired, the protected data
> is
> read and the lock is released. The only difference compared to a more
> vanilla lock implementation is that the release operation may fail and
> the operation must restart.

Thank you for taking the time to correct me on this...

I was stuck on the "lock" variable not being touched, but you are right: The serial number is a lock conceptually taken.

Then I agree about the lock()/unlock() names for the read operation too.

-Morten

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31  9:04                             ` Ola Liljedahl
  2022-03-31  9:25                               ` Morten Brørup
@ 2022-03-31 13:38                               ` Mattias Rönnblom
  2022-03-31 14:53                                 ` Ola Liljedahl
  1 sibling, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-31 13:38 UTC (permalink / raw)
  To: Ola Liljedahl, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen

On 2022-03-31 11:04, Ola Liljedahl wrote:
> 
> On 3/31/22 09:46, Mattias Rönnblom wrote:
>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>> A sequence lock (seqlock) is synchronization primitive which allows
>>> for data-race free, low-overhead, high-frequency reads, especially for
>>> data structures shared across many cores and which are updated with
>>> relatively infrequently.
>>>
>>>
>>
>> <snip>
>>
>> Some questions I have:
>>
>> Is a variant of the seqlock without the spinlock required? The reason I
>> left such out was that I thought that in most cases where only a single
>> writer is used (or serialization is external to the seqlock), the
>> spinlock overhead is negligible, since updates are relatively infrequent.
> You can combine the spinlock and the sequence number. Odd sequence 
> number means the seqlock is busy. That would replace a non-atomic RMW of 
> the sequence number with an atomic RMW CAS and avoid the spin lock 
> atomic RMW operation. Not sure how much it helps.
> 
>>
>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
>> some third alternative? I wanted to make clear it's not just a "release
>> the lock" function. You could use
>> the|||__attribute__((warn_unused_result)) annotation to make clear the
>> return value cannot be ignored, although I'm not sure DPDK ever use that
>> attribute.
> We have to decide how to use the seqlock API from the application 
> perspective.
> Your current proposal:
> do {
>      sn = rte_seqlock_read_begin(&seqlock)
>      //read protected data
> } while (rte_seqlock_read_retry(&seqlock, sn));
> 
> or perhaps
> sn = rte_seqlock_read_lock(&seqlock);
> do {
>      //read protected data
> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> 
> Tryunlock should signal to the user that the unlock operation might not 
> succeed and something needs to be repeated.
> 

I like that your proposal is consistent with rwlock API, although I tend 
to think about a seqlock more like an arbitrary-size atomic load/store, 
where begin() is the beginning of the read transaction.

What I don't like so much with "tryunlock" is that it's not obvious what 
return type and values it should have. I seem not to be the only one 
which suffers from a lack of intuition here, since the DPDK spinlock 
trylock() function returns '1' in case lock is taken (using an int, but 
treating it like a bool), while the rwlock equivalent returns '0' (also 
int, but treating it as an error code).

"lock" also suggests you prevent something from occurring, which is not 
the case on the reader side. A calling application also need not call 
the reader unlock (or retry) function for all seqlocks it has locked, 
although I don't see a point why it wouldn't. (I don't see why a 
read-side critical section should contain much logic at all, since you 
can't act on the just-read data.)

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31  9:25                               ` Morten Brørup
  2022-03-31  9:38                                 ` Ola Liljedahl
@ 2022-03-31 13:51                                 ` Mattias Rönnblom
  2022-04-02  0:54                                   ` Stephen Hemminger
  1 sibling, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-31 13:51 UTC (permalink / raw)
  To: Morten Brørup, Ola Liljedahl, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

On 2022-03-31 11:25, Morten Brørup wrote:
>> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com]
>> Sent: Thursday, 31 March 2022 11.05
>>
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>> A sequence lock (seqlock) is synchronization primitive which allows
>>>> for data-race free, low-overhead, high-frequency reads, especially
>> for
>>>> data structures shared across many cores and which are updated with
>>>> relatively infrequently.
>>>>
>>>>
>>>
>>> <snip>
>>>
>>> Some questions I have:
>>>
>>> Is a variant of the seqlock without the spinlock required? The reason
>> I
>>> left such out was that I thought that in most cases where only a
>> single
>>> writer is used (or serialization is external to the seqlock), the
>>> spinlock overhead is negligible, since updates are relatively
>> infrequent.
> 
> Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you.
> 
>> You can combine the spinlock and the sequence number. Odd sequence
>> number means the seqlock is busy. That would replace a non-atomic RMW
>> of
>> the sequence number with an atomic RMW CAS and avoid the spin lock
>> atomic RMW operation. Not sure how much it helps.
>>
>>>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(),
>> or
>>> some third alternative? I wanted to make clear it's not just a
>> "release
>>> the lock" function. You could use
>>> the|||__attribute__((warn_unused_result)) annotation to make clear
>> the
>>> return value cannot be ignored, although I'm not sure DPDK ever use
>> that
>>> attribute.
> 
> I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here!
> 

That would be a separate patch, I assume. Does anyone know if this 
attribute is available in all supported compilers?

>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
> 
> Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient.
> 
> I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names.
> 
> Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this.
> 
> Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation.
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31 11:44                                     ` Ola Liljedahl
  2022-03-31 11:50                                       ` Morten Brørup
@ 2022-03-31 14:02                                       ` Mattias Rönnblom
  2022-04-01 15:07                                         ` [PATCH v3] " Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-03-31 14:02 UTC (permalink / raw)
  To: Ola Liljedahl, Morten Brørup, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen

On 2022-03-31 13:44, Ola Liljedahl wrote:
> <snip>
>>>> I think lock()/unlock() should be avoided in the read operation
>>>> names, because no lock is taken during read. I like the critical region
>>>> begin()/end() names.
>>> I was following the naming convention of rte_rwlock. Isn't the seqlock
>>> just a more scalable implementation of a reader/writer lock?
>>
>> I see your point. However, no lock is taken, so using lock()/unlock() 
>> is somewhat misleading.
> Conceptually, a reader lock is acquired and should be released. Now 
> there wouldn't be any negative effects of skipping the unlock operation 
> but then you wouldn't know if the data was properly read so you would 
> have to ignore any read data as well. Why even call 
> rte_seqlock_read_lock() in such a situation?
> 
> In the only meaningful case, the lock is acquired, the protected data is 
> read and the lock is released. The only difference compared to a more 
> vanilla lock implementation is that the release operation may fail and 
> the operation must restart.
> 
> <snip>
> 
> - Ola

The RCU library also use the terms "lock" and "unlock" for the reader side.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31 13:38                               ` Mattias Rönnblom
@ 2022-03-31 14:53                                 ` Ola Liljedahl
  2022-04-02  0:52                                   ` Stephen Hemminger
  0 siblings, 1 reply; 104+ messages in thread
From: Ola Liljedahl @ 2022-03-31 14:53 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen

(Thunderbird suddenly refuses to edit in plain text mode, hope the mail 
gets sent as text anyway)

On 3/31/22 15:38, Mattias Rönnblom wrote:

> On 2022-03-31 11:04, Ola Liljedahl wrote:
>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>
<snip>
>>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or
>>> some third alternative? I wanted to make clear it's not just a "release
>>> the lock" function. You could use
>>> the|||__attribute__((warn_unused_result)) annotation to make clear the
>>> return value cannot be ignored, although I'm not sure DPDK ever use that
>>> attribute.
>> We have to decide how to use the seqlock API from the application
>> perspective.
>> Your current proposal:
>> do {
>>       sn = rte_seqlock_read_begin(&seqlock)
>>       //read protected data
>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>
>> or perhaps
>> sn = rte_seqlock_read_lock(&seqlock);
>> do {
>>       //read protected data
>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>
>> Tryunlock should signal to the user that the unlock operation might not
>> succeed and something needs to be repeated.
>>
> I like that your proposal is consistent with rwlock API, although I tend
> to think about a seqlock more like an arbitrary-size atomic load/store,
> where begin() is the beginning of the read transaction.

I can see the evolution of an application where is starts to use plain 
spin locks, moves to reader/writer locks for better performance and 
eventually moves to seqlocks. The usage is the same, only the 
characteristics (including performance) differ.

>
> What I don't like so much with "tryunlock" is that it's not obvious what
> return type and values it should have. I seem not to be the only one
> which suffers from a lack of intuition here, since the DPDK spinlock
> trylock() function returns '1' in case lock is taken (using an int, but
> treating it like a bool), while the rwlock equivalent returns '0' (also
> int, but treating it as an error code).
Then you have two different ways of doing it! Or invent a third since 
there seems to be no consistent pattern.
>
> "lock" also suggests you prevent something from occurring, which is not
> the case on the reader side.

That's why my implementations in Progress64 use the terms acquire and 
release. Locks are acquired and released (with acquire and release 
semantics!). Hazard pointers are acquired and released (with acquire and 
release semantics!). Slots in reorder buffers are acquired and released. 
Etc.

https://github.com/ARM-software/progress64

>   A calling application also need not call
> the reader unlock (or retry) function for all seqlocks it has locked,
> although I don't see a point why it wouldn't. (I don't see why a
> read-side critical section should contain much logic at all, since you
> can't act on the just-read data.)

Lock without unlock/retry is meaningless and not something we need to 
consider IMO.


- Ola


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v3] eal: add seqlock
  2022-03-31 14:02                                       ` Mattias Rönnblom
@ 2022-04-01 15:07                                         ` Mattias Rönnblom
  2022-04-02  0:21                                           ` Honnappa Nagarahalli
  2022-04-02 18:15                                           ` Ola Liljedahl
  0 siblings, 2 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-01 15:07 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated with
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main, and not the master.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been updated to many times during the
    read, so it has wrapped.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 523 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..54fadf8025
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arrange in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+		uint32_t sn;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		sn = rte_seqlock_read_lock(&data->lock);
+
+		do {
+			a = data->a;
+			/* See writer_run() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return 0;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..44eacd66e8
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,302 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access the lock-protected data. The RTE
+ * seqlock permits multiple writers as well. A spinlock is used to
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is again
+ * allowed to execute. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         // Temporary variables, just to improve readability.
+ *         int tentative_x;
+ *         char tentative_y[MAX_Y_LEN];
+ *         uint32_t sn;
+ *
+ *         sn = rte_seqlock_read_lock(&config->lock);
+ *         do {
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 tentative_x = config->param_x;
+ *                 strcpy(tentative_y, config->param_y);
+ *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
+ *         // An application could skip retrying, and try again later, if
+ *         // progress is possible without the data.
+ *
+ *         *param_x = tentative_x;
+ *         strcpy(param_y, tentative_y);
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
+
+/**
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_lock() returns a sequence number, which is later
+ * used in rte_seqlock_read_tryunlock() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_lock() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_tryunlock() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqlock_read_tryunlock() returns true, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * If rte_seqlock_read_tryunlock() returns false, the just-read data
+ * is inconsistent and should be discarded. The caller has the option
+ * to either re-read the data and call rte_seqlock_read_tryunlock()
+ * again, or to restart the whole procedure (i.e., from
+ * rte_seqlock_read_lock()) at some later time.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_tryunlock().
+ *
+ * @see rte_seqlock_read_tryunlock()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_lock(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_unlock().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number produced by the corresponding rte_seqlock_read_lock() (or,
+ * in case of a retry, the rte_seqlock_tryunlock()) call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case this function returns true, the just-read data was
+ * consistent and the set of atomic and non-atomic load operations
+ * performed between rte_seqlock_read_lock() and
+ * rte_seqlock_read_tryunlock() were atomic, as a whole.
+ *
+ * In case rte_seqlock_read_tryunlock() returns false, the data was
+ * modified as it was being read and may be inconsistent, and thus
+ * should be discarded. The @p begin_sn is updated with the
+ * now-current sequence number.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by
+ *   rte_seqlock_read_lock() (potentially updated in subsequent
+ *   rte_seqlock_read_tryunlock() calls) for this critical section.
+ * @return
+ *   true or false, if the just-read seqlock-protected data was consistent
+ *   or inconsistent, respectively, at the time it was read.
+ *
+ * @see rte_seqlock_read_lock()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
+{
+	uint32_t end_sn;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
+		*begin_sn = end_sn;
+		return false;
+	}
+
+	return true;
+}
+
+/**
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_lock(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-01 15:07                                         ` [PATCH v3] " Mattias Rönnblom
@ 2022-04-02  0:21                                           ` Honnappa Nagarahalli
  2022-04-02 11:01                                             ` Morten Brørup
                                                               ` (2 more replies)
  2022-04-02 18:15                                           ` Ola Liljedahl
  1 sibling, 3 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-02  0:21 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl, nd

Hi Mattias,
	Few comments inline.

> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Friday, April 1, 2022 10:08 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>;
> onar.olsen@ericsson.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
> konstantin.ananyev@intel.com; mb@smartsharesystems.com;
> stephen@networkplumber.org; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
> Subject: [PATCH v3] eal: add seqlock
> 
> A sequence lock (seqlock) is synchronization primitive which allows for data-
> race free, low-overhead, high-frequency reads, especially for data structures
> shared across many cores and which are updated with relatively infrequently.
> 
> A seqlock permits multiple parallel readers. The variant of seqlock implemented
> in this patch supports multiple writers as well. A spinlock is used for writer-
> writer serialization.
> 
> To avoid resource reclamation and other issues, the data protected by a seqlock
> is best off being self-contained (i.e., no pointers [except to constant data]).
> 
> One way to think about seqlocks is that they provide means to perform atomic
> operations on data objects larger what the native atomic machine instructions
> allow for.
> 
> DPDK seqlocks are not preemption safe on the writer side. A thread preemption
> affects performance, not correctness.
> 
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
> 
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
> 
> The first three steps need to be ordered using suitable memory fences.
> 
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
> 
> Proper memory fencing is required to make sure the first sn store, the data
> stores, and the second sn store appear to the reader in the mentioned order.
> 
> The sn loads and stores must be atomic, but the data loads and stores need not
> be.
> 
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
> 
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock
> 
> PATCH v3:
>   * Renamed both read and write-side critical section begin/end functions
>     to better match rwlock naming, per Ola Liljedahl's suggestion.
>   * Added 'extern "C"' guards for C++ compatibility.
>   * Refer to the main lcore as the main, and not the master.
> 
> PATCH v2:
>   * Skip instead of fail unit test in case too few lcores are available.
>   * Use main lcore for testing, reducing the minimum number of lcores
>     required to run the unit tests to four.
>   * Consistently refer to sn field as the "sequence number" in the
>     documentation.
>   * Fixed spelling mistakes in documentation.
> 
> Updates since RFC:
>   * Added API documentation.
>   * Added link to Wikipedia article in the commit message.
>   * Changed seqlock sequence number field from uint64_t (which was
>     overkill) to uint32_t. The sn type needs to be sufficiently large
>     to assure no reader will read a sn, access the data, and then read
>     the same sn, but the sn has been updated to many times during the
>     read, so it has wrapped.
>   * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>   * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>     with an anonymous struct typedef:ed to rte_seqlock_t.
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  app/test/meson.build          |   2 +
>  app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
>  lib/eal/common/meson.build    |   1 +
>  lib/eal/common/rte_seqlock.c  |  12 ++
>  lib/eal/include/meson.build   |   1 +
>  lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
>  lib/eal/version.map           |   3 +
>  7 files changed, 523 insertions(+)
>  create mode 100644 app/test/test_seqlock.c  create mode 100644
> lib/eal/common/rte_seqlock.c  create mode 100644
> lib/eal/include/rte_seqlock.h
> 
> diff --git a/app/test/meson.build b/app/test/meson.build index
> 5fc1dd1b7b..5e418e8766 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -125,6 +125,7 @@ test_sources = files(
>          'test_rwlock.c',
>          'test_sched.c',
>          'test_security.c',
> +        'test_seqlock.c',
>          'test_service_cores.c',
>          'test_spinlock.c',
>          'test_stack.c',
> @@ -214,6 +215,7 @@ fast_tests = [
>          ['rwlock_rde_wro_autotest', true],
>          ['sched_autotest', true],
>          ['security_autotest', false],
> +        ['seqlock_autotest', true],
>          ['spinlock_autotest', true],
>          ['stack_autotest', false],
>          ['stack_lf_autotest', false],
> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode
> 100644 index 0000000000..54fadf8025
> --- /dev/null
> +++ b/app/test/test_seqlock.c
> @@ -0,0 +1,202 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +#include <rte_cycles.h>
> +#include <rte_malloc.h>
> +#include <rte_random.h>
> +
> +#include <inttypes.h>
> +
> +#include "test.h"
> +
> +struct data {
> +	rte_seqlock_t lock;
> +
> +	uint64_t a;
> +	uint64_t b __rte_cache_aligned;
> +	uint64_t c __rte_cache_aligned;
> +} __rte_cache_aligned;
> +
> +struct reader {
> +	struct data *data;
> +	uint8_t stop;
> +};
> +
> +#define WRITER_RUNTIME (2.0) /* s */
> +
> +#define WRITER_MAX_DELAY (100) /* us */
> +
> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define
> +WRITER_INTERRUPT_TIME (1) /* us */
> +
> +static int
> +writer_run(void *arg)
> +{
> +	struct data *data = arg;
> +	uint64_t deadline;
> +
> +	deadline = rte_get_timer_cycles() +
> +		WRITER_RUNTIME * rte_get_timer_hz();
> +
> +	while (rte_get_timer_cycles() < deadline) {
> +		bool interrupted;
> +		uint64_t new_value;
> +		unsigned int delay;
> +
> +		new_value = rte_rand();
> +
> +		interrupted =
> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
> +
> +		rte_seqlock_write_lock(&data->lock);
> +
> +		data->c = new_value;
> +
> +		/* These compiler barriers (both on the test reader
> +		 * and the test writer side) are here to ensure that
> +		 * loads/stores *usually* happen in test program order
> +		 * (always on a TSO machine). They are arrange in such
> +		 * a way that the writer stores in a different order
> +		 * than the reader loads, to emulate an arbitrary
> +		 * order. A real application using a seqlock does not
> +		 * require any compiler barriers.
> +		 */
> +		rte_compiler_barrier();
The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order).

> +		data->b = new_value;
> +
> +		if (interrupted)
> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
> +
> +		rte_compiler_barrier();
> +		data->a = new_value;
> +
> +		rte_seqlock_write_unlock(&data->lock);
> +
> +		delay = rte_rand_max(WRITER_MAX_DELAY);
> +
> +		rte_delay_us_block(delay);
> +	}
> +
> +	return 0;
> +}
> +
> +#define INTERRUPTED_READER_FREQUENCY (1000) #define
> +READER_INTERRUPT_TIME (1000) /* us */
> +
> +static int
> +reader_run(void *arg)
> +{
> +	struct reader *r = arg;
> +	int rc = 0;
> +
> +	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc ==
> 0) {
> +		struct data *data = r->data;
> +		bool interrupted;
> +		uint64_t a;
> +		uint64_t b;
> +		uint64_t c;
> +		uint32_t sn;
> +
> +		interrupted =
> rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
> +
> +		sn = rte_seqlock_read_lock(&data->lock);
> +
> +		do {
> +			a = data->a;
> +			/* See writer_run() for an explanation why
> +			 * these barriers are here.
> +			 */
> +			rte_compiler_barrier();
> +
> +			if (interrupted)
> +
> 	rte_delay_us_block(READER_INTERRUPT_TIME);
> +
> +			c = data->c;
> +
> +			rte_compiler_barrier();
> +			b = data->b;
> +
> +		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
> +
> +		if (a != b || b != c) {
> +			printf("Reader observed inconsistent data values "
> +			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
> +			       a, b, c);
> +			rc = -1;
> +		}
> +	}
> +
> +	return rc;
> +}
> +
> +static void
> +reader_stop(struct reader *reader)
> +{
> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); }
> +
> +#define NUM_WRITERS (2) /* main lcore + one worker */ #define
> +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE -
> NUM_WRITERS -
> +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
> +
> +/* Only a compile-time test */
> +static rte_seqlock_t __rte_unused static_init_lock =
> +RTE_SEQLOCK_INITIALIZER;
> +
> +static int
> +test_seqlock(void)
> +{
> +	struct reader readers[MAX_READERS];
> +	unsigned int num_readers;
> +	unsigned int num_lcores;
> +	unsigned int i;
> +	unsigned int lcore_id;
> +	unsigned int reader_lcore_ids[MAX_READERS];
> +	unsigned int worker_writer_lcore_id = 0;
> +	int rc = 0;
> +
> +	num_lcores = rte_lcore_count();
> +
> +	if (num_lcores < MIN_LCORE_COUNT) {
> +		printf("Too few cores to run test. Skipping.\n");
> +		return 0;
> +	}
> +
> +	num_readers = num_lcores - NUM_WRITERS;
> +
> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
> +
> +	i = 0;
> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> +		if (i == 0) {
> +			rte_eal_remote_launch(writer_run, data, lcore_id);
> +			worker_writer_lcore_id = lcore_id;
> +		} else {
> +			unsigned int reader_idx = i - 1;
> +			struct reader *reader = &readers[reader_idx];
> +
> +			reader->data = data;
> +			reader->stop = 0;
> +
> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
> +			reader_lcore_ids[reader_idx] = lcore_id;
> +		}
> +		i++;
> +	}
> +
> +	if (writer_run(data) != 0 ||
> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
> +		rc = -1;
> +
> +	for (i = 0; i < num_readers; i++) {
> +		reader_stop(&readers[i]);
> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
> +			rc = -1;
> +	}
> +
> +	return rc;
> +}
> +
> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index
> 917758cc65..a41343bfed 100644
> --- a/lib/eal/common/meson.build
> +++ b/lib/eal/common/meson.build
> @@ -35,6 +35,7 @@ sources += files(
>          'rte_malloc.c',
>          'rte_random.c',
>          'rte_reciprocal.c',
> +	'rte_seqlock.c',
>          'rte_service.c',
>          'rte_version.c',
>  )
> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new
> file mode 100644 index 0000000000..d4fe648799
> --- /dev/null
> +++ b/lib/eal/common/rte_seqlock.c
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock) {
> +	seqlock->sn = 0;
> +	rte_spinlock_init(&seqlock->lock);
> +}
> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index
> 9700494816..48df5f1a21 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -36,6 +36,7 @@ headers += files(
>          'rte_per_lcore.h',
>          'rte_random.h',
>          'rte_reciprocal.h',
> +        'rte_seqlock.h',
>          'rte_service.h',
>          'rte_service_component.h',
>          'rte_string_fns.h',
> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file
Other lock implementations are in lib/eal/include/generic.

> mode 100644 index 0000000000..44eacd66e8
> --- /dev/null
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -0,0 +1,302 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#ifndef _RTE_SEQLOCK_H_
> +#define _RTE_SEQLOCK_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file
> + * RTE Seqlock
> + *
> + * A sequence lock (seqlock) is a synchronization primitive allowing
> + * multiple, parallel, readers to efficiently and safely (i.e., in a
> + * data-race free manner) access the lock-protected data. The RTE
> + * seqlock permits multiple writers as well. A spinlock is used to
> + * writer-writer synchronization.
> + *
> + * A reader never blocks a writer. Very high frequency writes may
> + * prevent readers from making progress.
> + *
> + * A seqlock is not preemption-safe on the writer side. If a writer is
> + * preempted, it may block readers until the writer thread is again
> + * allowed to execute. Heavy computations should be kept out of the
> + * writer-side critical section, to avoid delaying readers.
> + *
> + * Seqlocks are useful for data which are read by many cores, at a
> + * high frequency, and relatively infrequently written to.
> + *
> + * One way to think about seqlocks is that they provide means to
> + * perform atomic operations on objects larger than what the native
> + * machine instructions allow for.
> + *
> + * To avoid resource reclamation issues, the data protected by a
> + * seqlock should typically be kept self-contained (e.g., no pointers
> + * to mutable, dynamically allocated data).
> + *
> + * Example usage:
> + * @code{.c}
> + * #define MAX_Y_LEN (16)
> + * // Application-defined example data structure, protected by a seqlock.
> + * struct config {
> + *         rte_seqlock_t lock;
> + *         int param_x;
> + *         char param_y[MAX_Y_LEN];
> + * };
> + *
> + * // Accessor function for reading config fields.
> + * void
> + * config_read(const struct config *config, int *param_x, char
> +*param_y)
> + * {
> + *         // Temporary variables, just to improve readability.
I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.

> + *         int tentative_x;
> + *         char tentative_y[MAX_Y_LEN];
> + *         uint32_t sn;
> + *
> + *         sn = rte_seqlock_read_lock(&config->lock);
> + *         do {
> + *                 // Loads may be atomic or non-atomic, as in this example.
> + *                 tentative_x = config->param_x;
> + *                 strcpy(tentative_y, config->param_y);
> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
> + *         // An application could skip retrying, and try again later, if
> + *         // progress is possible without the data.
> + *
> + *         *param_x = tentative_x;
> + *         strcpy(param_y, tentative_y);
> + * }
> + *
> + * // Accessor function for writing config fields.
> + * void
> + * config_update(struct config *config, int param_x, const char
> +*param_y)
> + * {
> + *         rte_seqlock_write_lock(&config->lock);
> + *         // Stores may be atomic or non-atomic, as in this example.
> + *         config->param_x = param_x;
> + *         strcpy(config->param_y, param_y);
> + *         rte_seqlock_write_unlock(&config->lock);
> + * }
> + * @endcode
> + *
> + * @see
> + * https://en.wikipedia.org/wiki/Seqlock.
> + */
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +#include <rte_atomic.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_spinlock.h>
> +
> +/**
> + * The RTE seqlock type.
> + */
> +typedef struct {
> +	uint32_t sn; /**< A sequence number for the protected data. */
> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. 

> +rte_seqlock_t;
> +
> +/**
> + * A static seqlock initializer.
> + */
> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
> +
> +/**
> + * Initialize the seqlock.
> + *
> + * This function initializes the seqlock, and leaves the writer-side
> + * spinlock unlocked.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + */
> +__rte_experimental
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock);
> +
> +/**
> + * Begin a read-side critical section.
> + *
> + * A call to this function marks the beginning of a read-side critical
> + * section, for @p seqlock.
> + *
> + * rte_seqlock_read_lock() returns a sequence number, which is later
> + * used in rte_seqlock_read_tryunlock() to check if the protected data
> + * underwent any modifications during the read transaction.
> + *
> + * After (in program order) rte_seqlock_read_lock() has been called,
> + * the calling thread reads the protected data, for later use. The
> + * protected data read *must* be copied (either in pristine form, or
> + * in the form of some derivative), since the caller may only read the
> + * data from within the read-side critical section (i.e., after
> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
> + * but must not act upon the retrieved data while in the critical
> + * section, since it does not yet know if it is consistent.
> + *
> + * The protected data may be read using atomic and/or non-atomic
> + * operations.
> + *
> + * After (in program order) all required data loads have been
> + * performed, rte_seqlock_read_tryunlock() should be called, marking
> + * the end of the read-side critical section.
> + *
> + * If rte_seqlock_read_tryunlock() returns true, the data was read
> + * atomically and the copied data is consistent.
> + *
> + * If rte_seqlock_read_tryunlock() returns false, the just-read data
> + * is inconsistent and should be discarded. The caller has the option
> + * to either re-read the data and call rte_seqlock_read_tryunlock()
> + * again, or to restart the whole procedure (i.e., from
> + * rte_seqlock_read_lock()) at some later time.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @return
> + *   The seqlock sequence number for this critical section, to
> + *   later be passed to rte_seqlock_read_tryunlock().
> + *
> + * @see rte_seqlock_read_tryunlock()
> + */
> +__rte_experimental
> +static inline uint32_t
> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) {
> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
> +	 * from happening before the sn load. Synchronizes-with the
> +	 * store release in rte_seqlock_write_unlock().
> +	 */
> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); }
> +
> +/**
> + * End a read-side critical section.
> + *
> + * A call to this function marks the end of a read-side critical
Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()?

> + * section, for @p seqlock. The application must supply the sequence
> + * number produced by the corresponding rte_seqlock_read_lock() (or,
> + * in case of a retry, the rte_seqlock_tryunlock()) call.
> + *
> + * After this function has been called, the caller should not access
> + * the protected data.
I understand what you mean here. But, I think this needs clarity.
In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data.
May be this should be changed to:
" After this function returns true, the caller should not access the protected data."?
Or may be combine it with the following para.

> + *
> + * In case this function returns true, the just-read data was
> + * consistent and the set of atomic and non-atomic load operations
> + * performed between rte_seqlock_read_lock() and
> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
> + *
> + * In case rte_seqlock_read_tryunlock() returns false, the data was
> + * modified as it was being read and may be inconsistent, and thus
> + * should be discarded. The @p begin_sn is updated with the
> + * now-current sequence number.
May be
" The @p begin_sn is updated with the sequence number for the next critical section."

> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @param begin_sn
> + *   The seqlock sequence number returned by
> + *   rte_seqlock_read_lock() (potentially updated in subsequent
> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
> + * @return
> + *   true or false, if the just-read seqlock-protected data was consistent
> + *   or inconsistent, respectively, at the time it was read.
true - just read protected data was consistent
false - just read protected data was inconsistent

> + *
> + * @see rte_seqlock_read_lock()
> + */
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> +*begin_sn) {
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> +		*begin_sn = end_sn;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * Begin a write-side critical section.
> + *
> + * A call to this function acquires the write lock associated @p
> + * seqlock, and marks the beginning of a write-side critical section.
> + *
> + * After having called this function, the caller may go on to modify
> + * (both read and write) the protected data, in an atomic or
> + * non-atomic manner.
> + *
> + * After the necessary updates have been performed, the application
> + * calls rte_seqlock_write_unlock().
> + *
> + * This function is not preemption-safe in the sense that preemption
> + * of the calling thread may block reader progress until the writer
> + * thread is rescheduled.
> + *
> + * Unlike rte_seqlock_read_lock(), each call made to
> + * rte_seqlock_write_lock() must be matched with an unlock call.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_unlock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> +	uint32_t sn;
> +
> +	/* to synchronize with other writers */
> +	rte_spinlock_lock(&seqlock->lock);
> +
> +	sn = seqlock->sn + 1;
The load of seqlock->sn could use __atomic_load_n to be consistent.

> +
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> +}
> +
> +/**
> + * End a write-side critical section.
> + *
> + * A call to this function marks the end of the write-side critical
> + * section, for @p seqlock. After this call has been made, the
> +protected
> + * data may no longer be modified.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_lock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
> +	uint32_t sn;
> +
> +	sn = seqlock->sn + 1;
Same here, the load of seqlock->sn could use __atomic_load_n

> +
> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif  /* _RTE_SEQLOCK_H_ */
> diff --git a/lib/eal/version.map b/lib/eal/version.map index
> b53eeb30d7..4a9d0ed899 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>  	rte_intr_instance_free;
>  	rte_intr_type_get;
>  	rte_intr_type_set;
> +
> +	# added in 22.07
> +	rte_seqlock_init;
>  };
> 
>  INTERNAL {
> --
> 2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
  2022-03-31  7:46                           ` Mattias Rönnblom
@ 2022-04-02  0:50                           ` Stephen Hemminger
  2022-04-02 17:54                             ` Ola Liljedahl
  2022-04-05 20:16                           ` Stephen Hemminger
  2 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-02  0:50 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl

On Wed, 30 Mar 2022 16:26:02 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program
> order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);

Couldn't atomic store with __ATOMIC_RELEASE do same thing?

> +static inline void
> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
> +{
> +	uint32_t sn;
> +
> +	sn = seqlock->sn + 1;
> +
> +	/* synchronizes-with the load acquire in rte_seqlock_begin()
> */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);

Atomic store is not necessary here, the atomic operation in
spinlock_unlock wil assure theat the seqeuence number update is
ordered correctly.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31 14:53                                 ` Ola Liljedahl
@ 2022-04-02  0:52                                   ` Stephen Hemminger
  2022-04-03  6:23                                     ` Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-02  0:52 UTC (permalink / raw)
  To: Ola Liljedahl
  Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand,
	Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb

On Thu, 31 Mar 2022 16:53:00 +0200
Ola Liljedahl <ola.liljedahl@arm.com> wrote:

> From: Ola Liljedahl <ola.liljedahl@arm.com>
> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>,  "dev@dpdk.org"
> <dev@dpdk.org> Cc: Thomas Monjalon <thomas@monjalon.net>,  David
> Marchand <david.marchand@redhat.com>,  Onar Olsen
> <onar.olsen@ericsson.com>,  "Honnappa.Nagarahalli@arm.com"
> <Honnappa.Nagarahalli@arm.com>,  "nd@arm.com" <nd@arm.com>,
> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
> "mb@smartsharesystems.com" <mb@smartsharesystems.com>,
> "stephen@networkplumber.org" <stephen@networkplumber.org> Subject:
> Re: [PATCH v2] eal: add seqlock Date: Thu, 31 Mar 2022 16:53:00 +0200
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
> Thunderbird/91.7.0
> 
> (Thunderbird suddenly refuses to edit in plain text mode, hope the
> mail gets sent as text anyway)
> 
> On 3/31/22 15:38, Mattias Rönnblom wrote:
> 
> > On 2022-03-31 11:04, Ola Liljedahl wrote:  
> >> On 3/31/22 09:46, Mattias Rönnblom wrote:  
> >>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
> >>>  
> <snip>
> >>> Should the rte_seqlock_read_retry() be called
> >>> rte_seqlock_read_end(), or some third alternative? I wanted to
> >>> make clear it's not just a "release the lock" function. You could
> >>> use the|||__attribute__((warn_unused_result)) annotation to make
> >>> clear the return value cannot be ignored, although I'm not sure
> >>> DPDK ever use that attribute.  
> >> We have to decide how to use the seqlock API from the application
> >> perspective.
> >> Your current proposal:
> >> do {
> >>       sn = rte_seqlock_read_begin(&seqlock)
> >>       //read protected data
> >> } while (rte_seqlock_read_retry(&seqlock, sn));
> >>
> >> or perhaps
> >> sn = rte_seqlock_read_lock(&seqlock);
> >> do {
> >>       //read protected data
> >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
> >>
> >> Tryunlock should signal to the user that the unlock operation
> >> might not succeed and something needs to be repeated.
> >>  
> > I like that your proposal is consistent with rwlock API, although I
> > tend to think about a seqlock more like an arbitrary-size atomic
> > load/store, where begin() is the beginning of the read transaction.
> >  
> 
> I can see the evolution of an application where is starts to use
> plain spin locks, moves to reader/writer locks for better performance
> and eventually moves to seqlocks. The usage is the same, only the 
> characteristics (including performance) differ.


The semantics of seqlock in DPDK must be the same as what Linux kernel
does or you are asking for trouble.  It is not a reader-writer lock in
traditional sense.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-31 13:51                                 ` [PATCH v2] " Mattias Rönnblom
@ 2022-04-02  0:54                                   ` Stephen Hemminger
  2022-04-02 10:25                                     ` Morten Brørup
  0 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-02  0:54 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Morten Brørup, Ola Liljedahl, dev, Thomas Monjalon,
	David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd,
	konstantin.ananyev

On Thu, 31 Mar 2022 13:51:32 +0000
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> > 
> > Regarding naming, you should also consider renaming
> > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(),
> > following the naming convention of the other locks. This could
> > prepare for future extensions, such as rte_seqlock_write_trylock().
> > Just a thought; I don't feel strongly about this.

Semantics and naming should be the same as Linux kernel or you risk
having to reeducate too many people.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v2] eal: add seqlock
  2022-04-02  0:54                                   ` Stephen Hemminger
@ 2022-04-02 10:25                                     ` Morten Brørup
  2022-04-02 17:43                                       ` Ola Liljedahl
  0 siblings, 1 reply; 104+ messages in thread
From: Morten Brørup @ 2022-04-02 10:25 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: Ola Liljedahl, dev, Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 2 April 2022 02.54
> 
> Semantics and naming should be the same as Linux kernel or you risk
> having to reeducate too many people.

Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should.

However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-02  0:21                                           ` Honnappa Nagarahalli
@ 2022-04-02 11:01                                             ` Morten Brørup
  2022-04-02 19:38                                               ` Honnappa Nagarahalli
  2022-04-03  6:10                                             ` [PATCH v3] eal: add seqlock Mattias Rönnblom
  2022-04-03  6:33                                             ` Mattias Rönnblom
  2 siblings, 1 reply; 104+ messages in thread
From: Morten Brørup @ 2022-04-02 11:01 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev,
	stephen, Ola Liljedahl, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Saturday, 2 April 2022 02.22
> 
> > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > Sent: Friday, April 1, 2022 10:08 AM
> >
> > diff --git a/lib/eal/include/rte_seqlock.h
> > b/lib/eal/include/rte_seqlock.h new file
> Other lock implementations are in lib/eal/include/generic.

I'm not sure why what goes where... e.g. rte_branch_prediction.h and rte_bitops.h are not in include/generic.

But I agree that keeping lock implementations in the same location makes sense.

Also, other lock implementations have their init() function in their header file, so you could consider getting rid of the C file. I don't care, just mentioning it.

> > +/**
> > + * The RTE seqlock type.
> > + */
> > +typedef struct {
> > +	uint32_t sn; /**< A sequence number for the protected data. */
> > +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
> }
> Suggest using ticket lock for the writer side. It should have low
> overhead when there is a single writer, but provides better
> functionality when there are multiple writers.

A spinlock and a ticket lock have the same size, so there is no memory cost either.

Unless using a ticket lock stands in the way for future extensions to the seqlock library, then it seems like a good idea.

> > +__rte_experimental
> > +static inline bool
> > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > +*begin_sn) {

Did anyone object to adding the __attribute__((warn_unused_result))?

Otherwise, I think you should add it.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-04-02 10:25                                     ` Morten Brørup
@ 2022-04-02 17:43                                       ` Ola Liljedahl
  0 siblings, 0 replies; 104+ messages in thread
From: Ola Liljedahl @ 2022-04-02 17:43 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev


On 4/2/22 12:25, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Saturday, 2 April 2022 02.54
>>
>> Semantics and naming should be the same as Linux kernel or you risk
>> having to reeducate too many people.
> Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should.
>
> However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers.
Couldn't have said it better myself.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-04-02  0:50                           ` Stephen Hemminger
@ 2022-04-02 17:54                             ` Ola Liljedahl
  2022-04-02 19:37                               ` Honnappa Nagarahalli
  0 siblings, 1 reply; 104+ messages in thread
From: Ola Liljedahl @ 2022-04-02 17:54 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb


On 4/2/22 02:50, Stephen Hemminger wrote:
> On Wed, 30 Mar 2022 16:26:02 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program
>> order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> Couldn't atomic store with __ATOMIC_RELEASE do same thing?

No, store-release wouldn't prevent later stores from moving up. It only 
ensures that earlier loads and stores have completed before 
store-release completes. If later stores could move before a supposed 
store-release(seqlock->sn), readers could see inconsistent (torn) data 
with a valid sequence number.


>> +static inline void
>> +rte_seqlock_write_end(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_begin()
>> */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
> Atomic store is not necessary here, the atomic operation in
> spinlock_unlock wil assure theat the seqeuence number update is
> ordered correctly.
Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with 
store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't 
exist any synchronize-with relationship. Readers don't access the spin 
lock so any writer-side updates to the spin lock don't mean anything to 
readers.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-01 15:07                                         ` [PATCH v3] " Mattias Rönnblom
  2022-04-02  0:21                                           ` Honnappa Nagarahalli
@ 2022-04-02 18:15                                           ` Ola Liljedahl
  2022-04-02 19:31                                             ` Honnappa Nagarahalli
  2022-04-03  6:51                                             ` Mattias Rönnblom
  1 sibling, 2 replies; 104+ messages in thread
From: Ola Liljedahl @ 2022-04-02 18:15 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen

On 4/1/22 17:07, Mattias Rönnblom wrote:
> +
> +/**
> + * End a read-side critical section.
> + *
> + * A call to this function marks the end of a read-side critical
> + * section, for @p seqlock. The application must supply the sequence
> + * number produced by the corresponding rte_seqlock_read_lock() (or,
> + * in case of a retry, the rte_seqlock_tryunlock()) call.
> + *
> + * After this function has been called, the caller should not access
> + * the protected data.
> + *
> + * In case this function returns true, the just-read data was
> + * consistent and the set of atomic and non-atomic load operations
> + * performed between rte_seqlock_read_lock() and
> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
> + *
> + * In case rte_seqlock_read_tryunlock() returns false, the data was
> + * modified as it was being read and may be inconsistent, and thus
> + * should be discarded. The @p begin_sn is updated with the
> + * now-current sequence number.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + * @param begin_sn
> + *   The seqlock sequence number returned by
> + *   rte_seqlock_read_lock() (potentially updated in subsequent
> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
> + * @return
> + *   true or false, if the just-read seqlock-protected data was consistent
> + *   or inconsistent, respectively, at the time it was read.
> + *
> + * @see rte_seqlock_read_lock()
> + */
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
> +{
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

Since we are reading and potentially returning the sequence number here 
(repeating the read of the protected data), we need to use load-acquire. 
I assume it is not expected that the user will call 
rte_seqlock_read_lock() again.

Seeing this implementation, I might actually prefer the original 
implementation, I think it is cleaner. But I would like for the begin 
function also to wait for an even sequence number, the end function 
would only have to check for same sequence number, this might improve 
performance a little bit as readers won't perform one or several broken 
reads while a write is in progress. The function names are a different 
thing though.

The writer side behaves much more like a lock with mutual exclusion so 
write_lock/write_unlock makes sense.

> +
> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> +		*begin_sn = end_sn;
> +		return false;
> +	}
> +
> +	return true;
> +}
> +

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-02 18:15                                           ` Ola Liljedahl
@ 2022-04-02 19:31                                             ` Honnappa Nagarahalli
  2022-04-02 20:36                                               ` Morten Brørup
  2022-04-03 18:11                                               ` Ola Liljedahl
  2022-04-03  6:51                                             ` Mattias Rönnblom
  1 sibling, 2 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-02 19:31 UTC (permalink / raw)
  To: Ola Liljedahl, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, nd

<snip>

> > +__rte_experimental
> > +static inline bool
> > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > +*begin_sn) {
> > +	uint32_t end_sn;
> > +
> > +	/* make sure the data loads happens before the sn load */
> > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > +
> > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> 
> Since we are reading and potentially returning the sequence number here
> (repeating the read of the protected data), we need to use load-acquire.
> I assume it is not expected that the user will call
> rte_seqlock_read_lock() again.
Good point, we need a load-acquire (due to changes done in v3).

> 
> Seeing this implementation, I might actually prefer the original
> implementation, I think it is cleaner. But I would like for the begin function
> also to wait for an even sequence number, the end function would only have
> to check for same sequence number, this might improve performance a little
> bit as readers won't perform one or several broken reads while a write is in
> progress. The function names are a different thing though.
I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would reduce one 'if' statement.

Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case.

> 
> The writer side behaves much more like a lock with mutual exclusion so
> write_lock/write_unlock makes sense.
> 
> > +
> > +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
> > +		*begin_sn = end_sn;
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v2] eal: add seqlock
  2022-04-02 17:54                             ` Ola Liljedahl
@ 2022-04-02 19:37                               ` Honnappa Nagarahalli
  0 siblings, 0 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-02 19:37 UTC (permalink / raw)
  To: Ola Liljedahl, Stephen Hemminger, Mattias Rönnblom
  Cc: dev, thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, nd

<snip>

> >> +static inline void
> >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	sn = seqlock->sn + 1;
> >> +
> >> +	/* synchronizes-with the load acquire in rte_seqlock_begin()
> >> */
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >> +
> >> +	rte_spinlock_unlock(&seqlock->lock);
> > Atomic store is not necessary here, the atomic operation in
> > spinlock_unlock wil assure theat the seqeuence number update is
> > ordered correctly.
> Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with
> store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't exist
> any synchronize-with relationship. Readers don't access the spin lock so any
> writer-side updates to the spin lock don't mean anything to readers.
Agree with this assessment. The store-release in spin-lock unlock does not synchronize with the readers.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-02 11:01                                             ` Morten Brørup
@ 2022-04-02 19:38                                               ` Honnappa Nagarahalli
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-02 19:38 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev,
	stephen, Ola Liljedahl, nd, nd

<snip>

> 
> > > +__rte_experimental
> > > +static inline bool
> > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > +*begin_sn) {
> 
> Did anyone object to adding the __attribute__((warn_unused_result))?
> 
> Otherwise, I think you should add it.
+1

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-02 19:31                                             ` Honnappa Nagarahalli
@ 2022-04-02 20:36                                               ` Morten Brørup
  2022-04-02 22:01                                                 ` Honnappa Nagarahalli
  2022-04-03 18:11                                               ` Ola Liljedahl
  1 sibling, 1 reply; 104+ messages in thread
From: Morten Brørup @ 2022-04-02 20:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ola Liljedahl, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, stephen, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Saturday, 2 April 2022 21.31
> 
> <snip>
> 
> > > +__rte_experimental
> > > +static inline bool
> > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > +*begin_sn) {
> > > +	uint32_t end_sn;
> > > +
> > > +	/* make sure the data loads happens before the sn load */
> > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > +
> > > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> >
> > Since we are reading and potentially returning the sequence number
> here
> > (repeating the read of the protected data), we need to use load-
> acquire.
> > I assume it is not expected that the user will call
> > rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
> >
> > Seeing this implementation, I might actually prefer the original
> > implementation, I think it is cleaner. But I would like for the begin
> function
> > also to wait for an even sequence number, the end function would only
> have
> > to check for same sequence number, this might improve performance a
> little
> > bit as readers won't perform one or several broken reads while a
> write is in
> > progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no
> contention between readers and writers (as that happens most of the
> time). From this perspective, not checking for an even seq number in
> the begin function would reduce one 'if' statement.

I might be siding with Ola on this, but with a twist: The read_lock() should not wait, but test. (Or both variants could be available. Or all three, including the variant without checking for an even sequence number.)

My argument for this is: The write operation could take a long time to complete, and while this goes on, it is good for the reading threads to know at entry of their critical read section that the read operation will fail, so they can take the alternative code path instead of proceeding into the critical read section. Otherwise, the reading threads have to waste time reading the protected data, only to discard them at the end. It's an optimization based on the assumption that reading the protected data has some small cost, because this small cost adds up if done many times during a longwinded write operation.

And, although checking for the sequence number in read_trylock() adds an 'if' statement to it, that 'if' statement should be surrounded by likely() to reduce its cost in the case we are optimizing for, i.e. when no write operation is ongoing.

This means that read_trylock() returns a boolean, and the sequence number is returned in an output parameter.

Please note that it doesn't change the fact that read_tryunlock() can still fail, even though read_trylock() gave the go-ahead.

I'm trying to highlight that while we all agree to optimize for the case of reading while no writing is ongoing, there might be opportunity for optimizing for the opposite case (i.e. trying to read while writing is ongoing) at the same time.

I only hope it can be done with negligent performance cost for the primary case.

I'll respectfully leave the hardcore implementation details and performance considerations to you experts in this area. :-)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-02 20:36                                               ` Morten Brørup
@ 2022-04-02 22:01                                                 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-02 22:01 UTC (permalink / raw)
  To: Morten Brørup, Ola Liljedahl, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev,
	stephen, nd, nd

<snip>
> >
> > > > +__rte_experimental
> > > > +static inline bool
> > > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
> > > > +*begin_sn) {
> > > > +	uint32_t end_sn;
> > > > +
> > > > +	/* make sure the data loads happens before the sn load */
> > > > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > +
> > > > +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> > >
> > > Since we are reading and potentially returning the sequence number
> > here
> > > (repeating the read of the protected data), we need to use load-
> > acquire.
> > > I assume it is not expected that the user will call
> > > rte_seqlock_read_lock() again.
> > Good point, we need a load-acquire (due to changes done in v3).
> >
> > >
> > > Seeing this implementation, I might actually prefer the original
> > > implementation, I think it is cleaner. But I would like for the
> > > begin
> > function
> > > also to wait for an even sequence number, the end function would
> > > only
> > have
> > > to check for same sequence number, this might improve performance a
> > little
> > > bit as readers won't perform one or several broken reads while a
> > write is in
> > > progress. The function names are a different thing though.
> > I think we need to be optimizing for the case where there is no
> > contention between readers and writers (as that happens most of the
> > time). From this perspective, not checking for an even seq number in
> > the begin function would reduce one 'if' statement.
> 
> I might be siding with Ola on this, but with a twist: The read_lock() should not
> wait, but test. (Or both variants could be available. Or all three, including the
> variant without checking for an even sequence number.)
> 
> My argument for this is: The write operation could take a long time to
> complete, and while this goes on, it is good for the reading threads to know at
> entry of their critical read section that the read operation will fail, so they can
> take the alternative code path instead of proceeding into the critical read
> section. Otherwise, the reading threads have to waste time reading the
> protected data, only to discard them at the end. It's an optimization based on
> the assumption that reading the protected data has some small cost, because
> this small cost adds up if done many times during a longwinded write
> operation.
> 
> And, although checking for the sequence number in read_trylock() adds an 'if'
> statement to it, that 'if' statement should be surrounded by likely() to reduce
> its cost in the case we are optimizing for, i.e. when no write operation is
> ongoing.
This 'if' statement can be part of the application code as well. This would allow for multiple models to exist.

> 
> This means that read_trylock() returns a boolean, and the sequence number is
> returned in an output parameter.
> 
> Please note that it doesn't change the fact that read_tryunlock() can still fail,
> even though read_trylock() gave the go-ahead.
> 
> I'm trying to highlight that while we all agree to optimize for the case of
> reading while no writing is ongoing, there might be opportunity for optimizing
> for the opposite case (i.e. trying to read while writing is ongoing) at the same
> time.
> 
> I only hope it can be done with negligent performance cost for the primary
> case.
> 
> I'll respectfully leave the hardcore implementation details and performance
> considerations to you experts in this area. :-)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-02  0:21                                           ` Honnappa Nagarahalli
  2022-04-02 11:01                                             ` Morten Brørup
@ 2022-04-03  6:10                                             ` Mattias Rönnblom
  2022-04-03 17:27                                               ` Honnappa Nagarahalli
  2022-04-03  6:33                                             ` Mattias Rönnblom
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-03  6:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl

On 2022-04-02 02:21, Honnappa Nagarahalli wrote:
> Hi Mattias,
> 	Few comments inline.
> 
>> -----Original Message-----
>> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Sent: Friday, April 1, 2022 10:08 AM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>;
>> onar.olsen@ericsson.com; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>;
>> konstantin.ananyev@intel.com; mb@smartsharesystems.com;
>> stephen@networkplumber.org; Mattias Rönnblom
>> <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>
>> Subject: [PATCH v3] eal: add seqlock
>>
>> A sequence lock (seqlock) is synchronization primitive which allows for data-
>> race free, low-overhead, high-frequency reads, especially for data structures
>> shared across many cores and which are updated with relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. The variant of seqlock implemented
>> in this patch supports multiple writers as well. A spinlock is used for writer-
>> writer serialization.
>>
>> To avoid resource reclamation and other issues, the data protected by a seqlock
>> is best off being self-contained (i.e., no pointers [except to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform atomic
>> operations on data objects larger what the native atomic machine instructions
>> allow for.
>>
>> DPDK seqlocks are not preemption safe on the writer side. A thread preemption
>> affects performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>    1. Load the sequence number (sn).
>>    2. Load, in arbitrary order, the seqlock-protected data.
>>    3. Load the sn again.
>>    4. Check if the first and second sn are equal, and even numbered.
>>       If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>    1. Take the spinlock, to serialize writer access.
>>    2. Load the sn.
>>    3. Store the original sn + 1 as the new sn.
>>    4. Perform load and stores to the seqlock-protected data.
>>    5. Store the original sn + 2 as the new sn.
>>    6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the data
>> stores, and the second sn store appear to the reader in the mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores need not
>> be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
>>
>> PATCH v3:
>>    * Renamed both read and write-side critical section begin/end functions
>>      to better match rwlock naming, per Ola Liljedahl's suggestion.
>>    * Added 'extern "C"' guards for C++ compatibility.
>>    * Refer to the main lcore as the main, and not the master.
>>
>> PATCH v2:
>>    * Skip instead of fail unit test in case too few lcores are available.
>>    * Use main lcore for testing, reducing the minimum number of lcores
>>      required to run the unit tests to four.
>>    * Consistently refer to sn field as the "sequence number" in the
>>      documentation.
>>    * Fixed spelling mistakes in documentation.
>>
>> Updates since RFC:
>>    * Added API documentation.
>>    * Added link to Wikipedia article in the commit message.
>>    * Changed seqlock sequence number field from uint64_t (which was
>>      overkill) to uint32_t. The sn type needs to be sufficiently large
>>      to assure no reader will read a sn, access the data, and then read
>>      the same sn, but the sn has been updated to many times during the
>>      read, so it has wrapped.
>>    * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>>    * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>>      with an anonymous struct typedef:ed to rte_seqlock_t.
>>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   app/test/meson.build          |   2 +
>>   app/test/test_seqlock.c       | 202 +++++++++++++++++++++++
>>   lib/eal/common/meson.build    |   1 +
>>   lib/eal/common/rte_seqlock.c  |  12 ++
>>   lib/eal/include/meson.build   |   1 +
>>   lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
>>   lib/eal/version.map           |   3 +
>>   7 files changed, 523 insertions(+)
>>   create mode 100644 app/test/test_seqlock.c  create mode 100644
>> lib/eal/common/rte_seqlock.c  create mode 100644
>> lib/eal/include/rte_seqlock.h
>>
>> diff --git a/app/test/meson.build b/app/test/meson.build index
>> 5fc1dd1b7b..5e418e8766 100644
>> --- a/app/test/meson.build
>> +++ b/app/test/meson.build
>> @@ -125,6 +125,7 @@ test_sources = files(
>>           'test_rwlock.c',
>>           'test_sched.c',
>>           'test_security.c',
>> +        'test_seqlock.c',
>>           'test_service_cores.c',
>>           'test_spinlock.c',
>>           'test_stack.c',
>> @@ -214,6 +215,7 @@ fast_tests = [
>>           ['rwlock_rde_wro_autotest', true],
>>           ['sched_autotest', true],
>>           ['security_autotest', false],
>> +        ['seqlock_autotest', true],
>>           ['spinlock_autotest', true],
>>           ['stack_autotest', false],
>>           ['stack_lf_autotest', false],
>> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode
>> 100644 index 0000000000..54fadf8025
>> --- /dev/null
>> +++ b/app/test/test_seqlock.c
>> @@ -0,0 +1,202 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +#include <rte_cycles.h>
>> +#include <rte_malloc.h>
>> +#include <rte_random.h>
>> +
>> +#include <inttypes.h>
>> +
>> +#include "test.h"
>> +
>> +struct data {
>> +	rte_seqlock_t lock;
>> +
>> +	uint64_t a;
>> +	uint64_t b __rte_cache_aligned;
>> +	uint64_t c __rte_cache_aligned;
>> +} __rte_cache_aligned;
>> +
>> +struct reader {
>> +	struct data *data;
>> +	uint8_t stop;
>> +};
>> +
>> +#define WRITER_RUNTIME (2.0) /* s */
>> +
>> +#define WRITER_MAX_DELAY (100) /* us */
>> +
>> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define
>> +WRITER_INTERRUPT_TIME (1) /* us */
>> +
>> +static int
>> +writer_run(void *arg)
>> +{
>> +	struct data *data = arg;
>> +	uint64_t deadline;
>> +
>> +	deadline = rte_get_timer_cycles() +
>> +		WRITER_RUNTIME * rte_get_timer_hz();
>> +
>> +	while (rte_get_timer_cycles() < deadline) {
>> +		bool interrupted;
>> +		uint64_t new_value;
>> +		unsigned int delay;
>> +
>> +		new_value = rte_rand();
>> +
>> +		interrupted =
>> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
>> +
>> +		rte_seqlock_write_lock(&data->lock);
>> +
>> +		data->c = new_value;
>> +
>> +		/* These compiler barriers (both on the test reader
>> +		 * and the test writer side) are here to ensure that
>> +		 * loads/stores *usually* happen in test program order
>> +		 * (always on a TSO machine). They are arrange in such
>> +		 * a way that the writer stores in a different order
>> +		 * than the reader loads, to emulate an arbitrary
>> +		 * order. A real application using a seqlock does not
>> +		 * require any compiler barriers.
>> +		 */
>> +		rte_compiler_barrier();
> The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order).
> 

The intention is what is described in the comment (i.e., to make it 
likely, but no guaranteed, that the stores will be globally visible in 
the program order).

The reason I didn't put in a release memory barrier, was that it seems a 
little intrusive.

Maybe I should remove these compiler barriers. They are also intrusive 
in the way may prevent some compiler optimizations, that could expose a 
seqlock bug. Or, I could have two variants of the tests. I don't know.

>> +		data->b = new_value;
>> +
>> +		if (interrupted)
>> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
>> +
>> +		rte_compiler_barrier();
>> +		data->a = new_value;
>> +
>> +		rte_seqlock_write_unlock(&data->lock);
>> +
>> +		delay = rte_rand_max(WRITER_MAX_DELAY);
>> +
>> +		rte_delay_us_block(delay);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +#define INTERRUPTED_READER_FREQUENCY (1000) #define
>> +READER_INTERRUPT_TIME (1000) /* us */
>> +
>> +static int
>> +reader_run(void *arg)
>> +{
>> +	struct reader *r = arg;
>> +	int rc = 0;
>> +
>> +	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc ==
>> 0) {
>> +		struct data *data = r->data;
>> +		bool interrupted;
>> +		uint64_t a;
>> +		uint64_t b;
>> +		uint64_t c;
>> +		uint32_t sn;
>> +
>> +		interrupted =
>> rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
>> +
>> +		sn = rte_seqlock_read_lock(&data->lock);
>> +
>> +		do {
>> +			a = data->a;
>> +			/* See writer_run() for an explanation why
>> +			 * these barriers are here.
>> +			 */
>> +			rte_compiler_barrier();
>> +
>> +			if (interrupted)
>> +
>> 	rte_delay_us_block(READER_INTERRUPT_TIME);
>> +
>> +			c = data->c;
>> +
>> +			rte_compiler_barrier();
>> +			b = data->b;
>> +
>> +		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
>> +
>> +		if (a != b || b != c) {
>> +			printf("Reader observed inconsistent data values "
>> +			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
>> +			       a, b, c);
>> +			rc = -1;
>> +		}
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static void
>> +reader_stop(struct reader *reader)
>> +{
>> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); }
>> +
>> +#define NUM_WRITERS (2) /* main lcore + one worker */ #define
>> +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE -
>> NUM_WRITERS -
>> +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
>> +
>> +/* Only a compile-time test */
>> +static rte_seqlock_t __rte_unused static_init_lock =
>> +RTE_SEQLOCK_INITIALIZER;
>> +
>> +static int
>> +test_seqlock(void)
>> +{
>> +	struct reader readers[MAX_READERS];
>> +	unsigned int num_readers;
>> +	unsigned int num_lcores;
>> +	unsigned int i;
>> +	unsigned int lcore_id;
>> +	unsigned int reader_lcore_ids[MAX_READERS];
>> +	unsigned int worker_writer_lcore_id = 0;
>> +	int rc = 0;
>> +
>> +	num_lcores = rte_lcore_count();
>> +
>> +	if (num_lcores < MIN_LCORE_COUNT) {
>> +		printf("Too few cores to run test. Skipping.\n");
>> +		return 0;
>> +	}
>> +
>> +	num_readers = num_lcores - NUM_WRITERS;
>> +
>> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
>> +
>> +	i = 0;
>> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
>> +		if (i == 0) {
>> +			rte_eal_remote_launch(writer_run, data, lcore_id);
>> +			worker_writer_lcore_id = lcore_id;
>> +		} else {
>> +			unsigned int reader_idx = i - 1;
>> +			struct reader *reader = &readers[reader_idx];
>> +
>> +			reader->data = data;
>> +			reader->stop = 0;
>> +
>> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
>> +			reader_lcore_ids[reader_idx] = lcore_id;
>> +		}
>> +		i++;
>> +	}
>> +
>> +	if (writer_run(data) != 0 ||
>> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
>> +		rc = -1;
>> +
>> +	for (i = 0; i < num_readers; i++) {
>> +		reader_stop(&readers[i]);
>> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
>> +			rc = -1;
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
>> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index
>> 917758cc65..a41343bfed 100644
>> --- a/lib/eal/common/meson.build
>> +++ b/lib/eal/common/meson.build
>> @@ -35,6 +35,7 @@ sources += files(
>>           'rte_malloc.c',
>>           'rte_random.c',
>>           'rte_reciprocal.c',
>> +	'rte_seqlock.c',
>>           'rte_service.c',
>>           'rte_version.c',
>>   )
>> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new
>> file mode 100644 index 0000000000..d4fe648799
>> --- /dev/null
>> +++ b/lib/eal/common/rte_seqlock.c
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock) {
>> +	seqlock->sn = 0;
>> +	rte_spinlock_init(&seqlock->lock);
>> +}
>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index
>> 9700494816..48df5f1a21 100644
>> --- a/lib/eal/include/meson.build
>> +++ b/lib/eal/include/meson.build
>> @@ -36,6 +36,7 @@ headers += files(
>>           'rte_per_lcore.h',
>>           'rte_random.h',
>>           'rte_reciprocal.h',
>> +        'rte_seqlock.h',
>>           'rte_service.h',
>>           'rte_service_component.h',
>>           'rte_string_fns.h',
>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file
> Other lock implementations are in lib/eal/include/generic.
> 
>> mode 100644 index 0000000000..44eacd66e8
>> --- /dev/null
>> +++ b/lib/eal/include/rte_seqlock.h
>> @@ -0,0 +1,302 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#ifndef _RTE_SEQLOCK_H_
>> +#define _RTE_SEQLOCK_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * @file
>> + * RTE Seqlock
>> + *
>> + * A sequence lock (seqlock) is a synchronization primitive allowing
>> + * multiple, parallel, readers to efficiently and safely (i.e., in a
>> + * data-race free manner) access the lock-protected data. The RTE
>> + * seqlock permits multiple writers as well. A spinlock is used to
>> + * writer-writer synchronization.
>> + *
>> + * A reader never blocks a writer. Very high frequency writes may
>> + * prevent readers from making progress.
>> + *
>> + * A seqlock is not preemption-safe on the writer side. If a writer is
>> + * preempted, it may block readers until the writer thread is again
>> + * allowed to execute. Heavy computations should be kept out of the
>> + * writer-side critical section, to avoid delaying readers.
>> + *
>> + * Seqlocks are useful for data which are read by many cores, at a
>> + * high frequency, and relatively infrequently written to.
>> + *
>> + * One way to think about seqlocks is that they provide means to
>> + * perform atomic operations on objects larger than what the native
>> + * machine instructions allow for.
>> + *
>> + * To avoid resource reclamation issues, the data protected by a
>> + * seqlock should typically be kept self-contained (e.g., no pointers
>> + * to mutable, dynamically allocated data).
>> + *
>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char
>> +*param_y)
>> + * {
>> + *         // Temporary variables, just to improve readability.
> I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.
> 
>> + *         int tentative_x;
>> + *         char tentative_y[MAX_Y_LEN];
>> + *         uint32_t sn;
>> + *
>> + *         sn = rte_seqlock_read_lock(&config->lock);
>> + *         do {
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 tentative_x = config->param_x;
>> + *                 strcpy(tentative_y, config->param_y);
>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>> + *         // An application could skip retrying, and try again later, if
>> + *         // progress is possible without the data.
>> + *
>> + *         *param_x = tentative_x;
>> + *         strcpy(param_y, tentative_y);
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char
>> +*param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
> Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers.
> 
>> +rte_seqlock_t;
>> +
>> +/**
>> + * A static seqlock initializer.
>> + */
>> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
>> +
>> +/**
>> + * Initialize the seqlock.
>> + *
>> + * This function initializes the seqlock, and leaves the writer-side
>> + * spinlock unlocked.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + */
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>> +
>> +/**
>> + * Begin a read-side critical section.
>> + *
>> + * A call to this function marks the beginning of a read-side critical
>> + * section, for @p seqlock.
>> + *
>> + * rte_seqlock_read_lock() returns a sequence number, which is later
>> + * used in rte_seqlock_read_tryunlock() to check if the protected data
>> + * underwent any modifications during the read transaction.
>> + *
>> + * After (in program order) rte_seqlock_read_lock() has been called,
>> + * the calling thread reads the protected data, for later use. The
>> + * protected data read *must* be copied (either in pristine form, or
>> + * in the form of some derivative), since the caller may only read the
>> + * data from within the read-side critical section (i.e., after
>> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
>> + * but must not act upon the retrieved data while in the critical
>> + * section, since it does not yet know if it is consistent.
>> + *
>> + * The protected data may be read using atomic and/or non-atomic
>> + * operations.
>> + *
>> + * After (in program order) all required data loads have been
>> + * performed, rte_seqlock_read_tryunlock() should be called, marking
>> + * the end of the read-side critical section.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns true, the data was read
>> + * atomically and the copied data is consistent.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns false, the just-read data
>> + * is inconsistent and should be discarded. The caller has the option
>> + * to either re-read the data and call rte_seqlock_read_tryunlock()
>> + * again, or to restart the whole procedure (i.e., from
>> + * rte_seqlock_read_lock()) at some later time.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @return
>> + *   The seqlock sequence number for this critical section, to
>> + *   later be passed to rte_seqlock_read_tryunlock().
>> + *
>> + * @see rte_seqlock_read_tryunlock()
>> + */
>> +__rte_experimental
>> +static inline uint32_t
>> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) {
>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>> +	 * from happening before the sn load. Synchronizes-with the
>> +	 * store release in rte_seqlock_write_unlock().
>> +	 */
>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); }
>> +
>> +/**
>> + * End a read-side critical section.
>> + *
>> + * A call to this function marks the end of a read-side critical
> Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()?
> 
>> + * section, for @p seqlock. The application must supply the sequence
>> + * number produced by the corresponding rte_seqlock_read_lock() (or,
>> + * in case of a retry, the rte_seqlock_tryunlock()) call.
>> + *
>> + * After this function has been called, the caller should not access
>> + * the protected data.
> I understand what you mean here. But, I think this needs clarity.
> In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data.
> May be this should be changed to:
> " After this function returns true, the caller should not access the protected data."?
> Or may be combine it with the following para.
> 
>> + *
>> + * In case this function returns true, the just-read data was
>> + * consistent and the set of atomic and non-atomic load operations
>> + * performed between rte_seqlock_read_lock() and
>> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
>> + *
>> + * In case rte_seqlock_read_tryunlock() returns false, the data was
>> + * modified as it was being read and may be inconsistent, and thus
>> + * should be discarded. The @p begin_sn is updated with the
>> + * now-current sequence number.
> May be
> " The @p begin_sn is updated with the sequence number for the next critical section."
> 

Sounds good.

>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @param begin_sn
>> + *   The seqlock sequence number returned by
>> + *   rte_seqlock_read_lock() (potentially updated in subsequent
>> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
>> + * @return
>> + *   true or false, if the just-read seqlock-protected data was consistent
>> + *   or inconsistent, respectively, at the time it was read.
> true - just read protected data was consistent
> false - just read protected data was inconsistent
> 
>> + *
>> + * @see rte_seqlock_read_lock()
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
>> +*begin_sn) {
>> +	uint32_t end_sn;
>> +
>> +	/* make sure the data loads happens before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>> +		*begin_sn = end_sn;
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * Begin a write-side critical section.
>> + *
>> + * A call to this function acquires the write lock associated @p
>> + * seqlock, and marks the beginning of a write-side critical section.
>> + *
>> + * After having called this function, the caller may go on to modify
>> + * (both read and write) the protected data, in an atomic or
>> + * non-atomic manner.
>> + *
>> + * After the necessary updates have been performed, the application
>> + * calls rte_seqlock_write_unlock().
>> + *
>> + * This function is not preemption-safe in the sense that preemption
>> + * of the calling thread may block reader progress until the writer
>> + * thread is rescheduled.
>> + *
>> + * Unlike rte_seqlock_read_lock(), each call made to
>> + * rte_seqlock_write_lock() must be matched with an unlock call.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_unlock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
>> +	uint32_t sn;
>> +
>> +	/* to synchronize with other writers */
>> +	rte_spinlock_lock(&seqlock->lock);
>> +
>> +	sn = seqlock->sn + 1;
> The load of seqlock->sn could use __atomic_load_n to be consistent.
> 

But why? I know it doesn't have any cost (these loads are going to be 
atomic anyways), but why use a construct with stronger guarantees than 
you have to?

>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>> +}
>> +
>> +/**
>> + * End a write-side critical section.
>> + *
>> + * A call to this function marks the end of the write-side critical
>> + * section, for @p seqlock. After this call has been made, the
>> +protected
>> + * data may no longer be modified.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_lock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
> Same here, the load of seqlock->sn could use __atomic_load_n
> 
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
>> +}
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif  /* _RTE_SEQLOCK_H_ */
>> diff --git a/lib/eal/version.map b/lib/eal/version.map index
>> b53eeb30d7..4a9d0ed899 100644
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>>   	rte_intr_instance_free;
>>   	rte_intr_type_get;
>>   	rte_intr_type_set;
>> +
>> +	# added in 22.07
>> +	rte_seqlock_init;
>>   };
>>
>>   INTERNAL {
>> --
>> 2.25.1
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-04-02  0:52                                   ` Stephen Hemminger
@ 2022-04-03  6:23                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-03  6:23 UTC (permalink / raw)
  To: Stephen Hemminger, Ola Liljedahl
  Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb

On 2022-04-02 02:52, Stephen Hemminger wrote:
> On Thu, 31 Mar 2022 16:53:00 +0200
> Ola Liljedahl <ola.liljedahl@arm.com> wrote:
> 
>> From: Ola Liljedahl <ola.liljedahl@arm.com>
>> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>,  "dev@dpdk.org"
>> <dev@dpdk.org> Cc: Thomas Monjalon <thomas@monjalon.net>,  David
>> Marchand <david.marchand@redhat.com>,  Onar Olsen
>> <onar.olsen@ericsson.com>,  "Honnappa.Nagarahalli@arm.com"
>> <Honnappa.Nagarahalli@arm.com>,  "nd@arm.com" <nd@arm.com>,
>> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
>> "mb@smartsharesystems.com" <mb@smartsharesystems.com>,
>> "stephen@networkplumber.org" <stephen@networkplumber.org> Subject:
>> Re: [PATCH v2] eal: add seqlock Date: Thu, 31 Mar 2022 16:53:00 +0200
>> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
>> Thunderbird/91.7.0
>>
>> (Thunderbird suddenly refuses to edit in plain text mode, hope the
>> mail gets sent as text anyway)
>>
>> On 3/31/22 15:38, Mattias Rönnblom wrote:
>>
>>> On 2022-03-31 11:04, Ola Liljedahl wrote:
>>>> On 3/31/22 09:46, Mattias Rönnblom wrote:
>>>>> On 2022-03-30 16:26, Mattias Rönnblom wrote:
>>>>>   
>> <snip>
>>>>> Should the rte_seqlock_read_retry() be called
>>>>> rte_seqlock_read_end(), or some third alternative? I wanted to
>>>>> make clear it's not just a "release the lock" function. You could
>>>>> use the|||__attribute__((warn_unused_result)) annotation to make
>>>>> clear the return value cannot be ignored, although I'm not sure
>>>>> DPDK ever use that attribute.
>>>> We have to decide how to use the seqlock API from the application
>>>> perspective.
>>>> Your current proposal:
>>>> do {
>>>>        sn = rte_seqlock_read_begin(&seqlock)
>>>>        //read protected data
>>>> } while (rte_seqlock_read_retry(&seqlock, sn));
>>>>
>>>> or perhaps
>>>> sn = rte_seqlock_read_lock(&seqlock);
>>>> do {
>>>>        //read protected data
>>>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn));
>>>>
>>>> Tryunlock should signal to the user that the unlock operation
>>>> might not succeed and something needs to be repeated.
>>>>   
>>> I like that your proposal is consistent with rwlock API, although I
>>> tend to think about a seqlock more like an arbitrary-size atomic
>>> load/store, where begin() is the beginning of the read transaction.
>>>   
>>
>> I can see the evolution of an application where is starts to use
>> plain spin locks, moves to reader/writer locks for better performance
>> and eventually moves to seqlocks. The usage is the same, only the
>> characteristics (including performance) differ.
> 
> 
> The semantics of seqlock in DPDK must be the same as what Linux kernel
> does or you are asking for trouble.  It is not a reader-writer lock in
> traditional sense.

Does "semantics" here including the naming of the functions? The overall 
semantics will be the same, except the kernel has a bunch of variants 
with different kind of write-side synchronization, if I recall correctly.

I'll try to summarize the options as I see them:

Option A: (PATCH v3):
rte_seqlock_read_lock()
rte_seqlock_read_tryunlock() /* with built-in "read restart" */ 
rte_seqlock_write_lock()
rte_seqlock_write_unlock()


Option B: (Linux kernel-style naming):
rte_seqlock_read_begin()
rte_seqlock_read_end()
rte_seqlock_write_begin()
rte_seqlock_write_end()

A combination, acknowledging there's a lock on the writer side, but not 
on the read side.

Option C:

rte_seqlock_read_begin()
rte_seqlock_read_retry()
rte_seqlock_write_lock()
rte_seqlock_write_unlock()

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-02  0:21                                           ` Honnappa Nagarahalli
  2022-04-02 11:01                                             ` Morten Brørup
  2022-04-03  6:10                                             ` [PATCH v3] eal: add seqlock Mattias Rönnblom
@ 2022-04-03  6:33                                             ` Mattias Rönnblom
  2022-04-03 17:37                                               ` Honnappa Nagarahalli
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-03  6:33 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl

I missed some of your comments.

On 2022-04-02 02:21, Honnappa Nagarahalli wrote:

<snip>

>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char
>> +*param_y)
>> + * {
>> + *         // Temporary variables, just to improve readability.
> I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small.
> 

The data here would be copied into the buffers supplied by config_read() 
anyways, so it's a copy regardless.

>> + *         int tentative_x;
>> + *         char tentative_y[MAX_Y_LEN];
>> + *         uint32_t sn;
>> + *
>> + *         sn = rte_seqlock_read_lock(&config->lock);
>> + *         do {
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 tentative_x = config->param_x;
>> + *                 strcpy(tentative_y, config->param_y);
>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>> + *         // An application could skip retrying, and try again later, if
>> + *         // progress is possible without the data.
>> + *
>> + *         *param_x = tentative_x;
>> + *         strcpy(param_y, tentative_y);
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char
>> +*param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
> Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers.
> 

Is a seqlock the synchronization primitive of choice for high-contention 
cases? I would say no, but I'm not sure what you would use instead.

<snip>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-02 18:15                                           ` Ola Liljedahl
  2022-04-02 19:31                                             ` Honnappa Nagarahalli
@ 2022-04-03  6:51                                             ` Mattias Rönnblom
  1 sibling, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-03  6:51 UTC (permalink / raw)
  To: Ola Liljedahl, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen

On 2022-04-02 20:15, Ola Liljedahl wrote:
> On 4/1/22 17:07, Mattias Rönnblom wrote:
>> +
>> +/**
>> + * End a read-side critical section.
>> + *
>> + * A call to this function marks the end of a read-side critical
>> + * section, for @p seqlock. The application must supply the sequence
>> + * number produced by the corresponding rte_seqlock_read_lock() (or,
>> + * in case of a retry, the rte_seqlock_tryunlock()) call.
>> + *
>> + * After this function has been called, the caller should not access
>> + * the protected data.
>> + *
>> + * In case this function returns true, the just-read data was
>> + * consistent and the set of atomic and non-atomic load operations
>> + * performed between rte_seqlock_read_lock() and
>> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
>> + *
>> + * In case rte_seqlock_read_tryunlock() returns false, the data was
>> + * modified as it was being read and may be inconsistent, and thus
>> + * should be discarded. The @p begin_sn is updated with the
>> + * now-current sequence number.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @param begin_sn
>> + *   The seqlock sequence number returned by
>> + *   rte_seqlock_read_lock() (potentially updated in subsequent
>> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
>> + * @return
>> + *   true or false, if the just-read seqlock-protected data was 
>> consistent
>> + *   or inconsistent, respectively, at the time it was read.
>> + *
>> + * @see rte_seqlock_read_lock()
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t 
>> *begin_sn)
>> +{
>> +    uint32_t end_sn;
>> +
>> +    /* make sure the data loads happens before the sn load */
>> +    rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +    end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> 
> Since we are reading and potentially returning the sequence number here 
> (repeating the read of the protected data), we need to use load-acquire. 
> I assume it is not expected that the user will call 
> rte_seqlock_read_lock() again.
> 

Good point.

> Seeing this implementation, I might actually prefer the original 
> implementation, I think it is cleaner.

Me too.

> But I would like for the begin 
> function also to wait for an even sequence number, the end function 
> would only have to check for same sequence number, this might improve 
> performance a little bit as readers won't perform one or several broken 
> reads while a write is in progress. The function names are a different 
> thing though.
> 

The odd sn should be a rare case, if the seqlock is used for relatively 
low frequency update scenarios, which is what I think it should be 
designed for.

Waiting for an even sn in read_begin() would exclude the option for the 
caller to defer reading the new data to same later time, in case it's 
being written. That in turn would force even a single writer to make 
sure its thread is not preempted, or risk blocking all lcore worker 
cores attempting to read the protected data.

You could complete the above API with a read_trybegin() function to 
address that issue, for those who care, but that would force some extra 
complexity on the user.

> The writer side behaves much more like a lock with mutual exclusion so 
> write_lock/write_unlock makes sense.
> 
>> +
>> +    if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>> +        *begin_sn = end_sn;
>> +        return false;
>> +    }
>> +
>> +    return true;
>> +}
>> +

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-03  6:10                                             ` [PATCH v3] eal: add seqlock Mattias Rönnblom
@ 2022-04-03 17:27                                               ` Honnappa Nagarahalli
  2022-04-03 18:37                                                 ` Ola Liljedahl
  0 siblings, 1 reply; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-03 17:27 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl, nd

<snip>

> >> a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode
> >> 100644 index 0000000000..54fadf8025
> >> --- /dev/null
> >> +++ b/app/test/test_seqlock.c
> >> @@ -0,0 +1,202 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2022 Ericsson AB
> >> + */
> >> +
> >> +#include <rte_seqlock.h>
> >> +
> >> +#include <rte_cycles.h>
> >> +#include <rte_malloc.h>
> >> +#include <rte_random.h>
> >> +
> >> +#include <inttypes.h>
> >> +
> >> +#include "test.h"
> >> +
> >> +struct data {
> >> +	rte_seqlock_t lock;
> >> +
> >> +	uint64_t a;
> >> +	uint64_t b __rte_cache_aligned;
> >> +	uint64_t c __rte_cache_aligned;
> >> +} __rte_cache_aligned;
> >> +
> >> +struct reader {
> >> +	struct data *data;
> >> +	uint8_t stop;
> >> +};
> >> +
> >> +#define WRITER_RUNTIME (2.0) /* s */
> >> +
> >> +#define WRITER_MAX_DELAY (100) /* us */
> >> +
> >> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define
> >> +WRITER_INTERRUPT_TIME (1) /* us */
> >> +
> >> +static int
> >> +writer_run(void *arg)
> >> +{
> >> +	struct data *data = arg;
> >> +	uint64_t deadline;
> >> +
> >> +	deadline = rte_get_timer_cycles() +
> >> +		WRITER_RUNTIME * rte_get_timer_hz();
> >> +
> >> +	while (rte_get_timer_cycles() < deadline) {
> >> +		bool interrupted;
> >> +		uint64_t new_value;
> >> +		unsigned int delay;
> >> +
> >> +		new_value = rte_rand();
> >> +
> >> +		interrupted =
> >> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
> >> +
> >> +		rte_seqlock_write_lock(&data->lock);
> >> +
> >> +		data->c = new_value;
> >> +
> >> +		/* These compiler barriers (both on the test reader
> >> +		 * and the test writer side) are here to ensure that
> >> +		 * loads/stores *usually* happen in test program order
> >> +		 * (always on a TSO machine). They are arrange in such
> >> +		 * a way that the writer stores in a different order
> >> +		 * than the reader loads, to emulate an arbitrary
> >> +		 * order. A real application using a seqlock does not
> >> +		 * require any compiler barriers.
> >> +		 */
> >> +		rte_compiler_barrier();
> > The compiler barriers are not sufficient on all architectures (if the intention
> is to maintain the program order).
> >
> 
> The intention is what is described in the comment (i.e., to make it likely, but
> no guaranteed, that the stores will be globally visible in the program order).
> 
> The reason I didn't put in a release memory barrier, was that it seems a little
> intrusive.
> 
> Maybe I should remove these compiler barriers. They are also intrusive in the
> way may prevent some compiler optimizations, that could expose a seqlock
> bug. Or, I could have two variants of the tests. I don't know.
I would suggest removing the compiler barriers, leave it to the CPU to do what it can do.

> 
> >> +		data->b = new_value;
> >> +
> >> +		if (interrupted)
> >> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
> >> +
> >> +		rte_compiler_barrier();
> >> +		data->a = new_value;
> >> +
> >> +		rte_seqlock_write_unlock(&data->lock);
> >> +
> >> +		delay = rte_rand_max(WRITER_MAX_DELAY);
> >> +
> >> +		rte_delay_us_block(delay);
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +

<snip>

> >> +
> >> +/**
> >> + * Begin a write-side critical section.
> >> + *
> >> + * A call to this function acquires the write lock associated @p
> >> + * seqlock, and marks the beginning of a write-side critical section.
> >> + *
> >> + * After having called this function, the caller may go on to modify
> >> + * (both read and write) the protected data, in an atomic or
> >> + * non-atomic manner.
> >> + *
> >> + * After the necessary updates have been performed, the application
> >> + * calls rte_seqlock_write_unlock().
> >> + *
> >> + * This function is not preemption-safe in the sense that preemption
> >> + * of the calling thread may block reader progress until the writer
> >> + * thread is rescheduled.
> >> + *
> >> + * Unlike rte_seqlock_read_lock(), each call made to
> >> + * rte_seqlock_write_lock() must be matched with an unlock call.
> >> + *
> >> + * @param seqlock
> >> + *   A pointer to the seqlock.
> >> + *
> >> + * @see rte_seqlock_write_unlock()
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	/* to synchronize with other writers */
> >> +	rte_spinlock_lock(&seqlock->lock);
> >> +
> >> +	sn = seqlock->sn + 1;
> > The load of seqlock->sn could use __atomic_load_n to be consistent.
> >
> 
> But why? I know it doesn't have any cost (these loads are going to be atomic
> anyways), but why use a construct with stronger guarantees than you have
> to?
Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations.
The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not.

> 
> >> +
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> >> +
> >> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> >> +	 * from happening before the sn store.
> >> +	 */
> >> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> >> +}
> >> +
> >> +/**
> >> + * End a write-side critical section.
> >> + *
> >> + * A call to this function marks the end of the write-side critical
> >> + * section, for @p seqlock. After this call has been made, the
> >> +protected
> >> + * data may no longer be modified.
> >> + *
> >> + * @param seqlock
> >> + *   A pointer to the seqlock.
> >> + *
> >> + * @see rte_seqlock_write_lock()
> >> + */
> >> +__rte_experimental
> >> +static inline void
> >> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) {
> >> +	uint32_t sn;
> >> +
> >> +	sn = seqlock->sn + 1;
> > Same here, the load of seqlock->sn could use __atomic_load_n
> >
> >> +
> >> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
> >> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> >> +
> >> +	rte_spinlock_unlock(&seqlock->lock);
> >> +}
> >> +
> >> +#ifdef __cplusplus
> >> +}
> >> +#endif
> >> +
> >> +#endif  /* _RTE_SEQLOCK_H_ */
<snip>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-03  6:33                                             ` Mattias Rönnblom
@ 2022-04-03 17:37                                               ` Honnappa Nagarahalli
  2022-04-08 13:45                                                 ` Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-03 17:37 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl, nd

<snip>

> 
> >> + * Example usage:
> >> + * @code{.c}
> >> + * #define MAX_Y_LEN (16)
> >> + * // Application-defined example data structure, protected by a seqlock.
> >> + * struct config {
> >> + *         rte_seqlock_t lock;
> >> + *         int param_x;
> >> + *         char param_y[MAX_Y_LEN];
> >> + * };
> >> + *
> >> + * // Accessor function for reading config fields.
> >> + * void
> >> + * config_read(const struct config *config, int *param_x, char
> >> +*param_y)
> >> + * {
> >> + *         // Temporary variables, just to improve readability.
> > I think the above comment is not necessary. It is beneficial to copy the
> protected data to keep the read side critical section small.
> >
> 
> The data here would be copied into the buffers supplied by config_read()
> anyways, so it's a copy regardless.
I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide.

> 
> >> + *         int tentative_x;
> >> + *         char tentative_y[MAX_Y_LEN];
> >> + *         uint32_t sn;
> >> + *
> >> + *         sn = rte_seqlock_read_lock(&config->lock);
> >> + *         do {
> >> + *                 // Loads may be atomic or non-atomic, as in this example.
> >> + *                 tentative_x = config->param_x;
> >> + *                 strcpy(tentative_y, config->param_y);
> >> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
> >> + *         // An application could skip retrying, and try again later, if
> >> + *         // progress is possible without the data.
> >> + *
> >> + *         *param_x = tentative_x;
> >> + *         strcpy(param_y, tentative_y);
> >> + * }
> >> + *
> >> + * // Accessor function for writing config fields.
> >> + * void
> >> + * config_update(struct config *config, int param_x, const char
> >> +*param_y)
> >> + * {
> >> + *         rte_seqlock_write_lock(&config->lock);
> >> + *         // Stores may be atomic or non-atomic, as in this example.
> >> + *         config->param_x = param_x;
> >> + *         strcpy(config->param_y, param_y);
> >> + *         rte_seqlock_write_unlock(&config->lock);
> >> + * }
> >> + * @endcode
> >> + *
> >> + * @see
> >> + * https://en.wikipedia.org/wiki/Seqlock.
> >> + */
> >> +
> >> +#include <stdbool.h>
> >> +#include <stdint.h>
> >> +
> >> +#include <rte_atomic.h>
> >> +#include <rte_branch_prediction.h>
> >> +#include <rte_spinlock.h>
> >> +
> >> +/**
> >> + * The RTE seqlock type.
> >> + */
> >> +typedef struct {
> >> +	uint32_t sn; /**< A sequence number for the protected data. */
> >> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
> > Suggest using ticket lock for the writer side. It should have low overhead
> when there is a single writer, but provides better functionality when there are
> multiple writers.
> >
> 
> Is a seqlock the synchronization primitive of choice for high-contention cases?
> I would say no, but I'm not sure what you would use instead.
I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input.

IMO, there is no harm/perf issues in using ticket lock.

> 
> <snip>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-02 19:31                                             ` Honnappa Nagarahalli
  2022-04-02 20:36                                               ` Morten Brørup
@ 2022-04-03 18:11                                               ` Ola Liljedahl
  1 sibling, 0 replies; 104+ messages in thread
From: Ola Liljedahl @ 2022-04-03 18:11 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen,
	nd, konstantin.ananyev, mb, stephen

(Now using macOS Mail program in plain text mode, hope this works)

> On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote:
> 
> <snip>
> 
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
>>> +*begin_sn) {
>>> +	uint32_t end_sn;
>>> +
>>> +	/* make sure the data loads happens before the sn load */
>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> +
>>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> 
>> Since we are reading and potentially returning the sequence number here
>> (repeating the read of the protected data), we need to use load-acquire.
>> I assume it is not expected that the user will call
>> rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
>> 
>> Seeing this implementation, I might actually prefer the original
>> implementation, I think it is cleaner. But I would like for the begin function
>> also to wait for an even sequence number, the end function would only have
>> to check for same sequence number, this might improve performance a little
>> bit as readers won't perform one or several broken reads while a write is in
>> progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would reduce one 'if' statement.
The number of statements in C is not relevant, instead we need to look at the generated code. On x86, I would assume an if-statement like “if ((sn & 1) || (sn == sl->sn))” to generate two separate evaluations with their own conditional jump instructions. On AArch64, the two evaluations could probably be combined using a CCMP instruction and need only one conditional branch instruction. With branch prediction, it is doubtful we will see any difference in performance.

> 
> Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case.
The earlier model still had load-acquire in the read_begin function which would have to invoked again. There is no difference in the number or type of memory accesses. We just need to copy the implementation of read_begin into the read_tryunlock function if we decide that the user should not have to re-invoke read_begin on a failed read_tryunlock.

> 
>> 
>> The writer side behaves much more like a lock with mutual exclusion so
>> write_lock/write_unlock makes sense.
>> 
>>> +
>>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>>> +		*begin_sn = end_sn;
>>> +		return false;
>>> +	}
>>> +
>>> +	return true;
>>> +}
>>> +


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-03 17:27                                               ` Honnappa Nagarahalli
@ 2022-04-03 18:37                                                 ` Ola Liljedahl
  2022-04-04 21:56                                                   ` Honnappa Nagarahalli
  0 siblings, 1 reply; 104+ messages in thread
From: Ola Liljedahl @ 2022-04-03 18:37 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen,
	nd, konstantin.ananyev, mb, stephen


>>>> +__rte_experimental
>>>> +static inline void
>>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
>>>> +  uint32_t sn;
>>>> +
>>>> +  /* to synchronize with other writers */
>>>> +  rte_spinlock_lock(&seqlock->lock);
>>>> +
>>>> +  sn = seqlock->sn + 1;
>>> The load of seqlock->sn could use __atomic_load_n to be consistent.
>>> 
>> 
>> But why? I know it doesn't have any cost (these loads are going to be atomic
>> anyways), but why use a construct with stronger guarantees than you have
>> to?
> Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations.
> The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not.
We don’t need an atomic read here as the seqlock->lock protects (serialises) writer-side accesses to seqlock->sn. There is no other thread which could update seqlock->sn while this thread owns the lock. The seqlock owner could read seqlock->sn byte for byte without any problems.
Only writes to seqlock->sn need to be atomic as there might be readers who read seqlock->sn and in such multi-access scenarios, all accesses need to be atomic in order to avoid data races.

If seqlock->sn was a C11 _Atomic type, all accesses would automatically be atomic.

- Ola


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v3] eal: add seqlock
  2022-04-03 18:37                                                 ` Ola Liljedahl
@ 2022-04-04 21:56                                                   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-04-04 21:56 UTC (permalink / raw)
  To: Ola Liljedahl
  Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen,
	nd, konstantin.ananyev, mb, stephen, nd

<snip>
> 
> 
> >>>> +__rte_experimental
> >>>> +static inline void
> >>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) {
> >>>> +  uint32_t sn;
> >>>> +
> >>>> +  /* to synchronize with other writers */
> >>>> + rte_spinlock_lock(&seqlock->lock);
> >>>> +
> >>>> +  sn = seqlock->sn + 1;
> >>> The load of seqlock->sn could use __atomic_load_n to be consistent.
> >>>
> >>
> >> But why? I know it doesn't have any cost (these loads are going to be
> >> atomic anyways), but why use a construct with stronger guarantees
> >> than you have to?
> > Using __atomic_xxx ensures that the operation is atomic always. I believe (I
> am not sure) that, when not using __atomic_xxx, the compiler is allowed to
> use non-atomic operations.
> > The other reason is we are not qualifying 'sn' as volatile. Use of
> __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a
> register. I do not know the compiler behavior if some operations on 'sn' use
> __atomic_xxx and some do not.
> We don’t need an atomic read here as the seqlock->lock protects (serialises)
> writer-side accesses to seqlock->sn. There is no other thread which could
> update seqlock->sn while this thread owns the lock. The seqlock owner could
> read seqlock->sn byte for byte without any problems.
> Only writes to seqlock->sn need to be atomic as there might be readers who
> read seqlock->sn and in such multi-access scenarios, all accesses need to be
> atomic in order to avoid data races.
How does the compiler interpret a mix of __atomic_xxx and non-atomic access to a memory location? What are the guarantees the compiler provides in such cases?
Do you see any harm in making this an atomic operation?

> 
> If seqlock->sn was a C11 _Atomic type, all accesses would automatically be
> atomic.
Exactly, we do not have that currently.

> 
> - Ola
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
  2022-03-31  7:46                           ` Mattias Rönnblom
  2022-04-02  0:50                           ` Stephen Hemminger
@ 2022-04-05 20:16                           ` Stephen Hemminger
  2022-04-08 13:50                             ` Mattias Rönnblom
  2 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-05 20:16 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl

On Wed, 30 Mar 2022 16:26:02 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +/**
> + * A static seqlock initializer.
> + */
> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }

Used named field initializers here please.

> +/**
> + * Initialize the seqlock.
> + *
> + * This function initializes the seqlock, and leaves the writer-side
> + * spinlock unlocked.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + */
> +__rte_experimental
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock);

You need to add the standard experimental prefix
to the comment so that doxygen marks the API as experimental
in documentation.

> +static inline bool
> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
> +{
> +	uint32_t end_sn;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);

Please add parenthesis around the and to test if odd.
It would be good to document why if begin_sn is odd it returns false.



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v3] eal: add seqlock
  2022-04-03 17:37                                               ` Honnappa Nagarahalli
@ 2022-04-08 13:45                                                 ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 13:45 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, Ola Liljedahl, Mattias Rönnblom

On 2022-04-03 19:37, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>>>> + * Example usage:
>>>> + * @code{.c}
>>>> + * #define MAX_Y_LEN (16)
>>>> + * // Application-defined example data structure, protected by a seqlock.
>>>> + * struct config {
>>>> + *         rte_seqlock_t lock;
>>>> + *         int param_x;
>>>> + *         char param_y[MAX_Y_LEN];
>>>> + * };
>>>> + *
>>>> + * // Accessor function for reading config fields.
>>>> + * void
>>>> + * config_read(const struct config *config, int *param_x, char
>>>> +*param_y)
>>>> + * {
>>>> + *         // Temporary variables, just to improve readability.
>>> I think the above comment is not necessary. It is beneficial to copy the
>> protected data to keep the read side critical section small.
>>>
>>
>> The data here would be copied into the buffers supplied by config_read()
>> anyways, so it's a copy regardless.
> I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide.
> 

I'll remove the temp variables.

>>
>>>> + *         int tentative_x;
>>>> + *         char tentative_y[MAX_Y_LEN];
>>>> + *         uint32_t sn;
>>>> + *
>>>> + *         sn = rte_seqlock_read_lock(&config->lock);
>>>> + *         do {
>>>> + *                 // Loads may be atomic or non-atomic, as in this example.
>>>> + *                 tentative_x = config->param_x;
>>>> + *                 strcpy(tentative_y, config->param_y);
>>>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>>>> + *         // An application could skip retrying, and try again later, if
>>>> + *         // progress is possible without the data.
>>>> + *
>>>> + *         *param_x = tentative_x;
>>>> + *         strcpy(param_y, tentative_y);
>>>> + * }
>>>> + *
>>>> + * // Accessor function for writing config fields.
>>>> + * void
>>>> + * config_update(struct config *config, int param_x, const char
>>>> +*param_y)
>>>> + * {
>>>> + *         rte_seqlock_write_lock(&config->lock);
>>>> + *         // Stores may be atomic or non-atomic, as in this example.
>>>> + *         config->param_x = param_x;
>>>> + *         strcpy(config->param_y, param_y);
>>>> + *         rte_seqlock_write_unlock(&config->lock);
>>>> + * }
>>>> + * @endcode
>>>> + *
>>>> + * @see
>>>> + * https://en.wikipedia.org/wiki/Seqlock.
>>>> + */
>>>> +
>>>> +#include <stdbool.h>
>>>> +#include <stdint.h>
>>>> +
>>>> +#include <rte_atomic.h>
>>>> +#include <rte_branch_prediction.h>
>>>> +#include <rte_spinlock.h>
>>>> +
>>>> +/**
>>>> + * The RTE seqlock type.
>>>> + */
>>>> +typedef struct {
>>>> +	uint32_t sn; /**< A sequence number for the protected data. */
>>>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */ }
>>> Suggest using ticket lock for the writer side. It should have low overhead
>> when there is a single writer, but provides better functionality when there are
>> multiple writers.
>>>
>>
>> Is a seqlock the synchronization primitive of choice for high-contention cases?
>> I would say no, but I'm not sure what you would use instead.
> I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input.
> 
> IMO, there is no harm/perf issues in using ticket lock.
> 

OK. I will leave at as spinlock for now (PATCH v4).

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v2] eal: add seqlock
  2022-04-05 20:16                           ` Stephen Hemminger
@ 2022-04-08 13:50                             ` Mattias Rönnblom
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 13:50 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl,
	Mattias Rönnblom

On 2022-04-05 22:16, Stephen Hemminger wrote:
> On Wed, 30 Mar 2022 16:26:02 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +/**
>> + * A static seqlock initializer.
>> + */
>> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
> 
> Used named field initializers here please.
> 

OK.

>> +/**
>> + * Initialize the seqlock.
>> + *
>> + * This function initializes the seqlock, and leaves the writer-side
>> + * spinlock unlocked.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + */
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
> 
> You need to add the standard experimental prefix
> to the comment so that doxygen marks the API as experimental
> in documentation.
> 

OK.

>> +static inline bool
>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
>> +{
>> +	uint32_t end_sn;
>> +
>> +	/* make sure the data loads happens before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	return unlikely(begin_sn & 1 || begin_sn != end_sn);
> 
> Please add parenthesis around the and to test if odd.
> It would be good to document why if begin_sn is odd it returns false.
> 
> 

Will do. Thanks.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v4] eal: add seqlock
  2022-04-08 13:50                             ` Mattias Rönnblom
@ 2022-04-08 14:24                               ` Mattias Rönnblom
  2022-04-08 15:17                                 ` Stephen Hemminger
                                                   ` (4 more replies)
  0 siblings, 5 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 14:24 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	hofors, Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been incremented enough times to have
    wrapped during the read, and arrived back at the original sn.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/meson.build          |   2 +
 app/test/test_seqlock.c       | 202 +++++++++++++++++++++
 lib/eal/common/meson.build    |   1 +
 lib/eal/common/rte_seqlock.c  |  12 ++
 lib/eal/include/meson.build   |   1 +
 lib/eal/include/rte_seqlock.h | 319 ++++++++++++++++++++++++++++++++++
 lib/eal/version.map           |   3 +
 7 files changed, 540 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..3f1ce53678
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+
+		/* These compiler barriers (both on the test reader
+		 * and the test writer side) are here to ensure that
+		 * loads/stores *usually* happen in test program order
+		 * (always on a TSO machine). They are arrange in such
+		 * a way that the writer stores in a different order
+		 * than the reader loads, to emulate an arbitrary
+		 * order. A real application using a seqlock does not
+		 * require any compiler barriers.
+		 */
+		rte_compiler_barrier();
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		rte_compiler_barrier();
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return 0;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = 0;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			/* See writer_run() for an explanation why
+			 * these barriers are here.
+			 */
+			rte_compiler_barrier();
+
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+
+			c = data->c;
+
+			rte_compiler_barrier();
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = -1;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = 0;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return 0;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = -1;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = -1;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..a41343bfed 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+	'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..961816aa10
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,319 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used to
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.sn = 0,					\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqlock_read_begin() and before rte_seqlock_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqlock_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqlock_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_unlock().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number produced by the corresponding rte_seqlock_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case rte_seqlock_read_retry() returns true, the just-read data
+ * was modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * In case this function returns false, the data is consistent and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic,
+ * as a whole.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
+ *   true or false, if the just-read seqlock-protected data was
+ *   inconsistent or consistent, respectively, at the time it was
+ *   read.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqlock_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	/* A writer pegged the sequence number during the read operation. */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
@ 2022-04-08 15:17                                 ` Stephen Hemminger
  2022-04-08 16:24                                   ` Mattias Rönnblom
  2022-04-08 15:19                                 ` Stephen Hemminger
                                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-08 15:17 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On Fri, 8 Apr 2022 16:24:42 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +++ b/lib/eal/common/rte_seqlock.c
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock)
> +{
> +	seqlock->sn = 0;
> +	rte_spinlock_init(&seqlock->lock);
> +}

Why not put init in rte_seqlock.h (like other locks)
and not need a .c at all?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
  2022-04-08 15:17                                 ` Stephen Hemminger
@ 2022-04-08 15:19                                 ` Stephen Hemminger
  2022-04-08 16:37                                   ` Mattias Rönnblom
  2022-04-08 16:48                                 ` Mattias Rönnblom
                                                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-08 15:19 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On Fri, 8 Apr 2022 16:24:42 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +	/* A writer pegged the sequence number during the read operation. */
> +	if (unlikely(begin_sn != end_sn))
> +		return true;

In some countries "pegged" might be considered inappropriate slang.
Use incremented or changed instead.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 15:17                                 ` Stephen Hemminger
@ 2022-04-08 16:24                                   ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 16:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On 2022-04-08 17:17, Stephen Hemminger wrote:
> On Fri, 8 Apr 2022 16:24:42 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +++ b/lib/eal/common/rte_seqlock.c
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock)
>> +{
>> +	seqlock->sn = 0;
>> +	rte_spinlock_init(&seqlock->lock);
>> +}
> 
> Why not put init in rte_seqlock.h (like other locks)
> and not need a .c at all?
> 

Non-performance critical functions shouldn't be in header files.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 15:19                                 ` Stephen Hemminger
@ 2022-04-08 16:37                                   ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 16:37 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On 2022-04-08 17:19, Stephen Hemminger wrote:
> On Fri, 8 Apr 2022 16:24:42 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +	/* A writer pegged the sequence number during the read operation. */
>> +	if (unlikely(begin_sn != end_sn))
>> +		return true;
> 
> In some countries "pegged" might be considered inappropriate slang.
> Use incremented or changed instead.

OK.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
  2022-04-08 15:17                                 ` Stephen Hemminger
  2022-04-08 15:19                                 ` Stephen Hemminger
@ 2022-04-08 16:48                                 ` Mattias Rönnblom
  2022-04-12 17:27                                 ` Ananyev, Konstantin
  2022-04-28 10:28                                 ` David Marchand
  4 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-08 16:48 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Ola Liljedahl

On 2022-04-08 16:24, Mattias Rönnblom wrote:

<snip>

> 
> PATCH v4:
>    * Reverted to Linux kernel style naming on the read side.

In this version I chose to adhere to kernel naming on the read side, but 
keep the write_lock()/unlock() on the write side.

I think those names communicate better what the functions do, but 
Stephen's comment about keeping naming and semantics close to the Linux 
kernel APIs is very much relevant, also for the write functions.

I don't really have an opinion if we keep these names, or if we change 
to rte_seqlock_write_begin()/end().

You might ask yourself which of the two naming options make most sense 
in the light that we might extend the proposed seqlock API with an 
"unlocked" (non-writer-serializing) seqlock variant, or variants with 
other types of lock, in the future. What function writer-side names 
would be suitable for such. (I don't know, but it seemed something that 
might be useful to consider.)

<snip>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-02 19:38                                               ` Honnappa Nagarahalli
@ 2022-04-10 13:51                                                 ` Mattias Rönnblom
  2022-04-10 13:51                                                   ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
                                                                     ` (4 more replies)
  0 siblings, 5 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb,
	hofors, Mattias Rönnblom

This patch adds a wrapper macro __rte_warn_unused_result for the
warn_unused_result function attribute.

Marking a function __rte_warn_unused_result will make the compiler
emit a warning in case the caller does not use the function's return
value.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_common.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4a399cc7c8..544e7de2e7 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -222,6 +222,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  */
 #define __rte_noreturn __attribute__((noreturn))
 
+/**
+ * Issue warning in case the function's return value is ignore
+ */
+#define __rte_warn_unused_result __attribute__((warn_unused_result))
+
 /**
  * Force a function to be inlined
  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC 2/3] eal: emit warning for unused trylock return value
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
@ 2022-04-10 13:51                                                   ` Mattias Rönnblom
  2022-04-10 13:51                                                   ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
                                                                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb,
	hofors, Mattias Rönnblom

Mark the trylock family of spinlock functions with
__rte_warn_unused_result.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/generic/rte_spinlock.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h
index 40fe49d5ad..73ed4bfbdc 100644
--- a/lib/eal/include/generic/rte_spinlock.h
+++ b/lib/eal/include/generic/rte_spinlock.h
@@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
  * @return
  *   1 if the lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl);
 
@@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl);
  *   1 if the hardware memory transaction is successfully started
  *   or lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int
 rte_spinlock_trylock_tm(rte_spinlock_t *sl);
 
@@ -243,6 +245,7 @@ static inline void rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  * @return
  *   1 if the lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
 	int id = rte_gettid();
@@ -299,6 +302,7 @@ static inline void rte_spinlock_recursive_unlock_tm(
  *   1 if the hardware memory transaction is successfully started
  *   or lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock_tm(
 	rte_spinlock_recursive_t *slr);
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC 3/3] examples/bond: fix invalid use of trylock
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
  2022-04-10 13:51                                                   ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
@ 2022-04-10 13:51                                                   ` Mattias Rönnblom
  2022-04-11  1:01                                                     ` Min Hu (Connor)
  2022-04-11 11:25                                                     ` David Marchand
  2022-04-10 18:02                                                   ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger
                                                                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb,
	hofors, Mattias Rönnblom, michalx.k.jastrzebski

The conditional rte_spinlock_trylock() was used as if it is an
unconditional lock operation in a number of places.

Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
Cc: michalx.k.jastrzebski@intel.com

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 examples/bond/main.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/examples/bond/main.c b/examples/bond/main.c
index 335bde5c8d..4efebb3902 100644
--- a/examples/bond/main.c
+++ b/examples/bond/main.c
@@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1)
 	bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) |
 				(BOND_IP_3 << 16) | (BOND_IP_4 << 24);
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 
 	while (global_flag_stru_p->LcoreMainIsRunning) {
 		rte_spinlock_unlock(&global_flag_stru_p->lock);
@@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1)
 			if (is_free == 0)
 				rte_pktmbuf_free(pkts[i]);
 		}
-		rte_spinlock_trylock(&global_flag_stru_p->lock);
+		rte_spinlock_lock(&global_flag_stru_p->lock);
 	}
 	rte_spinlock_unlock(&global_flag_stru_p->lock);
 	printf("BYE lcore_main\n");
@@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
 {
 	int worker_core_id = rte_lcore_id();
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0) {
 		if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore)
 		    != WAIT) {
@@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
 	if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0))
 		return;
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	global_flag_stru_p->LcoreMainIsRunning = 1;
 	rte_spinlock_unlock(&global_flag_stru_p->lock);
 	cmdline_printf(cl,
@@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result,
 			    struct cmdline *cl,
 			    __rte_unused void *data)
 {
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
 		cmdline_printf(cl,
 					"lcore_main not running on core:%d\n",
@@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result,
 			    struct cmdline *cl,
 			    __rte_unused void *data)
 {
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
 		cmdline_printf(cl,
 					"lcore_main not running on core:%d\n",
@@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result,
 		printf("\n");
 	}
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	cmdline_printf(cl,
 			"Active_slaves:%d "
 			"packets received:Tot:%d Arp:%d IPv4:%d\n",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
  2022-04-10 13:51                                                   ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
  2022-04-10 13:51                                                   ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
@ 2022-04-10 18:02                                                   ` Stephen Hemminger
  2022-04-10 18:50                                                     ` Mattias Rönnblom
  2022-04-11  7:17                                                   ` Morten Brørup
  2022-04-11  9:16                                                   ` Bruce Richardson
  4 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-04-10 18:02 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors

On Sun, 10 Apr 2022 15:51:38 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> This patch adds a wrapper macro __rte_warn_unused_result for the
> warn_unused_result function attribute.
> 
> Marking a function __rte_warn_unused_result will make the compiler
> emit a warning in case the caller does not use the function's return
> value.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

Looks good, but are these attributes compiler specific?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-10 18:02                                                   ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger
@ 2022-04-10 18:50                                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-10 18:50 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors

On 2022-04-10 20:02, Stephen Hemminger wrote:
> On Sun, 10 Apr 2022 15:51:38 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> This patch adds a wrapper macro __rte_warn_unused_result for the
>> warn_unused_result function attribute.
>>
>> Marking a function __rte_warn_unused_result will make the compiler
>> emit a warning in case the caller does not use the function's return
>> value.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 
> Looks good, but are these attributes compiler specific?

GCC and LLVM clang supports this and many other attributes (some of 
which are already wrapped by ___rte_* macros). The whole attribute 
machinery is compiler (or rather, "implementation") specific, as 
suggested by the double-underscore prefix (__attribute__).

I don't know about icc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 3/3] examples/bond: fix invalid use of trylock
  2022-04-10 13:51                                                   ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
@ 2022-04-11  1:01                                                     ` Min Hu (Connor)
  2022-04-11 14:32                                                       ` Mattias Rönnblom
  2022-04-11 11:25                                                     ` David Marchand
  1 sibling, 1 reply; 104+ messages in thread
From: Min Hu (Connor) @ 2022-04-11  1:01 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb,
	hofors, michalx.k.jastrzebski

Acked-by: Min Hu (Connor) <humin29@huawei.com>

在 2022/4/10 21:51, Mattias Rönnblom 写道:
> The conditional rte_spinlock_trylock() was used as if it is an
> unconditional lock operation in a number of places.
> 
> Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
> Cc: michalx.k.jastrzebski@intel.com
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>   examples/bond/main.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/examples/bond/main.c b/examples/bond/main.c
> index 335bde5c8d..4efebb3902 100644
> --- a/examples/bond/main.c
> +++ b/examples/bond/main.c
> @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1)
>   	bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) |
>   				(BOND_IP_3 << 16) | (BOND_IP_4 << 24);
>   
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   
>   	while (global_flag_stru_p->LcoreMainIsRunning) {
>   		rte_spinlock_unlock(&global_flag_stru_p->lock);
> @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1)
>   			if (is_free == 0)
>   				rte_pktmbuf_free(pkts[i]);
>   		}
> -		rte_spinlock_trylock(&global_flag_stru_p->lock);
> +		rte_spinlock_lock(&global_flag_stru_p->lock);
>   	}
>   	rte_spinlock_unlock(&global_flag_stru_p->lock);
>   	printf("BYE lcore_main\n");
> @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
>   {
>   	int worker_core_id = rte_lcore_id();
>   
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   	if (global_flag_stru_p->LcoreMainIsRunning == 0) {
>   		if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore)
>   		    != WAIT) {
> @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
>   	if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0))
>   		return;
>   
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   	global_flag_stru_p->LcoreMainIsRunning = 1;
>   	rte_spinlock_unlock(&global_flag_stru_p->lock);
>   	cmdline_printf(cl,
> @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result,
>   			    struct cmdline *cl,
>   			    __rte_unused void *data)
>   {
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
>   		cmdline_printf(cl,
>   					"lcore_main not running on core:%d\n",
> @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result,
>   			    struct cmdline *cl,
>   			    __rte_unused void *data)
>   {
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
>   		cmdline_printf(cl,
>   					"lcore_main not running on core:%d\n",
> @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result,
>   		printf("\n");
>   	}
>   
> -	rte_spinlock_trylock(&global_flag_stru_p->lock);
> +	rte_spinlock_lock(&global_flag_stru_p->lock);
>   	cmdline_printf(cl,
>   			"Active_slaves:%d "
>   			"packets received:Tot:%d Arp:%d IPv4:%d\n",
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
                                                                     ` (2 preceding siblings ...)
  2022-04-10 18:02                                                   ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger
@ 2022-04-11  7:17                                                   ` Morten Brørup
  2022-04-11 14:29                                                     ` Mattias Rönnblom
  2022-04-11  9:16                                                   ` Bruce Richardson
  4 siblings, 1 reply; 104+ messages in thread
From: Morten Brørup @ 2022-04-11  7:17 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, 10 April 2022 15.52
> 
> This patch adds a wrapper macro __rte_warn_unused_result for the
> warn_unused_result function attribute.
> 
> Marking a function __rte_warn_unused_result will make the compiler
> emit a warning in case the caller does not use the function's return
> value.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  lib/eal/include/rte_common.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 4a399cc7c8..544e7de2e7 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -222,6 +222,11 @@ static void
> __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
>   */
>  #define __rte_noreturn __attribute__((noreturn))
> 
> +/**
> + * Issue warning in case the function's return value is ignore

Typo: ignore -> ignored

Consider: warning -> a warning

> + */
> +#define __rte_warn_unused_result __attribute__((warn_unused_result))
> +
>  /**
>   * Force a function to be inlined
>   */
> --
> 2.25.1
> 

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
                                                                     ` (3 preceding siblings ...)
  2022-04-11  7:17                                                   ` Morten Brørup
@ 2022-04-11  9:16                                                   ` Bruce Richardson
  2022-04-11 14:27                                                     ` Mattias Rönnblom
                                                                       ` (2 more replies)
  4 siblings, 3 replies; 104+ messages in thread
From: Bruce Richardson @ 2022-04-11  9:16 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors

On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote:
> This patch adds a wrapper macro __rte_warn_unused_result for the
> warn_unused_result function attribute.
> 
> Marking a function __rte_warn_unused_result will make the compiler
> emit a warning in case the caller does not use the function's return
> value.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

This is good to have, thanks.

Series-acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 3/3] examples/bond: fix invalid use of trylock
  2022-04-10 13:51                                                   ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
  2022-04-11  1:01                                                     ` Min Hu (Connor)
@ 2022-04-11 11:25                                                     ` David Marchand
  2022-04-11 14:33                                                       ` Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: David Marchand @ 2022-04-11 11:25 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, Honnappa Nagarahalli, Morten Brørup,
	hofors, michalx.k.jastrzebski

On Sun, Apr 10, 2022 at 3:53 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> The conditional rte_spinlock_trylock() was used as if it is an
> unconditional lock operation in a number of places.
>
> Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
> Cc: michalx.k.jastrzebski@intel.com

Any reason not to ask for backport in stable branches?

>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

Otherwise, this series looks good, thanks Mattias.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-11  9:16                                                   ` Bruce Richardson
@ 2022-04-11 14:27                                                     ` Mattias Rönnblom
  2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
  2022-04-11 18:24                                                     ` [RFC " Tyler Retzlaff
  2 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 14:27 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors

On 2022-04-11 11:16, Bruce Richardson wrote:
> On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote:
>> This patch adds a wrapper macro __rte_warn_unused_result for the
>> warn_unused_result function attribute.
>>
>> Marking a function __rte_warn_unused_result will make the compiler
>> emit a warning in case the caller does not use the function's return
>> value.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
> 
> This is good to have, thanks.
> 
> Series-acked-by: Bruce Richardson <bruce.richardson@intel.com>


There is one issue with this attribute in combination with GCC: a 
warn_unused_result warning cannot easily be suppressed in the source 
code of the caller. The usual cast-to-void trick doesn't work with this 
compiler (but does work with clang). This behavior limit the usefulness 
of this attribute to function where it's pretty much always a bug if you 
ignore the return value.

I will update the macro doc string with some details around this.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-11  7:17                                                   ` Morten Brørup
@ 2022-04-11 14:29                                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 14:29 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors

On 2022-04-11 09:17, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Sunday, 10 April 2022 15.52
>>
>> This patch adds a wrapper macro __rte_warn_unused_result for the
>> warn_unused_result function attribute.
>>
>> Marking a function __rte_warn_unused_result will make the compiler
>> emit a warning in case the caller does not use the function's return
>> value.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   lib/eal/include/rte_common.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/lib/eal/include/rte_common.h
>> b/lib/eal/include/rte_common.h
>> index 4a399cc7c8..544e7de2e7 100644
>> --- a/lib/eal/include/rte_common.h
>> +++ b/lib/eal/include/rte_common.h
>> @@ -222,6 +222,11 @@ static void
>> __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
>>    */
>>   #define __rte_noreturn __attribute__((noreturn))
>>
>> +/**
>> + * Issue warning in case the function's return value is ignore
> 
> Typo: ignore -> ignored
> 
> Consider: warning -> a warning
> 

OK.

>> + */
>> +#define __rte_warn_unused_result __attribute__((warn_unused_result))
>> +
>>   /**
>>    * Force a function to be inlined
>>    */
>> --
>> 2.25.1
>>
> 
> Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
> 

Thanks!

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 3/3] examples/bond: fix invalid use of trylock
  2022-04-11  1:01                                                     ` Min Hu (Connor)
@ 2022-04-11 14:32                                                       ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 14:32 UTC (permalink / raw)
  To: Min Hu (Connor), dev
  Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb,
	hofors, michalx.k.jastrzebski

On 2022-04-11 03:01, Min Hu (Connor) wrote:
> Acked-by: Min Hu (Connor) <humin29@huawei.com>
> 

Thanks.

It was pretty obvious that something was wrong with this example's use 
of the spinlock, but after the brief look I had it was a little less 
obvious if this patch would fix the problem or not.

> 在 2022/4/10 21:51, Mattias Rönnblom 写道:
>> The conditional rte_spinlock_trylock() was used as if it is an
>> unconditional lock operation in a number of places.
>>
>> Fixes: cc7e8ae84faa ("examples/bond: add example application for link 
>> bonding mode 6")
>> Cc: michalx.k.jastrzebski@intel.com
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   examples/bond/main.c | 14 +++++++-------
>>   1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/examples/bond/main.c b/examples/bond/main.c
>> index 335bde5c8d..4efebb3902 100644
>> --- a/examples/bond/main.c
>> +++ b/examples/bond/main.c
>> @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1)
>>       bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) |
>>                   (BOND_IP_3 << 16) | (BOND_IP_4 << 24);
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       while (global_flag_stru_p->LcoreMainIsRunning) {
>>           rte_spinlock_unlock(&global_flag_stru_p->lock);
>> @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1)
>>               if (is_free == 0)
>>                   rte_pktmbuf_free(pkts[i]);
>>           }
>> -        rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +        rte_spinlock_lock(&global_flag_stru_p->lock);
>>       }
>>       rte_spinlock_unlock(&global_flag_stru_p->lock);
>>       printf("BYE lcore_main\n");
>> @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void 
>> *parsed_result,
>>   {
>>       int worker_core_id = rte_lcore_id();
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       if (global_flag_stru_p->LcoreMainIsRunning == 0) {
>>           if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore)
>>               != WAIT) {
>> @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void 
>> *parsed_result,
>>       if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0))
>>           return;
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       global_flag_stru_p->LcoreMainIsRunning = 1;
>>       rte_spinlock_unlock(&global_flag_stru_p->lock);
>>       cmdline_printf(cl,
>> @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void 
>> *parsed_result,
>>                   struct cmdline *cl,
>>                   __rte_unused void *data)
>>   {
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       if (global_flag_stru_p->LcoreMainIsRunning == 0)    {
>>           cmdline_printf(cl,
>>                       "lcore_main not running on core:%d\n",
>> @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void 
>> *parsed_result,
>>                   struct cmdline *cl,
>>                   __rte_unused void *data)
>>   {
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       if (global_flag_stru_p->LcoreMainIsRunning == 0)    {
>>           cmdline_printf(cl,
>>                       "lcore_main not running on core:%d\n",
>> @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void 
>> *parsed_result,
>>           printf("\n");
>>       }
>> -    rte_spinlock_trylock(&global_flag_stru_p->lock);
>> +    rte_spinlock_lock(&global_flag_stru_p->lock);
>>       cmdline_printf(cl,
>>               "Active_slaves:%d "
>>               "packets received:Tot:%d Arp:%d IPv4:%d\n",
>>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 3/3] examples/bond: fix invalid use of trylock
  2022-04-11 11:25                                                     ` David Marchand
@ 2022-04-11 14:33                                                       ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 14:33 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Honnappa Nagarahalli, Morten Brørup,
	hofors, michalx.k.jastrzebski

On 2022-04-11 13:25, David Marchand wrote:
> On Sun, Apr 10, 2022 at 3:53 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> The conditional rte_spinlock_trylock() was used as if it is an
>> unconditional lock operation in a number of places.
>>
>> Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
>> Cc: michalx.k.jastrzebski@intel.com
> 
> Any reason not to ask for backport in stable branches?
> 

No. Will do.

>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 
> Otherwise, this series looks good, thanks Mattias.
> 
> 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH 1/3] eal: add macro to warn for unused function return values
  2022-04-11  9:16                                                   ` Bruce Richardson
  2022-04-11 14:27                                                     ` Mattias Rönnblom
@ 2022-04-11 15:15                                                     ` Mattias Rönnblom
  2022-04-11 15:15                                                       ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
                                                                         ` (2 more replies)
  2022-04-11 18:24                                                     ` [RFC " Tyler Retzlaff
  2 siblings, 3 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger,
	Mattias Rönnblom

This patch adds a wrapper macro __rte_warn_unused_result for the
warn_unused_result function attribute.

Marking a function __rte_warn_unused_result will make the compiler
emit a warning in case the caller does not use the function's return
value.

Changes since RFC:
  * Include usage recommendation and GCC peculiarities in the macro
    documentation.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_common.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4a399cc7c8..67587025ab 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -222,6 +222,31 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  */
 #define __rte_noreturn __attribute__((noreturn))
 
+/**
+ * Issue a warning in case the function's return value is ignored.
+ *
+ * The use of this attribute should be restricted to cases where
+ * ignoring the marked function's return value is almost always a
+ * bug. With GCC, some effort is required to make clear that ignoring
+ * the return value is intentional. The usual void-casting method to
+ * mark something unused as used does not suppress the warning with
+ * this compiler.
+ *
+ * @code{.c}
+ * __rte_warn_unused_result int foo();
+ *
+ * void ignore_foo_result(void) {
+ *         foo(); // generates a warning with all compilers
+ *
+ *         (void)foo(); // still generates the warning with GCC (but not clang)
+ *
+ *         int unused __rte_unused;
+ *         unused = foo(); // does the trick with all compilers
+ *  }
+ * @endcode
+ */
+#define __rte_warn_unused_result __attribute__((warn_unused_result))
+
 /**
  * Force a function to be inlined
  */
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH 2/3] eal: emit warning for unused trylock return value
  2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
@ 2022-04-11 15:15                                                       ` Mattias Rönnblom
  2022-04-11 15:29                                                         ` Morten Brørup
  2022-04-11 15:15                                                       ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
  2022-04-11 15:25                                                       ` [PATCH 1/3] eal: add macro to warn for unused function return values Morten Brørup
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger,
	Mattias Rönnblom

Mark the trylock family of spinlock functions with
__rte_warn_unused_result.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/generic/rte_spinlock.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h
index 40fe49d5ad..73ed4bfbdc 100644
--- a/lib/eal/include/generic/rte_spinlock.h
+++ b/lib/eal/include/generic/rte_spinlock.h
@@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
  * @return
  *   1 if the lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int
 rte_spinlock_trylock (rte_spinlock_t *sl);
 
@@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl);
  *   1 if the hardware memory transaction is successfully started
  *   or lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int
 rte_spinlock_trylock_tm(rte_spinlock_t *sl);
 
@@ -243,6 +245,7 @@ static inline void rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  * @return
  *   1 if the lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
 	int id = rte_gettid();
@@ -299,6 +302,7 @@ static inline void rte_spinlock_recursive_unlock_tm(
  *   1 if the hardware memory transaction is successfully started
  *   or lock is successfully taken; 0 otherwise.
  */
+__rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock_tm(
 	rte_spinlock_recursive_t *slr);
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH 3/3] examples/bond: fix invalid use of trylock
  2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
  2022-04-11 15:15                                                       ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
@ 2022-04-11 15:15                                                       ` Mattias Rönnblom
  2022-04-14 12:06                                                         ` David Marchand
  2022-04-11 15:25                                                       ` [PATCH 1/3] eal: add macro to warn for unused function return values Morten Brørup
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger,
	Mattias Rönnblom, michalx.k.jastrzebski, stable, Min Hu

The conditional rte_spinlock_trylock() was used as if it is an
unconditional lock operation in a number of places.

Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
Cc: michalx.k.jastrzebski@intel.com
Cc: stable@dpdk.org

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Min Hu (Connor) <humin29@huawei.com>

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 examples/bond/main.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/examples/bond/main.c b/examples/bond/main.c
index 335bde5c8d..4efebb3902 100644
--- a/examples/bond/main.c
+++ b/examples/bond/main.c
@@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1)
 	bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) |
 				(BOND_IP_3 << 16) | (BOND_IP_4 << 24);
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 
 	while (global_flag_stru_p->LcoreMainIsRunning) {
 		rte_spinlock_unlock(&global_flag_stru_p->lock);
@@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1)
 			if (is_free == 0)
 				rte_pktmbuf_free(pkts[i]);
 		}
-		rte_spinlock_trylock(&global_flag_stru_p->lock);
+		rte_spinlock_lock(&global_flag_stru_p->lock);
 	}
 	rte_spinlock_unlock(&global_flag_stru_p->lock);
 	printf("BYE lcore_main\n");
@@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
 {
 	int worker_core_id = rte_lcore_id();
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0) {
 		if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore)
 		    != WAIT) {
@@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result,
 	if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0))
 		return;
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	global_flag_stru_p->LcoreMainIsRunning = 1;
 	rte_spinlock_unlock(&global_flag_stru_p->lock);
 	cmdline_printf(cl,
@@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result,
 			    struct cmdline *cl,
 			    __rte_unused void *data)
 {
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
 		cmdline_printf(cl,
 					"lcore_main not running on core:%d\n",
@@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result,
 			    struct cmdline *cl,
 			    __rte_unused void *data)
 {
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	if (global_flag_stru_p->LcoreMainIsRunning == 0)	{
 		cmdline_printf(cl,
 					"lcore_main not running on core:%d\n",
@@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result,
 		printf("\n");
 	}
 
-	rte_spinlock_trylock(&global_flag_stru_p->lock);
+	rte_spinlock_lock(&global_flag_stru_p->lock);
 	cmdline_printf(cl,
 			"Active_slaves:%d "
 			"packets received:Tot:%d Arp:%d IPv4:%d\n",
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH 1/3] eal: add macro to warn for unused function return values
  2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
  2022-04-11 15:15                                                       ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
  2022-04-11 15:15                                                       ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
@ 2022-04-11 15:25                                                       ` Morten Brørup
  2 siblings, 0 replies; 104+ messages in thread
From: Morten Brørup @ 2022-04-11 15:25 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Bruce Richardson, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, hofors, Stephen Hemminger

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Monday, 11 April 2022 17.16
> 
> This patch adds a wrapper macro __rte_warn_unused_result for the
> warn_unused_result function attribute.
> 
> Marking a function __rte_warn_unused_result will make the compiler
> emit a warning in case the caller does not use the function's return
> value.
> 
> Changes since RFC:
>   * Include usage recommendation and GCC peculiarities in the macro
>     documentation.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  lib/eal/include/rte_common.h | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 4a399cc7c8..67587025ab 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -222,6 +222,31 @@ static void
> __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
>   */
>  #define __rte_noreturn __attribute__((noreturn))
> 
> +/**
> + * Issue a warning in case the function's return value is ignored.
> + *
> + * The use of this attribute should be restricted to cases where
> + * ignoring the marked function's return value is almost always a
> + * bug. With GCC, some effort is required to make clear that ignoring
> + * the return value is intentional. The usual void-casting method to
> + * mark something unused as used does not suppress the warning with
> + * this compiler.
> + *
> + * @code{.c}
> + * __rte_warn_unused_result int foo();
> + *
> + * void ignore_foo_result(void) {
> + *         foo(); // generates a warning with all compilers
> + *
> + *         (void)foo(); // still generates the warning with GCC (but
> not clang)
> + *
> + *         int unused __rte_unused;
> + *         unused = foo(); // does the trick with all compilers
> + *  }
> + * @endcode
> + */
> +#define __rte_warn_unused_result __attribute__((warn_unused_result))
> +
>  /**
>   * Force a function to be inlined
>   */
> --
> 2.25.1
> 

Nice! If only all functions were that well documented. :-)

Reviewed-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH 2/3] eal: emit warning for unused trylock return value
  2022-04-11 15:15                                                       ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
@ 2022-04-11 15:29                                                         ` Morten Brørup
  0 siblings, 0 replies; 104+ messages in thread
From: Morten Brørup @ 2022-04-11 15:29 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Bruce Richardson, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, hofors, Stephen Hemminger

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Monday, 11 April 2022 17.16
> 
> Mark the trylock family of spinlock functions with
> __rte_warn_unused_result.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>  lib/eal/include/generic/rte_spinlock.h | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/eal/include/generic/rte_spinlock.h
> b/lib/eal/include/generic/rte_spinlock.h
> index 40fe49d5ad..73ed4bfbdc 100644
> --- a/lib/eal/include/generic/rte_spinlock.h
> +++ b/lib/eal/include/generic/rte_spinlock.h
> @@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)
>   * @return
>   *   1 if the lock is successfully taken; 0 otherwise.
>   */
> +__rte_warn_unused_result
>  static inline int
>  rte_spinlock_trylock (rte_spinlock_t *sl);
> 
> @@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl);
>   *   1 if the hardware memory transaction is successfully started
>   *   or lock is successfully taken; 0 otherwise.
>   */
> +__rte_warn_unused_result
>  static inline int
>  rte_spinlock_trylock_tm(rte_spinlock_t *sl);
> 
> @@ -243,6 +245,7 @@ static inline void
> rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
>   * @return
>   *   1 if the lock is successfully taken; 0 otherwise.
>   */
> +__rte_warn_unused_result
>  static inline int
> rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
>  {
>  	int id = rte_gettid();
> @@ -299,6 +302,7 @@ static inline void
> rte_spinlock_recursive_unlock_tm(
>   *   1 if the hardware memory transaction is successfully started
>   *   or lock is successfully taken; 0 otherwise.
>   */
> +__rte_warn_unused_result
>  static inline int rte_spinlock_recursive_trylock_tm(
>  	rte_spinlock_recursive_t *slr);
> 
> --
> 2.25.1
> 

Acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 1/3] eal: add macro to warn for unused function return values
  2022-04-11  9:16                                                   ` Bruce Richardson
  2022-04-11 14:27                                                     ` Mattias Rönnblom
  2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
@ 2022-04-11 18:24                                                     ` Tyler Retzlaff
  2 siblings, 0 replies; 104+ messages in thread
From: Tyler Retzlaff @ 2022-04-11 18:24 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand,
	Honnappa.Nagarahalli, mb, hofors

On Mon, Apr 11, 2022 at 10:16:35AM +0100, Bruce Richardson wrote:
> On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote:
> > This patch adds a wrapper macro __rte_warn_unused_result for the
> > warn_unused_result function attribute.
> > 
> > Marking a function __rte_warn_unused_result will make the compiler
> > emit a warning in case the caller does not use the function's return
> > value.
> > 
> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> > ---
> 
> This is good to have, thanks.
> 
> Series-acked-by: Bruce Richardson <bruce.richardson@intel.com>

+1
Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v4] eal: add seqlock
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
                                                   ` (2 preceding siblings ...)
  2022-04-08 16:48                                 ` Mattias Rönnblom
@ 2022-04-12 17:27                                 ` Ananyev, Konstantin
  2022-04-28 10:28                                 ` David Marchand
  4 siblings, 0 replies; 104+ messages in thread
From: Ananyev, Konstantin @ 2022-04-12 17:27 UTC (permalink / raw)
  To: mattias.ronnblom, dev
  Cc: Thomas Monjalon, David Marchand, Olsen, Onar,
	Honnappa.Nagarahalli, nd, mb, stephen, hofors, mattias.ronnblom,
	Ola Liljedahl



> A sequence lock (seqlock) is synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, especially for
> data structures shared across many cores and which are updated
> relatively infrequently.
> 
> A seqlock permits multiple parallel readers. The variant of seqlock
> implemented in this patch supports multiple writers as well. A
> spinlock is used for writer-writer serialization.
> 
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
> 
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger what the native atomic
> machine instructions allow for.
> 
> DPDK seqlocks are not preemption safe on the writer side. A thread
> preemption affects performance, not correctness.
> 
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
> 
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
> 
> The first three steps need to be ordered using suitable memory fences.
> 
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
> 
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
> 
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
> 
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
> 
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock
> 
> PATCH v4:
>   * Reverted to Linux kernel style naming on the read side.
>   * Bail out early from the retry function if an odd sequence
>     number is encountered.
>   * Added experimental warnings in the API documentation.
>   * Static initializer now uses named field initialization.
>   * Various tweaks to API documentation (including the example).
> 
> PATCH v3:
>   * Renamed both read and write-side critical section begin/end functions
>     to better match rwlock naming, per Ola Liljedahl's suggestion.
>   * Added 'extern "C"' guards for C++ compatibility.
>   * Refer to the main lcore as the main lcore, and nothing else.
> 
> PATCH v2:
>   * Skip instead of fail unit test in case too few lcores are available.
>   * Use main lcore for testing, reducing the minimum number of lcores
>     required to run the unit tests to four.
>   * Consistently refer to sn field as the "sequence number" in the
>     documentation.
>   * Fixed spelling mistakes in documentation.
> 
> Updates since RFC:
>   * Added API documentation.
>   * Added link to Wikipedia article in the commit message.
>   * Changed seqlock sequence number field from uint64_t (which was
>     overkill) to uint32_t. The sn type needs to be sufficiently large
>     to assure no reader will read a sn, access the data, and then read
>     the same sn, but the sn has been incremented enough times to have
>     wrapped during the read, and arrived back at the original sn.
>   * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>   * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>     with an anonymous struct typedef:ed to rte_seqlock_t.
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> --
> 2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH 3/3] examples/bond: fix invalid use of trylock
  2022-04-11 15:15                                                       ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
@ 2022-04-14 12:06                                                         ` David Marchand
  0 siblings, 0 replies; 104+ messages in thread
From: David Marchand @ 2022-04-14 12:06 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Bruce Richardson, Thomas Monjalon, Honnappa Nagarahalli,
	Morten Brørup, hofors, Stephen Hemminger,
	michalx.k.jastrzebski, dpdk stable, Min Hu, Tyler Retzlaff

On Mon, Apr 11, 2022 at 5:17 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> The conditional rte_spinlock_trylock() was used as if it is an
> unconditional lock operation in a number of places.
>
> Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6")
> Cc: michalx.k.jastrzebski@intel.com
> Cc: stable@dpdk.org
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Min Hu (Connor) <humin29@huawei.com>
>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

Series applied, thanks Mattias.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
                                                   ` (3 preceding siblings ...)
  2022-04-12 17:27                                 ` Ananyev, Konstantin
@ 2022-04-28 10:28                                 ` David Marchand
  2022-05-01 13:46                                   ` Mattias Rönnblom
  4 siblings, 1 reply; 104+ messages in thread
From: David Marchand @ 2022-04-28 10:28 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, onar.olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Morten Brørup, Stephen Hemminger,
	hofors, Ola Liljedahl

Hello Mattias,

On Fri, Apr 8, 2022 at 4:25 PM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
>
> A sequence lock (seqlock) is synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, especially for
> data structures shared across many cores and which are updated
> relatively infrequently.
>
> A seqlock permits multiple parallel readers. The variant of seqlock
> implemented in this patch supports multiple writers as well. A
> spinlock is used for writer-writer serialization.
>
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
>
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger what the native atomic
> machine instructions allow for.
>
> DPDK seqlocks are not preemption safe on the writer side. A thread
> preemption affects performance, not correctness.
>
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
>
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
>
> The first three steps need to be ordered using suitable memory fences.
>
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
>
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
>
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
>
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
>
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock

Revisions changelog should be after commitlog, separated with ---.

>
> PATCH v4:
>   * Reverted to Linux kernel style naming on the read side.
>   * Bail out early from the retry function if an odd sequence
>     number is encountered.
>   * Added experimental warnings in the API documentation.
>   * Static initializer now uses named field initialization.
>   * Various tweaks to API documentation (including the example).
>
> PATCH v3:
>   * Renamed both read and write-side critical section begin/end functions
>     to better match rwlock naming, per Ola Liljedahl's suggestion.
>   * Added 'extern "C"' guards for C++ compatibility.
>   * Refer to the main lcore as the main lcore, and nothing else.
>
> PATCH v2:
>   * Skip instead of fail unit test in case too few lcores are available.
>   * Use main lcore for testing, reducing the minimum number of lcores
>     required to run the unit tests to four.
>   * Consistently refer to sn field as the "sequence number" in the
>     documentation.
>   * Fixed spelling mistakes in documentation.
>
> Updates since RFC:
>   * Added API documentation.
>   * Added link to Wikipedia article in the commit message.
>   * Changed seqlock sequence number field from uint64_t (which was
>     overkill) to uint32_t. The sn type needs to be sufficiently large
>     to assure no reader will read a sn, access the data, and then read
>     the same sn, but the sn has been incremented enough times to have
>     wrapped during the read, and arrived back at the original sn.
>   * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>   * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>     with an anonymous struct typedef:ed to rte_seqlock_t.
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

We are missing a MAINTAINERS update, either with a new section for
this lock (like for MCS and ticket locks), or adding the new test code
under the EAL API and common code section (like rest of the locks).

This new lock is not referenced in doxygen (see doc/api/doxy-api-index.md).

It's worth a release notes update for advertising this new lock type.

[snip]


> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
> new file mode 100644
> index 0000000000..3f1ce53678
> --- /dev/null
> +++ b/app/test/test_seqlock.c

[snip]

> +/* Only a compile-time test */
> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
> +
> +static int
> +test_seqlock(void)
> +{
> +       struct reader readers[MAX_READERS];
> +       unsigned int num_readers;
> +       unsigned int num_lcores;
> +       unsigned int i;
> +       unsigned int lcore_id;
> +       unsigned int reader_lcore_ids[MAX_READERS];
> +       unsigned int worker_writer_lcore_id = 0;
> +       int rc = 0;

A unit test is supposed to use TEST_* macros as return values.
I concede other locks unit tests return 0 or -1 (which is equivalent,
given TEST_SUCCESS / TEST_FAILED values).
We can go with 0 / -1 (and a cleanup could be done later on app/test
globally), but at least change to TEST_SKIPPED when lacking lcores
(see below).


> +
> +       num_lcores = rte_lcore_count();
> +
> +       if (num_lcores < MIN_LCORE_COUNT) {
> +               printf("Too few cores to run test. Skipping.\n");
> +               return 0;

return TEST_SKIPPED;


> +       }
> +
> +       num_readers = num_lcores - NUM_WRITERS;
> +
> +       struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
> +
> +       i = 0;
> +       RTE_LCORE_FOREACH_WORKER(lcore_id) {
> +               if (i == 0) {
> +                       rte_eal_remote_launch(writer_run, data, lcore_id);
> +                       worker_writer_lcore_id = lcore_id;
> +               } else {
> +                       unsigned int reader_idx = i - 1;
> +                       struct reader *reader = &readers[reader_idx];
> +
> +                       reader->data = data;
> +                       reader->stop = 0;
> +
> +                       rte_eal_remote_launch(reader_run, reader, lcore_id);
> +                       reader_lcore_ids[reader_idx] = lcore_id;
> +               }
> +               i++;
> +       }
> +
> +       if (writer_run(data) != 0 ||
> +           rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
> +               rc = -1;
> +
> +       for (i = 0; i < num_readers; i++) {
> +               reader_stop(&readers[i]);
> +               if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
> +                       rc = -1;
> +       }
> +
> +       return rc;
> +}
> +
> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
> index 917758cc65..a41343bfed 100644
> --- a/lib/eal/common/meson.build
> +++ b/lib/eal/common/meson.build
> @@ -35,6 +35,7 @@ sources += files(
>          'rte_malloc.c',
>          'rte_random.c',
>          'rte_reciprocal.c',
> +       'rte_seqlock.c',

Indent is not correct, please use spaces for meson files.


>          'rte_service.c',
>          'rte_version.c',
>  )
> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
> new file mode 100644
> index 0000000000..961816aa10
> --- /dev/null
> +++ b/lib/eal/include/rte_seqlock.h
> @@ -0,0 +1,319 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#ifndef _RTE_SEQLOCK_H_
> +#define _RTE_SEQLOCK_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @file
> + * RTE Seqlock

Nit: mention of RTE adds nothing, I'd remove it.


> + *
> + * A sequence lock (seqlock) is a synchronization primitive allowing
> + * multiple, parallel, readers to efficiently and safely (i.e., in a
> + * data-race free manner) access lock-protected data. The RTE seqlock
> + * permits multiple writers as well. A spinlock is used to
> + * writer-writer synchronization.
> + *
> + * A reader never blocks a writer. Very high frequency writes may
> + * prevent readers from making progress.
> + *
> + * A seqlock is not preemption-safe on the writer side. If a writer is
> + * preempted, it may block readers until the writer thread is allowed
> + * to continue. Heavy computations should be kept out of the
> + * writer-side critical section, to avoid delaying readers.
> + *
> + * Seqlocks are useful for data which are read by many cores, at a
> + * high frequency, and relatively infrequently written to.
> + *
> + * One way to think about seqlocks is that they provide means to
> + * perform atomic operations on objects larger than what the native
> + * machine instructions allow for.
> + *
> + * To avoid resource reclamation issues, the data protected by a
> + * seqlock should typically be kept self-contained (e.g., no pointers
> + * to mutable, dynamically allocated data).
> + *
> + * Example usage:
> + * @code{.c}
> + * #define MAX_Y_LEN (16)
> + * // Application-defined example data structure, protected by a seqlock.
> + * struct config {
> + *         rte_seqlock_t lock;
> + *         int param_x;
> + *         char param_y[MAX_Y_LEN];
> + * };
> + *
> + * // Accessor function for reading config fields.
> + * void
> + * config_read(const struct config *config, int *param_x, char *param_y)
> + * {
> + *         uint32_t sn;
> + *
> + *         do {
> + *                 sn = rte_seqlock_read_begin(&config->lock);
> + *
> + *                 // Loads may be atomic or non-atomic, as in this example.
> + *                 *param_x = config->param_x;
> + *                 strcpy(param_y, config->param_y);
> + *                 // An alternative to an immediate retry is to abort and
> + *                 // try again at some later time, assuming progress is
> + *                 // possible without the data.
> + *         } while (rte_seqlock_read_retry(&config->lock));
> + * }
> + *
> + * // Accessor function for writing config fields.
> + * void
> + * config_update(struct config *config, int param_x, const char *param_y)
> + * {
> + *         rte_seqlock_write_lock(&config->lock);
> + *         // Stores may be atomic or non-atomic, as in this example.
> + *         config->param_x = param_x;
> + *         strcpy(config->param_y, param_y);
> + *         rte_seqlock_write_unlock(&config->lock);
> + * }
> + * @endcode
> + *
> + * @see
> + * https://en.wikipedia.org/wiki/Seqlock.
> + */


The rest lgtm.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v4] eal: add seqlock
  2022-04-28 10:28                                 ` David Marchand
@ 2022-05-01 13:46                                   ` Mattias Rönnblom
  2022-05-01 14:03                                     ` [PATCH v5] " Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-01 13:46 UTC (permalink / raw)
  To: David Marchand, Mattias Rönnblom
  Cc: dev, Thomas Monjalon, onar.olsen, Honnappa Nagarahalli, nd,
	Ananyev, Konstantin, Morten Brørup, Stephen Hemminger,
	Ola Liljedahl

On 2022-04-28 12:28, David Marchand wrote:
> Hello Mattias,
> 
> On Fri, Apr 8, 2022 at 4:25 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> A sequence lock (seqlock) is synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, especially for
>> data structures shared across many cores and which are updated
>> relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. The variant of seqlock
>> implemented in this patch supports multiple writers as well. A
>> spinlock is used for writer-writer serialization.
>>
>> To avoid resource reclamation and other issues, the data protected by
>> a seqlock is best off being self-contained (i.e., no pointers [except
>> to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform
>> atomic operations on data objects larger what the native atomic
>> machine instructions allow for.
>>
>> DPDK seqlocks are not preemption safe on the writer side. A thread
>> preemption affects performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>    1. Load the sequence number (sn).
>>    2. Load, in arbitrary order, the seqlock-protected data.
>>    3. Load the sn again.
>>    4. Check if the first and second sn are equal, and even numbered.
>>       If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>    1. Take the spinlock, to serialize writer access.
>>    2. Load the sn.
>>    3. Store the original sn + 1 as the new sn.
>>    4. Perform load and stores to the seqlock-protected data.
>>    5. Store the original sn + 2 as the new sn.
>>    6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the
>> data stores, and the second sn store appear to the reader in the
>> mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores
>> need not be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
> 
> Revisions changelog should be after commitlog, separated with ---.
> 

OK.

>>
>> PATCH v4:
>>    * Reverted to Linux kernel style naming on the read side.
>>    * Bail out early from the retry function if an odd sequence
>>      number is encountered.
>>    * Added experimental warnings in the API documentation.
>>    * Static initializer now uses named field initialization.
>>    * Various tweaks to API documentation (including the example).
>>
>> PATCH v3:
>>    * Renamed both read and write-side critical section begin/end functions
>>      to better match rwlock naming, per Ola Liljedahl's suggestion.
>>    * Added 'extern "C"' guards for C++ compatibility.
>>    * Refer to the main lcore as the main lcore, and nothing else.
>>
>> PATCH v2:
>>    * Skip instead of fail unit test in case too few lcores are available.
>>    * Use main lcore for testing, reducing the minimum number of lcores
>>      required to run the unit tests to four.
>>    * Consistently refer to sn field as the "sequence number" in the
>>      documentation.
>>    * Fixed spelling mistakes in documentation.
>>
>> Updates since RFC:
>>    * Added API documentation.
>>    * Added link to Wikipedia article in the commit message.
>>    * Changed seqlock sequence number field from uint64_t (which was
>>      overkill) to uint32_t. The sn type needs to be sufficiently large
>>      to assure no reader will read a sn, access the data, and then read
>>      the same sn, but the sn has been incremented enough times to have
>>      wrapped during the read, and arrived back at the original sn.
>>    * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>>    * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
>>      with an anonymous struct typedef:ed to rte_seqlock_t.
>>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> 
> We are missing a MAINTAINERS update, either with a new section for
> this lock (like for MCS and ticket locks), or adding the new test code
> under the EAL API and common code section (like rest of the locks).
> 

OK. I added a new section, and myself as the maintainer. If you want to 
merge it with EAL and the common code that fine with me as well. Let me 
know in that case. The seqlock is a tiny bit of code, but there are some 
intricacies (like always when C11 memory models matter).

> This new lock is not referenced in doxygen (see doc/api/doxy-api-index.md).
> 

OK.

> It's worth a release notes update for advertising this new lock type.
> 

OK.

> [snip]
> 
> 
>> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
>> new file mode 100644
>> index 0000000000..3f1ce53678
>> --- /dev/null
>> +++ b/app/test/test_seqlock.c
> 
> [snip]
> 
>> +/* Only a compile-time test */
>> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
>> +
>> +static int
>> +test_seqlock(void)
>> +{
>> +       struct reader readers[MAX_READERS];
>> +       unsigned int num_readers;
>> +       unsigned int num_lcores;
>> +       unsigned int i;
>> +       unsigned int lcore_id;
>> +       unsigned int reader_lcore_ids[MAX_READERS];
>> +       unsigned int worker_writer_lcore_id = 0;
>> +       int rc = 0;
> 
> A unit test is supposed to use TEST_* macros as return values.
> I concede other locks unit tests return 0 or -1 (which is equivalent,
> given TEST_SUCCESS / TEST_FAILED values).
> We can go with 0 / -1 (and a cleanup could be done later on app/test
> globally), but at least change to TEST_SKIPPED when lacking lcores
> (see below).
> 

I'll fix it right away instead.

> 
>> +
>> +       num_lcores = rte_lcore_count();
>> +
>> +       if (num_lcores < MIN_LCORE_COUNT) {
>> +               printf("Too few cores to run test. Skipping.\n");
>> +               return 0;
> 
> return TEST_SKIPPED;
> 
> 

Excellent. I remember asking myself if there was such a return code when 
I wrote that code, but then I forgot to actually go look for it.

>> +       }
>> +
>> +       num_readers = num_lcores - NUM_WRITERS;
>> +
>> +       struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
>> +
>> +       i = 0;
>> +       RTE_LCORE_FOREACH_WORKER(lcore_id) {
>> +               if (i == 0) {
>> +                       rte_eal_remote_launch(writer_run, data, lcore_id);
>> +                       worker_writer_lcore_id = lcore_id;
>> +               } else {
>> +                       unsigned int reader_idx = i - 1;
>> +                       struct reader *reader = &readers[reader_idx];
>> +
>> +                       reader->data = data;
>> +                       reader->stop = 0;
>> +
>> +                       rte_eal_remote_launch(reader_run, reader, lcore_id);
>> +                       reader_lcore_ids[reader_idx] = lcore_id;
>> +               }
>> +               i++;
>> +       }
>> +
>> +       if (writer_run(data) != 0 ||
>> +           rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
>> +               rc = -1;
>> +
>> +       for (i = 0; i < num_readers; i++) {
>> +               reader_stop(&readers[i]);
>> +               if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
>> +                       rc = -1;
>> +       }
>> +
>> +       return rc;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
>> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
>> index 917758cc65..a41343bfed 100644
>> --- a/lib/eal/common/meson.build
>> +++ b/lib/eal/common/meson.build
>> @@ -35,6 +35,7 @@ sources += files(
>>           'rte_malloc.c',
>>           'rte_random.c',
>>           'rte_reciprocal.c',
>> +       'rte_seqlock.c',
> 
> Indent is not correct, please use spaces for meson files.
> 

OK.

> 
>>           'rte_service.c',
>>           'rte_version.c',
>>   )
>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
>> new file mode 100644
>> index 0000000000..961816aa10
>> --- /dev/null
>> +++ b/lib/eal/include/rte_seqlock.h
>> @@ -0,0 +1,319 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#ifndef _RTE_SEQLOCK_H_
>> +#define _RTE_SEQLOCK_H_
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/** >> + * @file
>> + * RTE Seqlock
> 
> Nit: mention of RTE adds nothing, I'd remove it.
> 

I agree, but there seemed to be a convention that called for this.

> 
>> + *
>> + * A sequence lock (seqlock) is a synchronization primitive allowing
>> + * multiple, parallel, readers to efficiently and safely (i.e., in a
>> + * data-race free manner) access lock-protected data. The RTE seqlock
>> + * permits multiple writers as well. A spinlock is used to
>> + * writer-writer synchronization.
>> + *
>> + * A reader never blocks a writer. Very high frequency writes may
>> + * prevent readers from making progress.
>> + *
>> + * A seqlock is not preemption-safe on the writer side. If a writer is
>> + * preempted, it may block readers until the writer thread is allowed
>> + * to continue. Heavy computations should be kept out of the
>> + * writer-side critical section, to avoid delaying readers.
>> + *
>> + * Seqlocks are useful for data which are read by many cores, at a
>> + * high frequency, and relatively infrequently written to.
>> + *
>> + * One way to think about seqlocks is that they provide means to
>> + * perform atomic operations on objects larger than what the native
>> + * machine instructions allow for.
>> + *
>> + * To avoid resource reclamation issues, the data protected by a
>> + * seqlock should typically be kept self-contained (e.g., no pointers
>> + * to mutable, dynamically allocated data).
>> + *
>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char *param_y)
>> + * {
>> + *         uint32_t sn;
>> + *
>> + *         do {
>> + *                 sn = rte_seqlock_read_begin(&config->lock);
>> + *
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 *param_x = config->param_x;
>> + *                 strcpy(param_y, config->param_y);
>> + *                 // An alternative to an immediate retry is to abort and
>> + *                 // try again at some later time, assuming progress is
>> + *                 // possible without the data.
>> + *         } while (rte_seqlock_read_retry(&config->lock));
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char *param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
> 
> 
> The rest lgtm.
> 
> 

Thanks a lot David for the review!

^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v5] eal: add seqlock
  2022-05-01 13:46                                   ` Mattias Rönnblom
@ 2022-05-01 14:03                                     ` Mattias Rönnblom
  2022-05-01 14:22                                       ` Mattias Rönnblom
                                                         ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-01 14:03 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	hofors, Mattias Rönnblom, Ola Liljedahl

A sequence lock (seqlock) is synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, especially for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

---

PATCH v5:
  * Add sequence lock section to MAINTAINERS.
  * Add entry in the release notes.
  * Add seqlock reference in the API index.
  * Fix meson build file indentation.
  * Use "increment" to describe how a writer changes the sequence number.
  * Remove compiler barriers from seqlock test.
  * Use appropriate macros (e.g., TEST_SUCCESS) for test return values.

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been incremented enough times to have
    wrapped during the read, and arrived back at the original sn.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 MAINTAINERS                            |   5 +
 app/test/meson.build                   |   2 +
 app/test/test_seqlock.c                | 183 ++++++++++++++
 doc/api/doxy-api-index.md              |   1 +
 doc/guides/rel_notes/release_22_07.rst |  14 ++
 lib/eal/common/meson.build             |   1 +
 lib/eal/common/rte_seqlock.c           |  12 +
 lib/eal/include/meson.build            |   1 +
 lib/eal/include/rte_seqlock.h          | 322 +++++++++++++++++++++++++
 lib/eal/version.map                    |   3 +
 10 files changed, 544 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c4f541dba..2804d8136c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,6 +262,11 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/eal/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+Sequence Lock
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_seqlock.h
+F: app/test/test_seqlock.c
+
 Pseudo-random Number Generation
 M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
 F: lib/eal/include/rte_random.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..8d91a23ba7
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return TEST_SUCCESS;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = TEST_SUCCESS;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 &&
+	       rc == TEST_SUCCESS) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+			c = data->c;
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = TEST_FAILED;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[MAX_READERS];
+	unsigned int num_readers;
+	unsigned int num_lcores;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[MAX_READERS];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = TEST_SUCCESS;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return TEST_SKIPPED;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = TEST_FAILED;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = TEST_FAILED;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 4245b9635c..f23e33ae30 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -77,6 +77,7 @@ The public API headers are grouped by topics:
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
+  [seqlock]            (@ref rte_seqlock.h),
   [RCU]                (@ref rte_rcu_qsbr.h)
 
 - **CPU arch**:
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88d6e96cc1..d2f7bafe7b 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,20 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added Sequence Lock.**
+
+  Added a new synchronization primitive: the sequence lock
+  (seqlock). A seqlock allows for low overhead, parallel reads. The
+  DPDK seqlock uses a spinlock to serialize multiple writing threads.
+
+  In particular, seqlocks are useful for protecting data structures
+  which are read very frequently, by threads running on many different
+  cores, and modified relatively infrequently.
+
+  One way to think about seqlocks is that they provide means to
+  perform atomic operations on data objects larger than what the
+  native atomic machine instructions allow for.
+
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS queue rate limitation support.
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..3c896711e5 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+        'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..13f8ae2e4e
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,322 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used to
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.sn = 0,					\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqlock_read_begin() and before rte_seqlock_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqlock_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqlock_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_unlock().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number produced by the corresponding rte_seqlock_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case rte_seqlock_read_retry() returns true, the just-read data
+ * was modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * In case this function returns false, the data is consistent and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic,
+ * as a whole.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
+ *   true or false, if the just-read seqlock-protected data was
+ *   inconsistent or consistent, respectively, at the time it was
+ *   read.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqlock_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	/* A writer incremented the sequence number during this read
+	 * critical section.
+	 */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-01 14:03                                     ` [PATCH v5] " Mattias Rönnblom
@ 2022-05-01 14:22                                       ` Mattias Rönnblom
  2022-05-02  6:47                                         ` David Marchand
  2022-05-01 20:17                                       ` Stephen Hemminger
  2022-05-06  1:26                                       ` fengchengwen
  2 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-01 14:22 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Ola Liljedahl

On 2022-05-01 16:03, Mattias Rönnblom wrote:
> A sequence lock (seqlock) is synchronization primitive which allows

"/../ is a /../"

<snip>

David, maybe you can fix this typo? Unless there is a need for a new 
version.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-01 14:03                                     ` [PATCH v5] " Mattias Rönnblom
  2022-05-01 14:22                                       ` Mattias Rönnblom
@ 2022-05-01 20:17                                       ` Stephen Hemminger
  2022-05-02  4:51                                         ` Mattias Rönnblom
  2022-05-06  1:26                                       ` fengchengwen
  2 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-05-01 20:17 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On Sun, 1 May 2022 16:03:27 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> +struct data {
> +	rte_seqlock_t lock;
> +
> +	uint64_t a;
> +	uint64_t b __rte_cache_aligned;
> +	uint64_t c __rte_cache_aligned;
> +} __rte_cache_aligned;

This will end up taking 192 bytes per lock.
Which is a lot especially if embedded in another structure.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-01 20:17                                       ` Stephen Hemminger
@ 2022-05-02  4:51                                         ` Mattias Rönnblom
  0 siblings, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-02  4:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Ola Liljedahl

On 2022-05-01 22:17, Stephen Hemminger wrote:
> On Sun, 1 May 2022 16:03:27 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> +struct data {
>> +	rte_seqlock_t lock;
>> +
>> +	uint64_t a;
>> +	uint64_t b __rte_cache_aligned;
>> +	uint64_t c __rte_cache_aligned;
>> +} __rte_cache_aligned;
> 
> This will end up taking 192 bytes per lock.
> Which is a lot especially if embedded in another structure.

"b" and "c" are cache-line aligned to increase the chance of exposing 
any bugs in the seqlock implementation. With these annotations, 
accessing all struct data's fields are multiple distinct interactions 
with the memory hierarchy, instead of one atomic "request for ownership" 
type operation for a particular cache line, from the core. At least that 
what the difference would be in my simple mental model of the typical CPU.

You mention this because you think it serves as a bad example, or what 
is the reason? The lock itself is much smaller than that, and not 
cache-line aligned. "struct data" are only used by the unit tests.

I should have mentioned the reason for the __rte_cache_aligned as a comment.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-01 14:22                                       ` Mattias Rönnblom
@ 2022-05-02  6:47                                         ` David Marchand
  0 siblings, 0 replies; 104+ messages in thread
From: David Marchand @ 2022-05-02  6:47 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Thomas Monjalon, onar.olsen,
	Honnappa Nagarahalli, nd, Ananyev, Konstantin,
	Morten Brørup, Stephen Hemminger, Ola Liljedahl

On Sun, May 1, 2022 at 4:22 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2022-05-01 16:03, Mattias Rönnblom wrote:
> > A sequence lock (seqlock) is synchronization primitive which allows
>
> "/../ is a /../"
>
> <snip>
>
> David, maybe you can fix this typo? Unless there is a need for a new
> version.

Noted.
No need for a new version just for this.

Thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-01 14:03                                     ` [PATCH v5] " Mattias Rönnblom
  2022-05-01 14:22                                       ` Mattias Rönnblom
  2022-05-01 20:17                                       ` Stephen Hemminger
@ 2022-05-06  1:26                                       ` fengchengwen
  2022-05-06  1:33                                         ` Honnappa Nagarahalli
  2022-05-08 11:56                                         ` Mattias Rönnblom
  2 siblings, 2 replies; 104+ messages in thread
From: fengchengwen @ 2022-05-06  1:26 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	hofors, Ola Liljedahl

On 2022/5/1 22:03, Mattias Rönnblom wrote:
> A sequence lock (seqlock) is synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, especially for
> data structures shared across many cores and which are updated
> relatively infrequently.
> 

...

> +}
> +
> +static void
> +reader_stop(struct reader *reader)
> +{
> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
> +}
> +
> +#define NUM_WRITERS (2) /* main lcore + one worker */
> +#define MIN_NUM_READERS (2)
> +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)

Why minus 1 ?
Suggest define MAX_READERS RTE_MAX_LCORE to avoid underflow with small size VM.

> +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
> +
> +/* Only a compile-time test */
> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
> +
> +static int
> +test_seqlock(void)
> +{
> +	struct reader readers[MAX_READERS];
> +	unsigned int num_readers;
> +	unsigned int num_lcores;
> +	unsigned int i;
> +	unsigned int lcore_id;
> +	unsigned int reader_lcore_ids[MAX_READERS];
> +	unsigned int worker_writer_lcore_id = 0;
> +	int rc = TEST_SUCCESS;
> +
> +	num_lcores = rte_lcore_count();
> +
> +	if (num_lcores < MIN_LCORE_COUNT) {
> +		printf("Too few cores to run test. Skipping.\n");
> +		return TEST_SKIPPED;
> +	}
> +
> +	num_readers = num_lcores - NUM_WRITERS;
> +
> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);

Please check whether the value of data is NULL.

> +
> +	i = 0;
> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> +		if (i == 0) {
> +			rte_eal_remote_launch(writer_run, data, lcore_id);
> +			worker_writer_lcore_id = lcore_id;
> +		} else {
> +			unsigned int reader_idx = i - 1;
> +			struct reader *reader = &readers[reader_idx];
> +
> +			reader->data = data;
> +			reader->stop = 0;
> +
> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
> +			reader_lcore_ids[reader_idx] = lcore_id;
> +		}
> +		i++;
> +	}
> +
> +	if (writer_run(data) != 0 ||
> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
> +		rc = TEST_FAILED;
> +
> +	for (i = 0; i < num_readers; i++) {
> +		reader_stop(&readers[i]);
> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
> +			rc = TEST_FAILED;
> +	}
> +

Please free data memory.

> +	return rc;
> +}
> +
> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 4245b9635c..f23e33ae30 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -77,6 +77,7 @@ The public API headers are grouped by topics:
>    [rwlock]             (@ref rte_rwlock.h),
>    [spinlock]           (@ref rte_spinlock.h),
>    [ticketlock]         (@ref rte_ticketlock.h),
> +  [seqlock]            (@ref rte_seqlock.h),
>    [RCU]                (@ref rte_rcu_qsbr.h)
>  

...

> + */
> +__rte_experimental
> +static inline bool
> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
> +{
> +	uint32_t end_sn;
> +
> +	/* An odd sequence number means the protected data was being
> +	 * modified already at the point of the rte_seqlock_read_begin()
> +	 * call.
> +	 */
> +	if (unlikely(begin_sn & 1))
> +		return true;
> +
> +	/* make sure the data loads happens before the sn load */
> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and rte_smp_rmb() both output 'dma ishld'
Suggest use rte_smp_rmb(), please see below comment.

> +
> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
> +
> +	/* A writer incremented the sequence number during this read
> +	 * critical section.
> +	 */
> +	if (unlikely(begin_sn != end_sn))
> +		return true;
> +
> +	return false;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Begin a write-side critical section.
> + *
> + * A call to this function acquires the write lock associated @p
> + * seqlock, and marks the beginning of a write-side critical section.
> + *
> + * After having called this function, the caller may go on to modify
> + * (both read and write) the protected data, in an atomic or
> + * non-atomic manner.
> + *
> + * After the necessary updates have been performed, the application
> + * calls rte_seqlock_write_unlock().
> + *
> + * This function is not preemption-safe in the sense that preemption
> + * of the calling thread may block reader progress until the writer
> + * thread is rescheduled.
> + *
> + * Unlike rte_seqlock_read_begin(), each call made to
> + * rte_seqlock_write_lock() must be matched with an unlock call.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_unlock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_lock(rte_seqlock_t *seqlock)
> +{
> +	uint32_t sn;
> +
> +	/* to synchronize with other writers */
> +	rte_spinlock_lock(&seqlock->lock);
> +
> +	sn = seqlock->sn + 1;
> +
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
> +
> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
> +	 * from happening before the sn store.
> +	 */
> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);

In ARMv8, rte_atomic_thread_fence(__ATOMIC_RELEASE) will output 'dmb ish', and
rte_smp_wmb() will output 'dma ishst'.
Suggest use rte_smp_wmb(). I think here only need to use store mb here.

> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * End a write-side critical section.
> + *
> + * A call to this function marks the end of the write-side critical
> + * section, for @p seqlock. After this call has been made, the protected
> + * data may no longer be modified.
> + *
> + * @param seqlock
> + *   A pointer to the seqlock.
> + *
> + * @see rte_seqlock_write_lock()
> + */
> +__rte_experimental
> +static inline void
> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
> +{
> +	uint32_t sn;
> +
> +	sn = seqlock->sn + 1;
> +
> +	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
> +
> +	rte_spinlock_unlock(&seqlock->lock);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif  /* _RTE_SEQLOCK_H_ */
> diff --git a/lib/eal/version.map b/lib/eal/version.map
> index b53eeb30d7..4a9d0ed899 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>  	rte_intr_instance_free;
>  	rte_intr_type_get;
>  	rte_intr_type_set;
> +
> +	# added in 22.07
> +	rte_seqlock_init;
>  };
>  
>  INTERNAL {
> 

Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>



^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v5] eal: add seqlock
  2022-05-06  1:26                                       ` fengchengwen
@ 2022-05-06  1:33                                         ` Honnappa Nagarahalli
  2022-05-06  4:17                                           ` fengchengwen
  2022-05-08 11:56                                         ` Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-05-06  1:33 UTC (permalink / raw)
  To: fengchengwen, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, hofors, Ola Liljedahl, nd

<snip>

> > +__rte_experimental
> > +static inline bool
> > +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t
> > +begin_sn) {
> > +	uint32_t end_sn;
> > +
> > +	/* An odd sequence number means the protected data was being
> > +	 * modified already at the point of the rte_seqlock_read_begin()
> > +	 * call.
> > +	 */
> > +	if (unlikely(begin_sn & 1))
> > +		return true;
> > +
> > +	/* make sure the data loads happens before the sn load */
> > +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> 
> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and
> rte_smp_rmb() both output 'dma ishld'
> Suggest use rte_smp_rmb(), please see below comment.
rte_smp_xxx APIs are deprecated. Please check [1]

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

<snip>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-06  1:33                                         ` Honnappa Nagarahalli
@ 2022-05-06  4:17                                           ` fengchengwen
  2022-05-06  5:19                                             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 104+ messages in thread
From: fengchengwen @ 2022-05-06  4:17 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, hofors, Ola Liljedahl

On 2022/5/6 9:33, Honnappa Nagarahalli wrote:
> <snip>
> 
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t
>>> +begin_sn) {
>>> +	uint32_t end_sn;
>>> +
>>> +	/* An odd sequence number means the protected data was being
>>> +	 * modified already at the point of the rte_seqlock_read_begin()
>>> +	 * call.
>>> +	 */
>>> +	if (unlikely(begin_sn & 1))
>>> +		return true;
>>> +
>>> +	/* make sure the data loads happens before the sn load */
>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>
>> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and
>> rte_smp_rmb() both output 'dma ishld'
>> Suggest use rte_smp_rmb(), please see below comment.
> rte_smp_xxx APIs are deprecated. Please check [1]
> 
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

Got it, thanks

And I have a question about ARM: why can't find the parameter(rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'?
I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it.

> 
> <snip>
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v5] eal: add seqlock
  2022-05-06  4:17                                           ` fengchengwen
@ 2022-05-06  5:19                                             ` Honnappa Nagarahalli
  2022-05-06  7:03                                               ` fengchengwen
  0 siblings, 1 reply; 104+ messages in thread
From: Honnappa Nagarahalli @ 2022-05-06  5:19 UTC (permalink / raw)
  To: fengchengwen, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, hofors, Ola Liljedahl, nd

<snip>

> >>> +__rte_experimental
> >>> +static inline bool
> >>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t
> >>> +begin_sn) {
> >>> +	uint32_t end_sn;
> >>> +
> >>> +	/* An odd sequence number means the protected data was being
> >>> +	 * modified already at the point of the rte_seqlock_read_begin()
> >>> +	 * call.
> >>> +	 */
> >>> +	if (unlikely(begin_sn & 1))
> >>> +		return true;
> >>> +
> >>> +	/* make sure the data loads happens before the sn load */
> >>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >>
> >> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and
> >> rte_smp_rmb() both output 'dma ishld'
> >> Suggest use rte_smp_rmb(), please see below comment.
> > rte_smp_xxx APIs are deprecated. Please check [1]
> >
> > [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
> model/
> 
> Got it, thanks
> 
> And I have a question about ARM: why can't find the
> parameter(rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'?
> I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it.
'dmb ishst' prevents store-store reordering. However, '__atomic_thread_fence' (with various memory ordering) requires more stronger barrier [1].

[1] https://preshing.com/20130922/acquire-and-release-fences/
> 
> >
> > <snip>
> >


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-06  5:19                                             ` Honnappa Nagarahalli
@ 2022-05-06  7:03                                               ` fengchengwen
  0 siblings, 0 replies; 104+ messages in thread
From: fengchengwen @ 2022-05-06  7:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Mattias Rönnblom, dev
  Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb,
	stephen, hofors, Ola Liljedahl

On 2022/5/6 13:19, Honnappa Nagarahalli wrote:
> <snip>
> 
>>>>> +__rte_experimental
>>>>> +static inline bool
>>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t
>>>>> +begin_sn) {
>>>>> +	uint32_t end_sn;
>>>>> +
>>>>> +	/* An odd sequence number means the protected data was being
>>>>> +	 * modified already at the point of the rte_seqlock_read_begin()
>>>>> +	 * call.
>>>>> +	 */
>>>>> +	if (unlikely(begin_sn & 1))
>>>>> +		return true;
>>>>> +
>>>>> +	/* make sure the data loads happens before the sn load */
>>>>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>>>
>>>> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and
>>>> rte_smp_rmb() both output 'dma ishld'
>>>> Suggest use rte_smp_rmb(), please see below comment.
>>> rte_smp_xxx APIs are deprecated. Please check [1]
>>>
>>> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
>> model/
>>
>> Got it, thanks
>>
>> And I have a question about ARM: why can't find the
>> parameter(rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'?
>> I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it.
> 'dmb ishst' prevents store-store reordering. However, '__atomic_thread_fence' (with various memory ordering) requires more stronger barrier [1].

For this seqlock scenario, I think it's OK to use 'dmb ishst' in rte_seqlock_write_lock()
instead of use rte_atomic_thread_fence(__ATOMIC_RELEASE), but the 'dmb ishst' havn't corresponding
rte_atomic_thread_fence() wrapper, so in this case, we could only use stronger barrier.

Since the community has decided to use the C11 memory mode API, it is probably clear about the
preceding scenarios (using stronger barrier). I have no comment.

> 
> [1] https://preshing.com/20130922/acquire-and-release-fences/
>>
>>>
>>> <snip>
>>>
> 


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v5] eal: add seqlock
  2022-05-06  1:26                                       ` fengchengwen
  2022-05-06  1:33                                         ` Honnappa Nagarahalli
@ 2022-05-08 11:56                                         ` Mattias Rönnblom
  2022-05-08 12:12                                           ` [PATCH v6] " Mattias Rönnblom
  1 sibling, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-08 11:56 UTC (permalink / raw)
  To: fengchengwen, Mattias Rönnblom, dev
  Cc: Thomas Monjalon, David Marchand, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen,
	Ola Liljedahl

On 2022-05-06 03:26, fengchengwen wrote:
> On 2022/5/1 22:03, Mattias Rönnblom wrote:
>> A sequence lock (seqlock) is synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, especially for
>> data structures shared across many cores and which are updated
>> relatively infrequently.
>>
> 
> ...
> 
>> +}
>> +
>> +static void
>> +reader_stop(struct reader *reader)
>> +{
>> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
>> +}
>> +
>> +#define NUM_WRITERS (2) /* main lcore + one worker */
>> +#define MIN_NUM_READERS (2)
>> +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
> 
> Why minus 1 ?
> Suggest define MAX_READERS RTE_MAX_LCORE to avoid underflow with small size VM.
> 

OK.

>> +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
>> +
>> +/* Only a compile-time test */
>> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
>> +
>> +static int
>> +test_seqlock(void)
>> +{
>> +	struct reader readers[MAX_READERS];
>> +	unsigned int num_readers;
>> +	unsigned int num_lcores;
>> +	unsigned int i;
>> +	unsigned int lcore_id;
>> +	unsigned int reader_lcore_ids[MAX_READERS];
>> +	unsigned int worker_writer_lcore_id = 0;
>> +	int rc = TEST_SUCCESS;
>> +
>> +	num_lcores = rte_lcore_count();
>> +
>> +	if (num_lcores < MIN_LCORE_COUNT) {
>> +		printf("Too few cores to run test. Skipping.\n");
>> +		return TEST_SKIPPED;
>> +	}
>> +
>> +	num_readers = num_lcores - NUM_WRITERS;
>> +
>> +	struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0);
> 
> Please check whether the value of data is NULL.
> 

OK.

>> +
>> +	i = 0;
>> +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
>> +		if (i == 0) {
>> +			rte_eal_remote_launch(writer_run, data, lcore_id);
>> +			worker_writer_lcore_id = lcore_id;
>> +		} else {
>> +			unsigned int reader_idx = i - 1;
>> +			struct reader *reader = &readers[reader_idx];
>> +
>> +			reader->data = data;
>> +			reader->stop = 0;
>> +
>> +			rte_eal_remote_launch(reader_run, reader, lcore_id);
>> +			reader_lcore_ids[reader_idx] = lcore_id;
>> +		}
>> +		i++;
>> +	}
>> +
>> +	if (writer_run(data) != 0 ||
>> +	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
>> +		rc = TEST_FAILED;
>> +
>> +	for (i = 0; i < num_readers; i++) {
>> +		reader_stop(&readers[i]);
>> +		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
>> +			rc = TEST_FAILED;
>> +	}
>> +
> 
> Please free data memory.
> 

OK.

>> +	return rc;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
>> index 4245b9635c..f23e33ae30 100644
>> --- a/doc/api/doxy-api-index.md
>> +++ b/doc/api/doxy-api-index.md
>> @@ -77,6 +77,7 @@ The public API headers are grouped by topics:
>>     [rwlock]             (@ref rte_rwlock.h),
>>     [spinlock]           (@ref rte_spinlock.h),
>>     [ticketlock]         (@ref rte_ticketlock.h),
>> +  [seqlock]            (@ref rte_seqlock.h),
>>     [RCU]                (@ref rte_rcu_qsbr.h)
>>   
> 
> ...
> 
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
>> +{
>> +	uint32_t end_sn;
>> +
>> +	/* An odd sequence number means the protected data was being
>> +	 * modified already at the point of the rte_seqlock_read_begin()
>> +	 * call.
>> +	 */
>> +	if (unlikely(begin_sn & 1))
>> +		return true;
>> +
>> +	/* make sure the data loads happens before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> 
> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and rte_smp_rmb() both output 'dma ishld'
> Suggest use rte_smp_rmb(), please see below comment.
> 
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	/* A writer incremented the sequence number during this read
>> +	 * critical section.
>> +	 */
>> +	if (unlikely(begin_sn != end_sn))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * Begin a write-side critical section.
>> + *
>> + * A call to this function acquires the write lock associated @p
>> + * seqlock, and marks the beginning of a write-side critical section.
>> + *
>> + * After having called this function, the caller may go on to modify
>> + * (both read and write) the protected data, in an atomic or
>> + * non-atomic manner.
>> + *
>> + * After the necessary updates have been performed, the application
>> + * calls rte_seqlock_write_unlock().
>> + *
>> + * This function is not preemption-safe in the sense that preemption
>> + * of the calling thread may block reader progress until the writer
>> + * thread is rescheduled.
>> + *
>> + * Unlike rte_seqlock_read_begin(), each call made to
>> + * rte_seqlock_write_lock() must be matched with an unlock call.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_unlock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	/* to synchronize with other writers */
>> +	rte_spinlock_lock(&seqlock->lock);
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
> 
> In ARMv8, rte_atomic_thread_fence(__ATOMIC_RELEASE) will output 'dmb ish', and
> rte_smp_wmb() will output 'dma ishst'.
> Suggest use rte_smp_wmb(). I think here only need to use store mb here.
> 

(This has already been discussed further down in the mail thread, and I 
have nothing to add.)

>> +}
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * End a write-side critical section.
>> + *
>> + * A call to this function marks the end of the write-side critical
>> + * section, for @p seqlock. After this call has been made, the protected
>> + * data may no longer be modified.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_lock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
>> +}
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif  /* _RTE_SEQLOCK_H_ */
>> diff --git a/lib/eal/version.map b/lib/eal/version.map
>> index b53eeb30d7..4a9d0ed899 100644
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>>   	rte_intr_instance_free;
>>   	rte_intr_type_get;
>>   	rte_intr_type_set;
>> +
>> +	# added in 22.07
>> +	rte_seqlock_init;
>>   };
>>   
>>   INTERNAL {
>>
> 
> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
> 
>
Thanks a lot for the review!


^ permalink raw reply	[flat|nested] 104+ messages in thread

* [PATCH v6] eal: add seqlock
  2022-05-08 11:56                                         ` Mattias Rönnblom
@ 2022-05-08 12:12                                           ` Mattias Rönnblom
  2022-05-08 16:10                                             ` Stephen Hemminger
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-08 12:12 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	mb, stephen, hofors, Chengwen Feng, Mattias Rönnblom,
	Ola Liljedahl

A sequence lock (seqlock) is a synchronization primitive which allows
for data-race free, low-overhead, high-frequency reads, suitable for
data structures shared across many cores and which are updated
relatively infrequently.

A seqlock permits multiple parallel readers. The variant of seqlock
implemented in this patch supports multiple writers as well. A
spinlock is used for writer-writer serialization.

To avoid resource reclamation and other issues, the data protected by
a seqlock is best off being self-contained (i.e., no pointers [except
to constant data]).

One way to think about seqlocks is that they provide means to perform
atomic operations on data objects larger than what the native atomic
machine instructions allow for.

DPDK seqlocks are not preemption safe on the writer side. A thread
preemption affects performance, not correctness.

A seqlock contains a sequence number, which can be thought of as the
generation of the data it protects.

A reader will
  1. Load the sequence number (sn).
  2. Load, in arbitrary order, the seqlock-protected data.
  3. Load the sn again.
  4. Check if the first and second sn are equal, and even numbered.
     If they are not, discard the loaded data, and restart from 1.

The first three steps need to be ordered using suitable memory fences.

A writer will
  1. Take the spinlock, to serialize writer access.
  2. Load the sn.
  3. Store the original sn + 1 as the new sn.
  4. Perform load and stores to the seqlock-protected data.
  5. Store the original sn + 2 as the new sn.
  6. Release the spinlock.

Proper memory fencing is required to make sure the first sn store, the
data stores, and the second sn store appear to the reader in the
mentioned order.

The sn loads and stores must be atomic, but the data loads and stores
need not be.

The original seqlock design and implementation was done by Stephen
Hemminger. This is an independent implementation, using C11 atomics.

For more information on seqlocks, see
https://en.wikipedia.org/wiki/Seqlock

---

PATCH v6:
  * Check for failed memory allocations in unit test.
  * Fix underflow issue in test case for small RTE_LCORE_MAX values.
  * Fix test case memory leak.

PATCH v5:
  * Add sequence lock section to MAINTAINERS.
  * Add entry in the release notes.
  * Add seqlock reference in the API index.
  * Fix meson build file indentation.
  * Use "increment" to describe how a writer changes the sequence number.
  * Remove compiler barriers from seqlock test.
  * Use appropriate macros (e.g., TEST_SUCCESS) for test return values.

PATCH v4:
  * Reverted to Linux kernel style naming on the read side.
  * Bail out early from the retry function if an odd sequence
    number is encountered.
  * Added experimental warnings in the API documentation.
  * Static initializer now uses named field initialization.
  * Various tweaks to API documentation (including the example).

PATCH v3:
  * Renamed both read and write-side critical section begin/end functions
    to better match rwlock naming, per Ola Liljedahl's suggestion.
  * Added 'extern "C"' guards for C++ compatibility.
  * Refer to the main lcore as the main lcore, and nothing else.

PATCH v2:
  * Skip instead of fail unit test in case too few lcores are available.
  * Use main lcore for testing, reducing the minimum number of lcores
    required to run the unit tests to four.
  * Consistently refer to sn field as the "sequence number" in the
    documentation.
  * Fixed spelling mistakes in documentation.

Updates since RFC:
  * Added API documentation.
  * Added link to Wikipedia article in the commit message.
  * Changed seqlock sequence number field from uint64_t (which was
    overkill) to uint32_t. The sn type needs to be sufficiently large
    to assure no reader will read a sn, access the data, and then read
    the same sn, but the sn has been incremented enough times to have
    wrapped during the read, and arrived back at the original sn.
  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
  * Removed the rte_seqlock struct + separate rte_seqlock_t typedef
    with an anonymous struct typedef:ed to rte_seqlock_t.

Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 MAINTAINERS                            |   5 +
 app/test/meson.build                   |   2 +
 app/test/test_seqlock.c                | 190 +++++++++++++++
 doc/api/doxy-api-index.md              |   1 +
 doc/guides/rel_notes/release_22_07.rst |  14 ++
 lib/eal/common/meson.build             |   1 +
 lib/eal/common/rte_seqlock.c           |  12 +
 lib/eal/include/meson.build            |   1 +
 lib/eal/include/rte_seqlock.h          | 322 +++++++++++++++++++++++++
 lib/eal/version.map                    |   3 +
 10 files changed, 551 insertions(+)
 create mode 100644 app/test/test_seqlock.c
 create mode 100644 lib/eal/common/rte_seqlock.c
 create mode 100644 lib/eal/include/rte_seqlock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c4f541dba..2804d8136c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -262,6 +262,11 @@ M: Joyce Kong <joyce.kong@arm.com>
 F: lib/eal/include/generic/rte_ticketlock.h
 F: app/test/test_ticketlock.c
 
+Sequence Lock
+M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
+F: lib/eal/include/rte_seqlock.h
+F: app/test/test_seqlock.c
+
 Pseudo-random Number Generation
 M: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
 F: lib/eal/include/rte_random.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 5fc1dd1b7b..5e418e8766 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -125,6 +125,7 @@ test_sources = files(
         'test_rwlock.c',
         'test_sched.c',
         'test_security.c',
+        'test_seqlock.c',
         'test_service_cores.c',
         'test_spinlock.c',
         'test_stack.c',
@@ -214,6 +215,7 @@ fast_tests = [
         ['rwlock_rde_wro_autotest', true],
         ['sched_autotest', true],
         ['security_autotest', false],
+        ['seqlock_autotest', true],
         ['spinlock_autotest', true],
         ['stack_autotest', false],
         ['stack_lf_autotest', false],
diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
new file mode 100644
index 0000000000..cb1c1baa82
--- /dev/null
+++ b/app/test/test_seqlock.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+#include <rte_cycles.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include <inttypes.h>
+
+#include "test.h"
+
+struct data {
+	rte_seqlock_t lock;
+
+	uint64_t a;
+	uint64_t b __rte_cache_aligned;
+	uint64_t c __rte_cache_aligned;
+} __rte_cache_aligned;
+
+struct reader {
+	struct data *data;
+	uint8_t stop;
+};
+
+#define WRITER_RUNTIME (2.0) /* s */
+
+#define WRITER_MAX_DELAY (100) /* us */
+
+#define INTERRUPTED_WRITER_FREQUENCY (1000)
+#define WRITER_INTERRUPT_TIME (1) /* us */
+
+static int
+writer_run(void *arg)
+{
+	struct data *data = arg;
+	uint64_t deadline;
+
+	deadline = rte_get_timer_cycles() +
+		WRITER_RUNTIME * rte_get_timer_hz();
+
+	while (rte_get_timer_cycles() < deadline) {
+		bool interrupted;
+		uint64_t new_value;
+		unsigned int delay;
+
+		new_value = rte_rand();
+
+		interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
+
+		rte_seqlock_write_lock(&data->lock);
+
+		data->c = new_value;
+		data->b = new_value;
+
+		if (interrupted)
+			rte_delay_us_block(WRITER_INTERRUPT_TIME);
+
+		data->a = new_value;
+
+		rte_seqlock_write_unlock(&data->lock);
+
+		delay = rte_rand_max(WRITER_MAX_DELAY);
+
+		rte_delay_us_block(delay);
+	}
+
+	return TEST_SUCCESS;
+}
+
+#define INTERRUPTED_READER_FREQUENCY (1000)
+#define READER_INTERRUPT_TIME (1000) /* us */
+
+static int
+reader_run(void *arg)
+{
+	struct reader *r = arg;
+	int rc = TEST_SUCCESS;
+
+	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 &&
+	       rc == TEST_SUCCESS) {
+		struct data *data = r->data;
+		bool interrupted;
+		uint32_t sn;
+		uint64_t a;
+		uint64_t b;
+		uint64_t c;
+
+		interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
+
+		do {
+			sn = rte_seqlock_read_begin(&data->lock);
+
+			a = data->a;
+			if (interrupted)
+				rte_delay_us_block(READER_INTERRUPT_TIME);
+			c = data->c;
+			b = data->b;
+
+		} while (rte_seqlock_read_retry(&data->lock, sn));
+
+		if (a != b || b != c) {
+			printf("Reader observed inconsistent data values "
+			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+			       a, b, c);
+			rc = TEST_FAILED;
+		}
+	}
+
+	return rc;
+}
+
+static void
+reader_stop(struct reader *reader)
+{
+	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
+}
+
+#define NUM_WRITERS (2) /* main lcore + one worker */
+#define MIN_NUM_READERS (2)
+#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
+
+/* Only a compile-time test */
+static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER;
+
+static int
+test_seqlock(void)
+{
+	struct reader readers[RTE_MAX_LCORE];
+	unsigned int num_lcores;
+	unsigned int num_readers;
+	struct data *data;
+	unsigned int i;
+	unsigned int lcore_id;
+	unsigned int reader_lcore_ids[RTE_MAX_LCORE];
+	unsigned int worker_writer_lcore_id = 0;
+	int rc = TEST_SUCCESS;
+
+	num_lcores = rte_lcore_count();
+
+	if (num_lcores < MIN_LCORE_COUNT) {
+		printf("Too few cores to run test. Skipping.\n");
+		return TEST_SKIPPED;
+	}
+
+	num_readers = num_lcores - NUM_WRITERS;
+
+	data = rte_zmalloc(NULL, sizeof(struct data), 0);
+
+	if (data == NULL) {
+		printf("Failed to allocate memory for seqlock data\n");
+		return TEST_FAILED;
+	}
+
+	i = 0;
+	RTE_LCORE_FOREACH_WORKER(lcore_id) {
+		if (i == 0) {
+			rte_eal_remote_launch(writer_run, data, lcore_id);
+			worker_writer_lcore_id = lcore_id;
+		} else {
+			unsigned int reader_idx = i - 1;
+			struct reader *reader = &readers[reader_idx];
+
+			reader->data = data;
+			reader->stop = 0;
+
+			rte_eal_remote_launch(reader_run, reader, lcore_id);
+			reader_lcore_ids[reader_idx] = lcore_id;
+		}
+		i++;
+	}
+
+	if (writer_run(data) != 0 ||
+	    rte_eal_wait_lcore(worker_writer_lcore_id) != 0)
+		rc = TEST_FAILED;
+
+	for (i = 0; i < num_readers; i++) {
+		reader_stop(&readers[i]);
+		if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0)
+			rc = TEST_FAILED;
+	}
+
+	rte_free(data);
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 4245b9635c..f23e33ae30 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -77,6 +77,7 @@ The public API headers are grouped by topics:
   [rwlock]             (@ref rte_rwlock.h),
   [spinlock]           (@ref rte_spinlock.h),
   [ticketlock]         (@ref rte_ticketlock.h),
+  [seqlock]            (@ref rte_seqlock.h),
   [RCU]                (@ref rte_rcu_qsbr.h)
 
 - **CPU arch**:
diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst
index 88d6e96cc1..d2f7bafe7b 100644
--- a/doc/guides/rel_notes/release_22_07.rst
+++ b/doc/guides/rel_notes/release_22_07.rst
@@ -55,6 +55,20 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added Sequence Lock.**
+
+  Added a new synchronization primitive: the sequence lock
+  (seqlock). A seqlock allows for low overhead, parallel reads. The
+  DPDK seqlock uses a spinlock to serialize multiple writing threads.
+
+  In particular, seqlocks are useful for protecting data structures
+  which are read very frequently, by threads running on many different
+  cores, and modified relatively infrequently.
+
+  One way to think about seqlocks is that they provide means to
+  perform atomic operations on data objects larger than what the
+  native atomic machine instructions allow for.
+
 * **Updated Intel iavf driver.**
 
   * Added Tx QoS queue rate limitation support.
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 917758cc65..3c896711e5 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -35,6 +35,7 @@ sources += files(
         'rte_malloc.c',
         'rte_random.c',
         'rte_reciprocal.c',
+        'rte_seqlock.c',
         'rte_service.c',
         'rte_version.c',
 )
diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
new file mode 100644
index 0000000000..d4fe648799
--- /dev/null
+++ b/lib/eal/common/rte_seqlock.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#include <rte_seqlock.h>
+
+void
+rte_seqlock_init(rte_seqlock_t *seqlock)
+{
+	seqlock->sn = 0;
+	rte_spinlock_init(&seqlock->lock);
+}
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..48df5f1a21 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -36,6 +36,7 @@ headers += files(
         'rte_per_lcore.h',
         'rte_random.h',
         'rte_reciprocal.h',
+        'rte_seqlock.h',
         'rte_service.h',
         'rte_service_component.h',
         'rte_string_fns.h',
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
new file mode 100644
index 0000000000..13f8ae2e4e
--- /dev/null
+++ b/lib/eal/include/rte_seqlock.h
@@ -0,0 +1,322 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Ericsson AB
+ */
+
+#ifndef _RTE_SEQLOCK_H_
+#define _RTE_SEQLOCK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file
+ * RTE Seqlock
+ *
+ * A sequence lock (seqlock) is a synchronization primitive allowing
+ * multiple, parallel, readers to efficiently and safely (i.e., in a
+ * data-race free manner) access lock-protected data. The RTE seqlock
+ * permits multiple writers as well. A spinlock is used to
+ * writer-writer synchronization.
+ *
+ * A reader never blocks a writer. Very high frequency writes may
+ * prevent readers from making progress.
+ *
+ * A seqlock is not preemption-safe on the writer side. If a writer is
+ * preempted, it may block readers until the writer thread is allowed
+ * to continue. Heavy computations should be kept out of the
+ * writer-side critical section, to avoid delaying readers.
+ *
+ * Seqlocks are useful for data which are read by many cores, at a
+ * high frequency, and relatively infrequently written to.
+ *
+ * One way to think about seqlocks is that they provide means to
+ * perform atomic operations on objects larger than what the native
+ * machine instructions allow for.
+ *
+ * To avoid resource reclamation issues, the data protected by a
+ * seqlock should typically be kept self-contained (e.g., no pointers
+ * to mutable, dynamically allocated data).
+ *
+ * Example usage:
+ * @code{.c}
+ * #define MAX_Y_LEN (16)
+ * // Application-defined example data structure, protected by a seqlock.
+ * struct config {
+ *         rte_seqlock_t lock;
+ *         int param_x;
+ *         char param_y[MAX_Y_LEN];
+ * };
+ *
+ * // Accessor function for reading config fields.
+ * void
+ * config_read(const struct config *config, int *param_x, char *param_y)
+ * {
+ *         uint32_t sn;
+ *
+ *         do {
+ *                 sn = rte_seqlock_read_begin(&config->lock);
+ *
+ *                 // Loads may be atomic or non-atomic, as in this example.
+ *                 *param_x = config->param_x;
+ *                 strcpy(param_y, config->param_y);
+ *                 // An alternative to an immediate retry is to abort and
+ *                 // try again at some later time, assuming progress is
+ *                 // possible without the data.
+ *         } while (rte_seqlock_read_retry(&config->lock));
+ * }
+ *
+ * // Accessor function for writing config fields.
+ * void
+ * config_update(struct config *config, int param_x, const char *param_y)
+ * {
+ *         rte_seqlock_write_lock(&config->lock);
+ *         // Stores may be atomic or non-atomic, as in this example.
+ *         config->param_x = param_x;
+ *         strcpy(config->param_y, param_y);
+ *         rte_seqlock_write_unlock(&config->lock);
+ * }
+ * @endcode
+ *
+ * @see
+ * https://en.wikipedia.org/wiki/Seqlock.
+ */
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_spinlock.h>
+
+/**
+ * The RTE seqlock type.
+ */
+typedef struct {
+	uint32_t sn; /**< A sequence number for the protected data. */
+	rte_spinlock_t lock; /**< Spinlock used to serialize writers.  */
+} rte_seqlock_t;
+
+/**
+ * A static seqlock initializer.
+ */
+#define RTE_SEQLOCK_INITIALIZER \
+	{							\
+		.sn = 0,					\
+		.lock = RTE_SPINLOCK_INITIALIZER		\
+	}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the seqlock.
+ *
+ * This function initializes the seqlock, and leaves the writer-side
+ * spinlock unlocked.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ */
+__rte_experimental
+void
+rte_seqlock_init(rte_seqlock_t *seqlock);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a read-side critical section.
+ *
+ * A call to this function marks the beginning of a read-side critical
+ * section, for @p seqlock.
+ *
+ * rte_seqlock_read_begin() returns a sequence number, which is later
+ * used in rte_seqlock_read_retry() to check if the protected data
+ * underwent any modifications during the read transaction.
+ *
+ * After (in program order) rte_seqlock_read_begin() has been called,
+ * the calling thread reads the protected data, for later use. The
+ * protected data read *must* be copied (either in pristine form, or
+ * in the form of some derivative), since the caller may only read the
+ * data from within the read-side critical section (i.e., after
+ * rte_seqlock_read_begin() and before rte_seqlock_read_retry()),
+ * but must not act upon the retrieved data while in the critical
+ * section, since it does not yet know if it is consistent.
+ *
+ * The protected data may be read using atomic and/or non-atomic
+ * operations.
+ *
+ * After (in program order) all required data loads have been
+ * performed, rte_seqlock_read_retry() should be called, marking
+ * the end of the read-side critical section.
+ *
+ * If rte_seqlock_read_retry() returns true, the just-read data is
+ * inconsistent and should be discarded. The caller has the option to
+ * either restart the whole procedure right away (i.e., calling
+ * rte_seqlock_read_begin() again), or do the same at some later time.
+ *
+ * If rte_seqlock_read_retry() returns false, the data was read
+ * atomically and the copied data is consistent.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @return
+ *   The seqlock sequence number for this critical section, to
+ *   later be passed to rte_seqlock_read_retry().
+ *
+ * @see rte_seqlock_read_retry()
+ */
+
+__rte_experimental
+static inline uint32_t
+rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
+{
+	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
+	 * from happening before the sn load. Synchronizes-with the
+	 * store release in rte_seqlock_write_unlock().
+	 */
+	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a read-side critical section.
+ *
+ * A call to this function marks the end of a read-side critical
+ * section, for @p seqlock. The application must supply the sequence
+ * number produced by the corresponding rte_seqlock_read_begin() call.
+ *
+ * After this function has been called, the caller should not access
+ * the protected data.
+ *
+ * In case rte_seqlock_read_retry() returns true, the just-read data
+ * was modified as it was being read and may be inconsistent, and thus
+ * should be discarded.
+ *
+ * In case this function returns false, the data is consistent and the
+ * set of atomic and non-atomic load operations performed between
+ * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic,
+ * as a whole.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ * @param begin_sn
+ *   The seqlock sequence number returned by rte_seqlock_read_begin().
+ * @return
+ *   true or false, if the just-read seqlock-protected data was
+ *   inconsistent or consistent, respectively, at the time it was
+ *   read.
+ *
+ * @see rte_seqlock_read_begin()
+ */
+__rte_experimental
+static inline bool
+rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
+{
+	uint32_t end_sn;
+
+	/* An odd sequence number means the protected data was being
+	 * modified already at the point of the rte_seqlock_read_begin()
+	 * call.
+	 */
+	if (unlikely(begin_sn & 1))
+		return true;
+
+	/* make sure the data loads happens before the sn load */
+	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
+
+	/* A writer incremented the sequence number during this read
+	 * critical section.
+	 */
+	if (unlikely(begin_sn != end_sn))
+		return true;
+
+	return false;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Begin a write-side critical section.
+ *
+ * A call to this function acquires the write lock associated @p
+ * seqlock, and marks the beginning of a write-side critical section.
+ *
+ * After having called this function, the caller may go on to modify
+ * (both read and write) the protected data, in an atomic or
+ * non-atomic manner.
+ *
+ * After the necessary updates have been performed, the application
+ * calls rte_seqlock_write_unlock().
+ *
+ * This function is not preemption-safe in the sense that preemption
+ * of the calling thread may block reader progress until the writer
+ * thread is rescheduled.
+ *
+ * Unlike rte_seqlock_read_begin(), each call made to
+ * rte_seqlock_write_lock() must be matched with an unlock call.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_unlock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_lock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	/* to synchronize with other writers */
+	rte_spinlock_lock(&seqlock->lock);
+
+	sn = seqlock->sn + 1;
+
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
+
+	/* __ATOMIC_RELEASE to prevent stores after (in program order)
+	 * from happening before the sn store.
+	 */
+	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * End a write-side critical section.
+ *
+ * A call to this function marks the end of the write-side critical
+ * section, for @p seqlock. After this call has been made, the protected
+ * data may no longer be modified.
+ *
+ * @param seqlock
+ *   A pointer to the seqlock.
+ *
+ * @see rte_seqlock_write_lock()
+ */
+__rte_experimental
+static inline void
+rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
+{
+	uint32_t sn;
+
+	sn = seqlock->sn + 1;
+
+	/* synchronizes-with the load acquire in rte_seqlock_read_begin() */
+	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
+
+	rte_spinlock_unlock(&seqlock->lock);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* _RTE_SEQLOCK_H_ */
diff --git a/lib/eal/version.map b/lib/eal/version.map
index b53eeb30d7..4a9d0ed899 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -420,6 +420,9 @@ EXPERIMENTAL {
 	rte_intr_instance_free;
 	rte_intr_type_get;
 	rte_intr_type_set;
+
+	# added in 22.07
+	rte_seqlock_init;
 };
 
 INTERNAL {
-- 
2.25.1


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6] eal: add seqlock
  2022-05-08 12:12                                           ` [PATCH v6] " Mattias Rönnblom
@ 2022-05-08 16:10                                             ` Stephen Hemminger
  2022-05-08 19:40                                               ` Mattias Rönnblom
  0 siblings, 1 reply; 104+ messages in thread
From: Stephen Hemminger @ 2022-05-08 16:10 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors,
	Chengwen Feng, Ola Liljedahl

On Sun, 8 May 2022 14:12:42 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> A sequence lock (seqlock) is a synchronization primitive which allows
> for data-race free, low-overhead, high-frequency reads, suitable for
> data structures shared across many cores and which are updated
> relatively infrequently.
> 
> A seqlock permits multiple parallel readers. The variant of seqlock
> implemented in this patch supports multiple writers as well. A
> spinlock is used for writer-writer serialization.
> 
> To avoid resource reclamation and other issues, the data protected by
> a seqlock is best off being self-contained (i.e., no pointers [except
> to constant data]).
> 
> One way to think about seqlocks is that they provide means to perform
> atomic operations on data objects larger than what the native atomic
> machine instructions allow for.
> 
> DPDK seqlocks are not preemption safe on the writer side. A thread
> preemption affects performance, not correctness.
> 
> A seqlock contains a sequence number, which can be thought of as the
> generation of the data it protects.
> 
> A reader will
>   1. Load the sequence number (sn).
>   2. Load, in arbitrary order, the seqlock-protected data.
>   3. Load the sn again.
>   4. Check if the first and second sn are equal, and even numbered.
>      If they are not, discard the loaded data, and restart from 1.
> 
> The first three steps need to be ordered using suitable memory fences.
> 
> A writer will
>   1. Take the spinlock, to serialize writer access.
>   2. Load the sn.
>   3. Store the original sn + 1 as the new sn.
>   4. Perform load and stores to the seqlock-protected data.
>   5. Store the original sn + 2 as the new sn.
>   6. Release the spinlock.
> 
> Proper memory fencing is required to make sure the first sn store, the
> data stores, and the second sn store appear to the reader in the
> mentioned order.
> 
> The sn loads and stores must be atomic, but the data loads and stores
> need not be.
> 
> The original seqlock design and implementation was done by Stephen
> Hemminger. This is an independent implementation, using C11 atomics.
> 
> For more information on seqlocks, see
> https://en.wikipedia.org/wiki/Seqlock

I think would be good to have the sequence count (read side only) like
the kernel and sequence lock (sequence count + spinlock) as separate things.

That way the application could use sequence count + ticket lock if it
needed to scale to more writers.

> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
> new file mode 100644
> index 0000000000..d4fe648799
> --- /dev/null
> +++ b/lib/eal/common/rte_seqlock.c
> @@ -0,0 +1,12 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2022 Ericsson AB
> + */
> +
> +#include <rte_seqlock.h>
> +
> +void
> +rte_seqlock_init(rte_seqlock_t *seqlock)
> +{
> +	seqlock->sn = 0;
> +	rte_spinlock_init(&seqlock->lock);
> +}

So small, worth just making inline?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6] eal: add seqlock
  2022-05-08 16:10                                             ` Stephen Hemminger
@ 2022-05-08 19:40                                               ` Mattias Rönnblom
  2022-05-09  3:48                                                 ` Stephen Hemminger
  0 siblings, 1 reply; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-08 19:40 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: Thomas Monjalon, David Marchand, dev, onar.olsen,
	Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Chengwen Feng,
	Ola Liljedahl

On 2022-05-08 18:10, Stephen Hemminger wrote:
> On Sun, 8 May 2022 14:12:42 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>> A sequence lock (seqlock) is a synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, suitable for
>> data structures shared across many cores and which are updated
>> relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. The variant of seqlock
>> implemented in this patch supports multiple writers as well. A
>> spinlock is used for writer-writer serialization.
>>
>> To avoid resource reclamation and other issues, the data protected by
>> a seqlock is best off being self-contained (i.e., no pointers [except
>> to constant data]).
>>
>> One way to think about seqlocks is that they provide means to perform
>> atomic operations on data objects larger than what the native atomic
>> machine instructions allow for.
>>
>> DPDK seqlocks are not preemption safe on the writer side. A thread
>> preemption affects performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>    1. Load the sequence number (sn).
>>    2. Load, in arbitrary order, the seqlock-protected data.
>>    3. Load the sn again.
>>    4. Check if the first and second sn are equal, and even numbered.
>>       If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>    1. Take the spinlock, to serialize writer access.
>>    2. Load the sn.
>>    3. Store the original sn + 1 as the new sn.
>>    4. Perform load and stores to the seqlock-protected data.
>>    5. Store the original sn + 2 as the new sn.
>>    6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store, the
>> data stores, and the second sn store appear to the reader in the
>> mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores
>> need not be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
> 
> I think would be good to have the sequence count (read side only) like
> the kernel and sequence lock (sequence count + spinlock) as separate things.
> 
> That way the application could use sequence count + ticket lock if it
> needed to scale to more writers.
> 

Sounds reasonable. Would that be something like:

typedef struct {
	uint32_t sn;
} rte_seqlock_t;

rte_seqlock_read_begin()
rte_seqlock_read_retry()
rte_seqlock_write_begin()
rte_seqlock_write_end()

typedef struct {
	rte_seqlock_t seqlock;
	rte_spinlock_t wlock;
} rte_<something>_t;

rte_<something>_read_begin()
rte_<something>_read_retry()
rte_<something>_write_lock()
rte_<something>_write_unlock()

or are you suggesting removing the spinlock altogether, and leave 
writer-side synchronization to the application (at least in this DPDK 
release)?

>> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c
>> new file mode 100644
>> index 0000000000..d4fe648799
>> --- /dev/null
>> +++ b/lib/eal/common/rte_seqlock.c
>> @@ -0,0 +1,12 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock)
>> +{
>> +	seqlock->sn = 0;
>> +	rte_spinlock_init(&seqlock->lock);
>> +}
> 
> So small, worth just making inline?

I don't think so, but it is small. Especially if rte_spinlock_init() now 
goes away. :)


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6] eal: add seqlock
  2022-05-08 19:40                                               ` Mattias Rönnblom
@ 2022-05-09  3:48                                                 ` Stephen Hemminger
  2022-05-09  6:26                                                   ` Morten Brørup
  2022-05-13  6:27                                                   ` Mattias Rönnblom
  0 siblings, 2 replies; 104+ messages in thread
From: Stephen Hemminger @ 2022-05-09  3:48 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev,
	onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb,
	Chengwen Feng, Ola Liljedahl

On Sun, 8 May 2022 21:40:58 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> > I think would be good to have the sequence count (read side only) like
> > the kernel and sequence lock (sequence count + spinlock) as separate things.
> > 
> > That way the application could use sequence count + ticket lock if it
> > needed to scale to more writers.
> >   
> 
> Sounds reasonable. Would that be something like:
> 
> typedef struct {
> 	uint32_t sn;
> } rte_seqlock_t;
> 
> rte_seqlock_read_begin()
> rte_seqlock_read_retry()
> rte_seqlock_write_begin()
> rte_seqlock_write_end()
> 
> typedef struct {
> 	rte_seqlock_t seqlock;
> 	rte_spinlock_t wlock;
> } rte_<something>_t;
> 
> rte_<something>_read_begin()
> rte_<something>_read_retry()
> rte_<something>_write_lock()
> rte_<something>_write_unlock()
> 
> or are you suggesting removing the spinlock altogether, and leave 
> writer-side synchronization to the application (at least in this DPDK 
> release)?


No, like Linux kernel. Use seqcount for the reader counter only object
and seqlock for the seqcount + spinlock version.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* RE: [PATCH v6] eal: add seqlock
  2022-05-09  3:48                                                 ` Stephen Hemminger
@ 2022-05-09  6:26                                                   ` Morten Brørup
  2022-05-13  6:27                                                   ` Mattias Rönnblom
  1 sibling, 0 replies; 104+ messages in thread
From: Morten Brørup @ 2022-05-09  6:26 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev,
	onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev,
	Chengwen Feng, Ola Liljedahl

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, 9 May 2022 05.48
> 
> On Sun, 8 May 2022 21:40:58 +0200
> Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> 
> > > I think would be good to have the sequence count (read side only)
> like
> > > the kernel and sequence lock (sequence count + spinlock) as
> separate things.
> > >
> > > That way the application could use sequence count + ticket lock if
> it
> > > needed to scale to more writers.
> > >

If we want a seqlock based on a ticket lock, I would prefer that DPDK offers it, rather than requiring the application to implement it.

Regardless, adding the seqcount type as a separate thing could still make sense.

> >
> > Sounds reasonable. Would that be something like:
> >
> > typedef struct {
> > 	uint32_t sn;
> > } rte_seqlock_t;
> >
> > rte_seqlock_read_begin()
> > rte_seqlock_read_retry()
> > rte_seqlock_write_begin()
> > rte_seqlock_write_end()
> >
> > typedef struct {
> > 	rte_seqlock_t seqlock;
> > 	rte_spinlock_t wlock;
> > } rte_<something>_t;
> >
> > rte_<something>_read_begin()
> > rte_<something>_read_retry()
> > rte_<something>_write_lock()
> > rte_<something>_write_unlock()
> >
> > or are you suggesting removing the spinlock altogether, and leave
> > writer-side synchronization to the application (at least in this DPDK
> > release)?
> 
> 
> No, like Linux kernel. Use seqcount for the reader counter only object
> and seqlock for the seqcount + spinlock version.

In other words: Keep the existing names, i.e. rte_seqlock_t/rte_seqlock_functions(), for what you have already implemented, and use the names rte_seqcount_t/rte_seqcount_functions() for the variant without the lock.

Linux source code here:
https://elixir.bootlin.com/linux/v5.10.113/source/include/linux/seqlock.h

I suppose that the rte_seqcount_t primitive should go into a separate file; it is not really a lock.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [PATCH v6] eal: add seqlock
  2022-05-09  3:48                                                 ` Stephen Hemminger
  2022-05-09  6:26                                                   ` Morten Brørup
@ 2022-05-13  6:27                                                   ` Mattias Rönnblom
  1 sibling, 0 replies; 104+ messages in thread
From: Mattias Rönnblom @ 2022-05-13  6:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev,
	onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb,
	Chengwen Feng, Ola Liljedahl

On 2022-05-09 05:48, Stephen Hemminger wrote:
> On Sun, 8 May 2022 21:40:58 +0200
> Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> 
>>> I think would be good to have the sequence count (read side only) like
>>> the kernel and sequence lock (sequence count + spinlock) as separate things.
>>>
>>> That way the application could use sequence count + ticket lock if it
>>> needed to scale to more writers.
>>>    
>>
>> Sounds reasonable. Would that be something like:
>>
>> typedef struct {
>> 	uint32_t sn;
>> } rte_seqlock_t;
>>
>> rte_seqlock_read_begin()
>> rte_seqlock_read_retry()
>> rte_seqlock_write_begin()
>> rte_seqlock_write_end()
>>
>> typedef struct {
>> 	rte_seqlock_t seqlock;
>> 	rte_spinlock_t wlock;
>> } rte_<something>_t;
>>
>> rte_<something>_read_begin()
>> rte_<something>_read_retry()
>> rte_<something>_write_lock()
>> rte_<something>_write_unlock()
>>
>> or are you suggesting removing the spinlock altogether, and leave
>> writer-side synchronization to the application (at least in this DPDK
>> release)?
> 
> 
> No, like Linux kernel. Use seqcount for the reader counter only object
> and seqlock for the seqcount + spinlock version.

Should rte_seqcount_t be in a separate file?

Normally, I would use the "header file per 'class'" pattern (unless 
things are very tightly coupled), but I suspect DPDK style is the 
"header file per group of related 'classes'".

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2022-05-13  6:27 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-22 16:10 DPDK seqlock Mattias Rönnblom
2022-03-22 16:46 ` Ananyev, Konstantin
2022-03-24  4:52   ` Honnappa Nagarahalli
2022-03-24  5:06     ` Stephen Hemminger
2022-03-24 11:34     ` Mattias Rönnblom
2022-03-25 20:24       ` [RFC] eal: add seqlock Mattias Rönnblom
2022-03-25 21:10         ` Stephen Hemminger
2022-03-26 14:57           ` Mattias Rönnblom
2022-03-27 14:49         ` Ananyev, Konstantin
2022-03-27 17:42           ` Mattias Rönnblom
2022-03-28 10:53             ` Ananyev, Konstantin
2022-03-28 14:06               ` Ola Liljedahl
2022-03-29  8:32                 ` Mattias Rönnblom
2022-03-29 13:20                   ` Ananyev, Konstantin
2022-03-30 10:07                     ` [PATCH] " Mattias Rönnblom
2022-03-30 10:50                       ` Morten Brørup
2022-03-30 11:24                         ` Tyler Retzlaff
2022-03-30 11:25                         ` Mattias Rönnblom
2022-03-30 14:26                         ` [PATCH v2] " Mattias Rönnblom
2022-03-31  7:46                           ` Mattias Rönnblom
2022-03-31  9:04                             ` Ola Liljedahl
2022-03-31  9:25                               ` Morten Brørup
2022-03-31  9:38                                 ` Ola Liljedahl
2022-03-31 10:03                                   ` Morten Brørup
2022-03-31 11:44                                     ` Ola Liljedahl
2022-03-31 11:50                                       ` Morten Brørup
2022-03-31 14:02                                       ` Mattias Rönnblom
2022-04-01 15:07                                         ` [PATCH v3] " Mattias Rönnblom
2022-04-02  0:21                                           ` Honnappa Nagarahalli
2022-04-02 11:01                                             ` Morten Brørup
2022-04-02 19:38                                               ` Honnappa Nagarahalli
2022-04-10 13:51                                                 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom
2022-04-10 13:51                                                   ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
2022-04-10 13:51                                                   ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
2022-04-11  1:01                                                     ` Min Hu (Connor)
2022-04-11 14:32                                                       ` Mattias Rönnblom
2022-04-11 11:25                                                     ` David Marchand
2022-04-11 14:33                                                       ` Mattias Rönnblom
2022-04-10 18:02                                                   ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger
2022-04-10 18:50                                                     ` Mattias Rönnblom
2022-04-11  7:17                                                   ` Morten Brørup
2022-04-11 14:29                                                     ` Mattias Rönnblom
2022-04-11  9:16                                                   ` Bruce Richardson
2022-04-11 14:27                                                     ` Mattias Rönnblom
2022-04-11 15:15                                                     ` [PATCH " Mattias Rönnblom
2022-04-11 15:15                                                       ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom
2022-04-11 15:29                                                         ` Morten Brørup
2022-04-11 15:15                                                       ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom
2022-04-14 12:06                                                         ` David Marchand
2022-04-11 15:25                                                       ` [PATCH 1/3] eal: add macro to warn for unused function return values Morten Brørup
2022-04-11 18:24                                                     ` [RFC " Tyler Retzlaff
2022-04-03  6:10                                             ` [PATCH v3] eal: add seqlock Mattias Rönnblom
2022-04-03 17:27                                               ` Honnappa Nagarahalli
2022-04-03 18:37                                                 ` Ola Liljedahl
2022-04-04 21:56                                                   ` Honnappa Nagarahalli
2022-04-03  6:33                                             ` Mattias Rönnblom
2022-04-03 17:37                                               ` Honnappa Nagarahalli
2022-04-08 13:45                                                 ` Mattias Rönnblom
2022-04-02 18:15                                           ` Ola Liljedahl
2022-04-02 19:31                                             ` Honnappa Nagarahalli
2022-04-02 20:36                                               ` Morten Brørup
2022-04-02 22:01                                                 ` Honnappa Nagarahalli
2022-04-03 18:11                                               ` Ola Liljedahl
2022-04-03  6:51                                             ` Mattias Rönnblom
2022-03-31 13:51                                 ` [PATCH v2] " Mattias Rönnblom
2022-04-02  0:54                                   ` Stephen Hemminger
2022-04-02 10:25                                     ` Morten Brørup
2022-04-02 17:43                                       ` Ola Liljedahl
2022-03-31 13:38                               ` Mattias Rönnblom
2022-03-31 14:53                                 ` Ola Liljedahl
2022-04-02  0:52                                   ` Stephen Hemminger
2022-04-03  6:23                                     ` Mattias Rönnblom
2022-04-02  0:50                           ` Stephen Hemminger
2022-04-02 17:54                             ` Ola Liljedahl
2022-04-02 19:37                               ` Honnappa Nagarahalli
2022-04-05 20:16                           ` Stephen Hemminger
2022-04-08 13:50                             ` Mattias Rönnblom
2022-04-08 14:24                               ` [PATCH v4] " Mattias Rönnblom
2022-04-08 15:17                                 ` Stephen Hemminger
2022-04-08 16:24                                   ` Mattias Rönnblom
2022-04-08 15:19                                 ` Stephen Hemminger
2022-04-08 16:37                                   ` Mattias Rönnblom
2022-04-08 16:48                                 ` Mattias Rönnblom
2022-04-12 17:27                                 ` Ananyev, Konstantin
2022-04-28 10:28                                 ` David Marchand
2022-05-01 13:46                                   ` Mattias Rönnblom
2022-05-01 14:03                                     ` [PATCH v5] " Mattias Rönnblom
2022-05-01 14:22                                       ` Mattias Rönnblom
2022-05-02  6:47                                         ` David Marchand
2022-05-01 20:17                                       ` Stephen Hemminger
2022-05-02  4:51                                         ` Mattias Rönnblom
2022-05-06  1:26                                       ` fengchengwen
2022-05-06  1:33                                         ` Honnappa Nagarahalli
2022-05-06  4:17                                           ` fengchengwen
2022-05-06  5:19                                             ` Honnappa Nagarahalli
2022-05-06  7:03                                               ` fengchengwen
2022-05-08 11:56                                         ` Mattias Rönnblom
2022-05-08 12:12                                           ` [PATCH v6] " Mattias Rönnblom
2022-05-08 16:10                                             ` Stephen Hemminger
2022-05-08 19:40                                               ` Mattias Rönnblom
2022-05-09  3:48                                                 ` Stephen Hemminger
2022-05-09  6:26                                                   ` Morten Brørup
2022-05-13  6:27                                                   ` Mattias Rönnblom
2022-03-23 12:04 ` DPDK seqlock Morten Brørup

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git