* DPDK seqlock @ 2022-03-22 16:10 Mattias Rönnblom 2022-03-22 16:46 ` Ananyev, Konstantin 2022-03-23 12:04 ` DPDK seqlock Morten Brørup 0 siblings, 2 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-22 16:10 UTC (permalink / raw) To: dev Hi. Would it make sense to have a seqlock implementation in DPDK? I think so, since it's a very useful synchronization primitive in data plane applications. Regards, Mattias ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: DPDK seqlock 2022-03-22 16:10 DPDK seqlock Mattias Rönnblom @ 2022-03-22 16:46 ` Ananyev, Konstantin 2022-03-24 4:52 ` Honnappa Nagarahalli 2022-03-23 12:04 ` DPDK seqlock Morten Brørup 1 sibling, 1 reply; 104+ messages in thread From: Ananyev, Konstantin @ 2022-03-22 16:46 UTC (permalink / raw) To: mattias.ronnblom, dev Hi Mattias, > > Would it make sense to have a seqlock implementation in DPDK? > > I think so, since it's a very useful synchronization primitive in data > plane applications. > Agree, it might be useful. As I remember rte_hash '_lf' functions do use something similar to seqlock, but in hand-made manner. Probably some other entities within DPDK itself or related projects will benefit from it too... Konstantin ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: DPDK seqlock 2022-03-22 16:46 ` Ananyev, Konstantin @ 2022-03-24 4:52 ` Honnappa Nagarahalli 2022-03-24 5:06 ` Stephen Hemminger 2022-03-24 11:34 ` Mattias Rönnblom 0 siblings, 2 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-03-24 4:52 UTC (permalink / raw) To: Ananyev, Konstantin, mattias.ronnblom, dev; +Cc: nd, nd <snip> > > Hi Mattias, > > > > > Would it make sense to have a seqlock implementation in DPDK? I do not have any issues with adding the seqlock to DPDK. However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries? > > > > I think so, since it's a very useful synchronization primitive in data > > plane applications. > > > > Agree, it might be useful. > As I remember rte_hash '_lf' functions do use something similar to seqlock, but > in hand-made manner. > Probably some other entities within DPDK itself or related projects will benefit > from it too... > > Konstantin ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: DPDK seqlock 2022-03-24 4:52 ` Honnappa Nagarahalli @ 2022-03-24 5:06 ` Stephen Hemminger 2022-03-24 11:34 ` Mattias Rönnblom 1 sibling, 0 replies; 104+ messages in thread From: Stephen Hemminger @ 2022-03-24 5:06 UTC (permalink / raw) To: Honnappa Nagarahalli; +Cc: Ananyev, Konstantin, mattias.ronnblom, dev, nd On Thu, 24 Mar 2022 04:52:07 +0000 Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote: > <snip> > > > > > Hi Mattias, > > > > > > > > Would it make sense to have a seqlock implementation in DPDK? > I do not have any issues with adding the seqlock to DPDK. > > However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries? > > > > > > > I think so, since it's a very useful synchronization primitive in data > > > plane applications. > > > > > > > Agree, it might be useful. > > As I remember rte_hash '_lf' functions do use something similar to seqlock, but > > in hand-made manner. > > Probably some other entities within DPDK itself or related projects will benefit > > from it too... > > > > Konstantin As the inventor of seqlock, it is really just a kind of reader/writer spinlock where spinning tries to do useful work. It is useful for cases where the data being accessed is too large for __atomic primitives. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: DPDK seqlock 2022-03-24 4:52 ` Honnappa Nagarahalli 2022-03-24 5:06 ` Stephen Hemminger @ 2022-03-24 11:34 ` Mattias Rönnblom 2022-03-25 20:24 ` [RFC] eal: add seqlock Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-24 11:34 UTC (permalink / raw) To: Honnappa Nagarahalli, Ananyev, Konstantin, dev; +Cc: nd On 2022-03-24 05:52, Honnappa Nagarahalli wrote: > <snip> > >> Hi Mattias, >> >>> Would it make sense to have a seqlock implementation in DPDK? > I do not have any issues with adding the seqlock to DPDK. > > However, I am interested in understanding the use case. As I understand, seqlock is a type of reader-writer lock. This means that it is possible that readers (data plane) may be blocked till the writer completes the updates. Does not this mean, data plane might drop packets while the writer is updating entries? Yes, it's not preemption-safe, just like for example a spinlock-protected data structure. If the writer is interrupted after having stored the first counter update, but before storing the second, all subsequent read attempts will fail. The reading workers would have to decide to either give up reading the data structure being protected, or keep retrying indefinitely. This issue is common across all non-preemption-safe data structures (default rings, spinlocks, etc.), and can be avoided by (surprise!) avoiding preemption by running the control plane thread on RT priority or on a dedicated core, or by using a preemption-safe way to tell one of the worker lcore threads to do the actual update. A seqlock is much more efficient on the reader side for high-frequency accesses from many cores than a regular RW lock (i.e., one implemented by two spinlocks or mutexes). A spinlock being locked/unlocked on a per-packet basis by every core is a total performance killer. >>> I think so, since it's a very useful synchronization primitive in data >>> plane applications. >>> >> Agree, it might be useful. >> As I remember rte_hash '_lf' functions do use something similar to seqlock, but >> in hand-made manner. >> Probably some other entities within DPDK itself or related projects will benefit >> from it too... >> >> Konstantin ^ permalink raw reply [flat|nested] 104+ messages in thread
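For illustration, the reader-side cost argument above can be made concrete with a minimal, generic C11 sketch. The counters struct, variable names and helper below are invented for this example and are not part of any DPDK API; the writer side, omitted here, bumps the sequence number around its stores as described later in this thread.

#include <stdatomic.h>
#include <stdint.h>

struct counters {
	uint64_t pkts;
	uint64_t bytes;
};

static struct counters shared_counters;
/* Even value: data stable. Odd value: an update is in progress. */
static _Atomic uint32_t counters_sn;

/* Reader-side retry loop: only loads on the fast path, so no shared
 * cache line is written and readers do not slow each other down. The
 * struct copy may be torn if a writer overlaps, but that is detected
 * via the sequence number and the read is simply retried.
 */
static struct counters
read_counters(void)
{
	struct counters copy;
	uint32_t sn;

	do {
		sn = atomic_load_explicit(&counters_sn, memory_order_acquire);
		copy = shared_counters;
		/* order the data loads before the sn re-load below */
		atomic_thread_fence(memory_order_acquire);
	} while (sn & 1 || sn != atomic_load_explicit(&counters_sn,
						       memory_order_relaxed));

	return copy;
}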
* [RFC] eal: add seqlock 2022-03-24 11:34 ` Mattias Rönnblom @ 2022-03-25 20:24 ` Mattias Rönnblom 2022-03-25 21:10 ` Stephen Hemminger 2022-03-27 14:49 ` Ananyev, Konstantin 0 siblings, 2 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-25 20:24 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is a synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated with relatively low frequency. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide a means to perform atomic operations on data objects larger than what the native atomic machine instructions allow for. DPDK seqlocks are not preemption-safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform loads and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. This RFC version lacks API documentation.
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- app/test/meson.build | 2 + app/test/test_seqlock.c | 197 ++++++++++++++++++++++++++++++++++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 +++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 84 +++++++++++++++ lib/eal/version.map | 3 + 7 files changed, 300 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..a727e16caf --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,197 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_start(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_begin(&data->lock); + + data->c = new_value; + + /* These compiler barriers (both on the test reader + * and the test writer side) are here to ensure that + * loads/stores *usually* happen in test program order + * (always on a TSO machine). They are arrange in such + * a way that the writer stores in a different order + * than the reader loads, to emulate an arbitrary + * order. A real application using a seqlock does not + * require any compiler barriers. 
+ */ + rte_compiler_barrier(); + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + rte_compiler_barrier(); + data->a = new_value; + + rte_seqlock_write_end(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return 0; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_start(void *arg) +{ + struct reader *r = arg; + int rc = 0; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) { + struct data *data = r->data; + bool interrupted; + uint64_t a; + uint64_t b; + uint64_t c; + uint64_t sn; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + /* See writer_start() for an explaination why + * these barriers are here. + */ + rte_compiler_barrier(); + + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + + c = data->c; + + rte_compiler_barrier(); + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = -1; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS + 1) + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int writer_lcore_ids[NUM_WRITERS] = { 0 }; + unsigned int reader_lcore_ids[MAX_READERS]; + int rc = 0; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) + return -1; + + num_readers = num_lcores - NUM_WRITERS - 1; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i < NUM_WRITERS) { + rte_eal_remote_launch(writer_start, data, lcore_id); + writer_lcore_ids[i] = lcore_id; + } else { + unsigned int reader_idx = i - NUM_WRITERS; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_start, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + for (i = 0; i < NUM_WRITERS; i++) + if (rte_eal_wait_lcore(writer_lcore_ids[i]) != 0) + rc = -1; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = -1; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..a41343bfed 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build 
b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..b975ca848a --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,84 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +struct rte_seqlock { + uint64_t sn; + rte_spinlock_t lock; +}; + +typedef struct rte_seqlock rte_seqlock_t; + +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +__rte_experimental +static inline uint64_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Syncronizes-with the + * store release in rte_seqlock_end(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn) +{ + uint64_t end_sn; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + return unlikely(begin_sn & 1 || begin_sn != end_sn); +} + +__rte_experimental +static inline void +rte_seqlock_write_begin(rte_seqlock_t *seqlock) +{ + uint64_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +__rte_experimental +static inline void +rte_seqlock_write_end(rte_seqlock_t *seqlock) +{ + uint64_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
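For reference, a minimal usage sketch of the API exactly as posted in the RFC above. Only the rte_seqlock_* calls come from the patch; the route_cfg structure, its fields and the surrounding functions are invented for this example:

#include <stdint.h>

#include <rte_seqlock.h>

struct route_cfg {
	rte_seqlock_t lock;
	uint64_t next_hop;
	uint64_t mtu;
};

static struct route_cfg cfg;

void
cfg_init(void)
{
	rte_seqlock_init(&cfg.lock);
}

/* Control-plane writer; writer-writer serialization is handled by the
 * spinlock inside the seqlock.
 */
void
cfg_update(uint64_t next_hop, uint64_t mtu)
{
	rte_seqlock_write_begin(&cfg.lock);
	cfg.next_hop = next_hop;
	cfg.mtu = mtu;
	rte_seqlock_write_end(&cfg.lock);
}

/* Worker lcore reader; retries if a write overlapped the read. */
void
cfg_read(uint64_t *next_hop, uint64_t *mtu)
{
	uint64_t sn;

	do {
		sn = rte_seqlock_read_begin(&cfg.lock);
		*next_hop = cfg.next_hop;
		*mtu = cfg.mtu;
	} while (rte_seqlock_read_retry(&cfg.lock, sn));
}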
* Re: [RFC] eal: add seqlock 2022-03-25 20:24 ` [RFC] eal: add seqlock Mattias Rönnblom @ 2022-03-25 21:10 ` Stephen Hemminger 2022-03-26 14:57 ` Mattias Rönnblom 2022-03-27 14:49 ` Ananyev, Konstantin 1 sibling, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-03-25 21:10 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl On Fri, 25 Mar 2022 21:24:28 +0100 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h > new file mode 100644 > index 0000000000..b975ca848a > --- /dev/null > +++ b/lib/eal/include/rte_seqlock.h > @@ -0,0 +1,84 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#ifndef _RTE_SEQLOCK_H_ > +#define _RTE_SEQLOCK_H_ > + > +#include <stdbool.h> > +#include <stdint.h> > + > +#include <rte_atomic.h> > +#include <rte_branch_prediction.h> > +#include <rte_spinlock.h> > + > +struct rte_seqlock { > + uint64_t sn; > + rte_spinlock_t lock; > +}; > + > +typedef struct rte_seqlock rte_seqlock_t; > + Add a reference to Wikipedia and/or Linux since not every DPDK user may be familiar with this. > + > + sn = seqlock->sn + 1; > + > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); Could this just be __atomic_fetch_add() with __ATOMIC_RELEASE? ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC] eal: add seqlock 2022-03-25 21:10 ` Stephen Hemminger @ 2022-03-26 14:57 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-26 14:57 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl On 2022-03-25 22:10, Stephen Hemminger wrote: > On Fri, 25 Mar 2022 21:24:28 +0100 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h >> new file mode 100644 >> index 0000000000..b975ca848a >> --- /dev/null >> +++ b/lib/eal/include/rte_seqlock.h >> @@ -0,0 +1,84 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#ifndef _RTE_SEQLOCK_H_ >> +#define _RTE_SEQLOCK_H_ >> + >> +#include <stdbool.h> >> +#include <stdint.h> >> + >> +#include <rte_atomic.h> >> +#include <rte_branch_prediction.h> >> +#include <rte_spinlock.h> >> + >> +struct rte_seqlock { >> + uint64_t sn; >> + rte_spinlock_t lock; >> +}; >> + >> +typedef struct rte_seqlock rte_seqlock_t; >> + > > Add a reference to Wikipedia and/or Linux since not every DPDK > user may be familiar with this. OK, will do. >> + >> + sn = seqlock->sn + 1; >> + >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >> + >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >> + * from happening before the sn store. >> + */ >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > Could this just be __atomic_fetch_add() with __ATOMIC_RELEASE? If I understood C11 correctly, an __atomic_fetch_add() with __ATOMIC_RELEASE only prevents stores that precede it (in program order) from being moved ahead of it. Thus, stores that follow it may be reordered across the __atomic_fetch_add(), and seen by a reader before the sn change. Also, __atomic_fetch_add() would generate an atomic add machine instruction, which, at least according to my experience (on x86_64), is slower than a mov+add+mov, which is what the above code will generate (plus it prevents certain compiler optimizations). That's with TSO. What would happen on weakly ordered machines, I don't know in detail. ^ permalink raw reply [flat|nested] 104+ messages in thread
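To make the ordering argument above concrete, here is a sketch contrasting the two write_begin() variants being discussed (illustration only; the helper names are made up, and the first variant simply mirrors the RFC code quoted above):

#include <rte_seqlock.h>

/* Variant used in the RFC: relaxed sn store followed by a release
 * fence. The fence orders the sn store before *all* later stores to
 * the protected data.
 */
static inline void
write_begin_store_plus_fence(rte_seqlock_t *seqlock)
{
	rte_spinlock_lock(&seqlock->lock);
	__atomic_store_n(&seqlock->sn, seqlock->sn + 1, __ATOMIC_RELAXED);
	rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

/* Suggested alternative: a single atomic RMW with release ordering.
 * As argued in the mail above, release semantics only constrain
 * stores that precede the operation, so the data stores that follow
 * could still be reordered ahead of the sn update on a weakly
 * ordered CPU, letting a reader see new (possibly torn) data under
 * an even, pre-update sequence number.
 */
static inline void
write_begin_fetch_add(rte_seqlock_t *seqlock)
{
	rte_spinlock_lock(&seqlock->lock);
	__atomic_fetch_add(&seqlock->sn, 1, __ATOMIC_RELEASE);
}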
* RE: [RFC] eal: add seqlock 2022-03-25 20:24 ` [RFC] eal: add seqlock Mattias Rönnblom 2022-03-25 21:10 ` Stephen Hemminger @ 2022-03-27 14:49 ` Ananyev, Konstantin 2022-03-27 17:42 ` Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Ananyev, Konstantin @ 2022-03-27 14:49 UTC (permalink / raw) To: mattias.ronnblom, dev Cc: Thomas Monjalon, David Marchand, Olsen, Onar, Honnappa.Nagarahalli, nd, mb, stephen, mattias.ronnblom, Ola Liljedahl > diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build > index 9700494816..48df5f1a21 100644 > --- a/lib/eal/include/meson.build > +++ b/lib/eal/include/meson.build > @@ -36,6 +36,7 @@ headers += files( > 'rte_per_lcore.h', > 'rte_random.h', > 'rte_reciprocal.h', > + 'rte_seqlock.h', > 'rte_service.h', > 'rte_service_component.h', > 'rte_string_fns.h', > diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h > new file mode 100644 > index 0000000000..b975ca848a > --- /dev/null > +++ b/lib/eal/include/rte_seqlock.h > @@ -0,0 +1,84 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#ifndef _RTE_SEQLOCK_H_ > +#define _RTE_SEQLOCK_H_ > + > +#include <stdbool.h> > +#include <stdint.h> > + > +#include <rte_atomic.h> > +#include <rte_branch_prediction.h> > +#include <rte_spinlock.h> > + > +struct rte_seqlock { > + uint64_t sn; > + rte_spinlock_t lock; > +}; > + > +typedef struct rte_seqlock rte_seqlock_t; > + > +__rte_experimental > +void > +rte_seqlock_init(rte_seqlock_t *seqlock); Probably worth to have static initializer too. > + > +__rte_experimental > +static inline uint64_t > +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) > +{ > + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) > + * from happening before the sn load. Syncronizes-with the > + * store release in rte_seqlock_end(). > + */ > + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); > +} > + > +__rte_experimental > +static inline bool > +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn) > +{ > + uint64_t end_sn; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); That's sort of 'read_end' correct? If so, shouldn't it be '__ATOMIC_RELEASE' instead here, and end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) on the line below? > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > + > + return unlikely(begin_sn & 1 || begin_sn != end_sn); > +} > + > +__rte_experimental > +static inline void > +rte_seqlock_write_begin(rte_seqlock_t *seqlock) > +{ > + uint64_t sn; > + > + /* to synchronize with other writers */ > + rte_spinlock_lock(&seqlock->lock); > + > + sn = seqlock->sn + 1; > + > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'. > +} > + > +__rte_experimental > +static inline void > +rte_seqlock_write_end(rte_seqlock_t *seqlock) > +{ > + uint64_t sn; > + > + sn = seqlock->sn + 1; > + > + /* synchronizes-with the load acquire in rte_seqlock_begin() */ > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > + > + rte_spinlock_unlock(&seqlock->lock); > +} > + ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC] eal: add seqlock 2022-03-27 14:49 ` Ananyev, Konstantin @ 2022-03-27 17:42 ` Mattias Rönnblom 2022-03-28 10:53 ` Ananyev, Konstantin 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-27 17:42 UTC (permalink / raw) To: Ananyev, Konstantin, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, mb, stephen, Ola Liljedahl On 2022-03-27 16:49, Ananyev, Konstantin wrote: >> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build >> index 9700494816..48df5f1a21 100644 >> --- a/lib/eal/include/meson.build >> +++ b/lib/eal/include/meson.build >> @@ -36,6 +36,7 @@ headers += files( >> 'rte_per_lcore.h', >> 'rte_random.h', >> 'rte_reciprocal.h', >> + 'rte_seqlock.h', >> 'rte_service.h', >> 'rte_service_component.h', >> 'rte_string_fns.h', >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h >> new file mode 100644 >> index 0000000000..b975ca848a >> --- /dev/null >> +++ b/lib/eal/include/rte_seqlock.h >> @@ -0,0 +1,84 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#ifndef _RTE_SEQLOCK_H_ >> +#define _RTE_SEQLOCK_H_ >> + >> +#include <stdbool.h> >> +#include <stdint.h> >> + >> +#include <rte_atomic.h> >> +#include <rte_branch_prediction.h> >> +#include <rte_spinlock.h> >> + >> +struct rte_seqlock { >> + uint64_t sn; >> + rte_spinlock_t lock; >> +}; >> + >> +typedef struct rte_seqlock rte_seqlock_t; >> + >> +__rte_experimental >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock); > Probably worth to have static initializer too. > I will add that in the next version, thanks. >> + >> +__rte_experimental >> +static inline uint64_t >> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) >> +{ >> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) >> + * from happening before the sn load. Syncronizes-with the >> + * store release in rte_seqlock_end(). >> + */ >> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); >> +} >> + >> +__rte_experimental >> +static inline bool >> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn) >> +{ >> + uint64_t end_sn; >> + >> + /* make sure the data loads happens before the sn load */ >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > That's sort of 'read_end' correct? > If so, shouldn't it be '__ATOMIC_RELEASE' instead here, > and > end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) > on the line below? A release fence prevents reordering of stores. The reader doesn't do any stores, so I don't understand why you would use a release fence here. Could you elaborate? >> + >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> + >> + return unlikely(begin_sn & 1 || begin_sn != end_sn); >> +} >> + >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_begin(rte_seqlock_t *seqlock) >> +{ >> + uint64_t sn; >> + >> + /* to synchronize with other writers */ >> + rte_spinlock_lock(&seqlock->lock); >> + >> + sn = seqlock->sn + 1; >> + >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >> + >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >> + * from happening before the sn store. >> + */ >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'. Please elaborate on why. 
>> +} >> + >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) >> +{ >> + uint64_t sn; >> + >> + sn = seqlock->sn + 1; >> + >> + /* synchronizes-with the load acquire in rte_seqlock_begin() */ >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >> + >> + rte_spinlock_unlock(&seqlock->lock); >> +} >> + ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [RFC] eal: add seqlock 2022-03-27 17:42 ` Mattias Rönnblom @ 2022-03-28 10:53 ` Ananyev, Konstantin 2022-03-28 14:06 ` Ola Liljedahl 0 siblings, 1 reply; 104+ messages in thread From: Ananyev, Konstantin @ 2022-03-28 10:53 UTC (permalink / raw) To: mattias.ronnblom, dev Cc: Thomas Monjalon, David Marchand, Olsen, Onar, Honnappa.Nagarahalli, nd, mb, stephen, Ola Liljedahl > >> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build > >> index 9700494816..48df5f1a21 100644 > >> --- a/lib/eal/include/meson.build > >> +++ b/lib/eal/include/meson.build > >> @@ -36,6 +36,7 @@ headers += files( > >> 'rte_per_lcore.h', > >> 'rte_random.h', > >> 'rte_reciprocal.h', > >> + 'rte_seqlock.h', > >> 'rte_service.h', > >> 'rte_service_component.h', > >> 'rte_string_fns.h', > >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h > >> new file mode 100644 > >> index 0000000000..b975ca848a > >> --- /dev/null > >> +++ b/lib/eal/include/rte_seqlock.h > >> @@ -0,0 +1,84 @@ > >> +/* SPDX-License-Identifier: BSD-3-Clause > >> + * Copyright(c) 2022 Ericsson AB > >> + */ > >> + > >> +#ifndef _RTE_SEQLOCK_H_ > >> +#define _RTE_SEQLOCK_H_ > >> + > >> +#include <stdbool.h> > >> +#include <stdint.h> > >> + > >> +#include <rte_atomic.h> > >> +#include <rte_branch_prediction.h> > >> +#include <rte_spinlock.h> > >> + > >> +struct rte_seqlock { > >> + uint64_t sn; > >> + rte_spinlock_t lock; > >> +}; > >> + > >> +typedef struct rte_seqlock rte_seqlock_t; > >> + > >> +__rte_experimental > >> +void > >> +rte_seqlock_init(rte_seqlock_t *seqlock); > > Probably worth to have static initializer too. > > > > I will add that in the next version, thanks. > > >> + > >> +__rte_experimental > >> +static inline uint64_t > >> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) > >> +{ > >> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) > >> + * from happening before the sn load. Syncronizes-with the > >> + * store release in rte_seqlock_end(). > >> + */ > >> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); > >> +} > >> + > >> +__rte_experimental > >> +static inline bool > >> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn) > >> +{ > >> + uint64_t end_sn; > >> + > >> + /* make sure the data loads happens before the sn load */ > >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > That's sort of 'read_end' correct? > > If so, shouldn't it be '__ATOMIC_RELEASE' instead here, > > and > > end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) > > on the line below? > > A release fence prevents reordering of stores. The reader doesn't do any > stores, so I don't understand why you would use a release fence here. > Could you elaborate? From my understanding: rte_atomic_thread_fence(__ATOMIC_ACQUIRE); serves as a hoist barrier here, so it would only prevent later instructions to be executed before that point. But it wouldn't prevent earlier instructions to be executed after that point. While we do need to guarantee that cpu will finish all previous reads before progressing further. Suppose we have something like that: struct { uint64_t shared; rte_seqlock_t lock; } data; ... sn = ... uint64_t x = data.shared; /* inside rte_seqlock_read_retry(): */ ... rte_atomic_thread_fence(__ATOMIC_ACQUIRE); end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED); Here we need to make sure that read of data.shared will always happen before reading of data.lock.sn. 
It is not a problem on IA (as reads are not reordered), but on machines with relaxed memory ordering (ARM, etc.) it can happen. So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE). Honnappa and other ARM & atomics experts, please correct me if I am wrong here. > >> + > >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > >> + > >> + return unlikely(begin_sn & 1 || begin_sn != end_sn); > >> +} > >> + > >> +__rte_experimental > >> +static inline void > >> +rte_seqlock_write_begin(rte_seqlock_t *seqlock) > >> +{ > >> + uint64_t sn; > >> + > >> + /* to synchronize with other writers */ > >> + rte_spinlock_lock(&seqlock->lock); > >> + > >> + sn = seqlock->sn + 1; > >> + > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > >> + > >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) > >> + * from happening before the sn store. > >> + */ > >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > > I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'. > > Please elaborate on why. As you said in the comments above, we need to prevent later stores to be executed before that point. So we do need a hoist barrier here. AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required. > > >> +} > >> + > >> +__rte_experimental > >> +static inline void > >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) > >> +{ > >> + uint64_t sn; > >> + > >> + sn = seqlock->sn + 1; > >> + > >> + /* synchronizes-with the load acquire in rte_seqlock_begin() */ > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > >> + > >> + rte_spinlock_unlock(&seqlock->lock); > >> +} > >> + ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC] eal: add seqlock 2022-03-28 10:53 ` Ananyev, Konstantin @ 2022-03-28 14:06 ` Ola Liljedahl 2022-03-29 8:32 ` Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Ola Liljedahl @ 2022-03-28 14:06 UTC (permalink / raw) To: Ananyev, Konstantin, mattias.ronnblom, dev Cc: Thomas Monjalon, David Marchand, Olsen, Onar, Honnappa.Nagarahalli, nd, mb, stephen On 3/28/22 12:53, Ananyev, Konstantin wrote: > >>>> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build >>>> index 9700494816..48df5f1a21 100644 >>>> --- a/lib/eal/include/meson.build >>>> +++ b/lib/eal/include/meson.build >>>> @@ -36,6 +36,7 @@ headers += files( >>>> 'rte_per_lcore.h', >>>> 'rte_random.h', >>>> 'rte_reciprocal.h', >>>> + 'rte_seqlock.h', >>>> 'rte_service.h', >>>> 'rte_service_component.h', >>>> 'rte_string_fns.h', >>>> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h >>>> new file mode 100644 >>>> index 0000000000..b975ca848a >>>> --- /dev/null >>>> +++ b/lib/eal/include/rte_seqlock.h >>>> @@ -0,0 +1,84 @@ >>>> +/* SPDX-License-Identifier: BSD-3-Clause >>>> + * Copyright(c) 2022 Ericsson AB >>>> + */ >>>> + >>>> +#ifndef _RTE_SEQLOCK_H_ >>>> +#define _RTE_SEQLOCK_H_ >>>> + >>>> +#include <stdbool.h> >>>> +#include <stdint.h> >>>> + >>>> +#include <rte_atomic.h> >>>> +#include <rte_branch_prediction.h> >>>> +#include <rte_spinlock.h> >>>> + >>>> +struct rte_seqlock { >>>> + uint64_t sn; >>>> + rte_spinlock_t lock; >>>> +}; >>>> + >>>> +typedef struct rte_seqlock rte_seqlock_t; >>>> + >>>> +__rte_experimental >>>> +void >>>> +rte_seqlock_init(rte_seqlock_t *seqlock); >>> Probably worth to have static initializer too. >>> >> >> I will add that in the next version, thanks. >> >>>> + >>>> +__rte_experimental >>>> +static inline uint64_t >>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) >>>> +{ >>>> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) >>>> + * from happening before the sn load. Syncronizes-with the >>>> + * store release in rte_seqlock_end(). >>>> + */ >>>> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); >>>> +} >>>> + >>>> +__rte_experimental >>>> +static inline bool >>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t begin_sn) >>>> +{ >>>> + uint64_t end_sn; >>>> + >>>> + /* make sure the data loads happens before the sn load */ >>>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >>> That's sort of 'read_end' correct? >>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here, >>> and >>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) >>> on the line below? >> >> A release fence prevents reordering of stores. The reader doesn't do any >> stores, so I don't understand why you would use a release fence here. >> Could you elaborate? > > From my understanding: > rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > serves as a hoist barrier here, so it would only prevent later instructions > to be executed before that point. > But it wouldn't prevent earlier instructions to be executed after that point. > While we do need to guarantee that cpu will finish all previous reads before > progressing further. > > Suppose we have something like that: > > struct { > uint64_t shared; > rte_seqlock_t lock; > } data; > > ... > sn = ... > uint64_t x = data.shared; > /* inside rte_seqlock_read_retry(): */ > ... > rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED); > > Here we need to make sure that read of data.shared will always happen > before reading of data.lock.sn. 
> It is not a problem on IA (as reads are not reordered), but on machines with > relaxed memory ordering (ARM, etc.) it can happen. > So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE) We can't use store-release since there is no write on the reader-side. And fence-release orders against later stores, not later loads. > > Honnappa and other ARM & atomics experts, please correct me if I am wrong here. The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to digest. If we trust Preshing, he has a more accessible description here: https://preshing.com/20130922/acquire-and-release-fences/ "An acquire fence prevents the memory reordering of any read which precedes it in program order with any read or write which follows it in program order." and here: https://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/ (for C++ but the definition seems to be identical to that of C11). Essentially a LoadLoad+LoadStore barrier which is what we want to achieve. GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This waits for all loads preceding (in program order) the memory barrier to be observed before any memory accesses after (in program order) the memory barrier. I think the key to understanding atomic thread fences is that they are not associated with a specific memory access (unlike load-acquire and store-release) so they can't order earlier or later memory accesses against some specific memory access. Instead the fence orders any/all earlier loads and/or stores against any/all later loads or stores (depending on acquire or release). > >>>> + >>>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >>>> + >>>> + return unlikely(begin_sn & 1 || begin_sn != end_sn); >>>> +} >>>> + >>>> +__rte_experimental >>>> +static inline void >>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock) >>>> +{ >>>> + uint64_t sn; >>>> + >>>> + /* to synchronize with other writers */ >>>> + rte_spinlock_lock(&seqlock->lock); >>>> + >>>> + sn = seqlock->sn + 1; >>>> + >>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >>>> + >>>> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >>>> + * from happening before the sn store. >>>> + */ >>>> + rte_atomic_thread_fence(__ATOMIC_RELEASE); >>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of '__ATOMIC_RELEASE'. >> >> Please elaborate on why. > > As you said in the comments above, we need to prevent later stores > to be executed before that point. So we do need a hoist barrier here. > AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required. An acquire fence wouldn't order an earlier store (the write to seqlock->sn) from being reordered with some later store (e.g. writes to the protected data), thus it would allow readers to see updated data (possibly torn) with a pre-update sequence number. We need a StoreStore barrier for ordering the SN store and data stores => fence(release). Acquire and releases fences can (also) be used to create synchronize-with relationships (this is how the C standard defines them). Preshing has a good example on this. Basically Thread 1: data = 242; atomic_thread_fence(atomic_release); atomic_store_n(&guard, 1, atomic_relaxed); Thread 2: while (atomic_load_n(&guard, atomic_relaxed) != 1) ; atomic_thread_fence(atomic_acquire); do_something(data); These are obvious analogues to store-release and load-acquire, thus the acquire & release names of the fences. 
- Ola > >> >>>> +} >>>> + >>>> +__rte_experimental >>>> +static inline void >>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock) >>>> +{ >>>> + uint64_t sn; >>>> + >>>> + sn = seqlock->sn + 1; >>>> + >>>> + /* synchronizes-with the load acquire in rte_seqlock_begin() */ >>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >>>> + >>>> + rte_spinlock_unlock(&seqlock->lock); >>>> +} >>>> + > ^ permalink raw reply [flat|nested] 104+ messages in thread
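The guard example above, written out as compilable C11 for reference (a sketch only; the thread-creation scaffolding is incidental to the point about fence pairing):

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

static int data; /* plain, non-atomic payload */
static atomic_int guard;

static int
producer(void *arg)
{
	(void)arg;
	data = 242;
	/* Orders the plain store to 'data' before the relaxed store to
	 * 'guard'; pairs with the acquire fence in consumer().
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&guard, 1, memory_order_relaxed);
	return 0;
}

static int
consumer(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&guard, memory_order_relaxed) != 1)
		;
	/* After this fence, the store to 'data' made before the release
	 * fence in producer() is guaranteed to be visible.
	 */
	atomic_thread_fence(memory_order_acquire);
	printf("%d\n", data); /* prints 242 */
	return 0;
}

int
main(void)
{
	thrd_t t1, t2;

	thrd_create(&t1, producer, NULL);
	thrd_create(&t2, consumer, NULL);
	thrd_join(t1, NULL);
	thrd_join(t2, NULL);

	return 0;
}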
* Re: [RFC] eal: add seqlock 2022-03-28 14:06 ` Ola Liljedahl @ 2022-03-29 8:32 ` Mattias Rönnblom 2022-03-29 13:20 ` Ananyev, Konstantin 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-29 8:32 UTC (permalink / raw) To: Ola Liljedahl, Ananyev, Konstantin, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, mb, stephen On 2022-03-28 16:06, Ola Liljedahl wrote: > > > On 3/28/22 12:53, Ananyev, Konstantin wrote: >> >>>>> diff --git a/lib/eal/include/meson.build >>>>> b/lib/eal/include/meson.build >>>>> index 9700494816..48df5f1a21 100644 >>>>> --- a/lib/eal/include/meson.build >>>>> +++ b/lib/eal/include/meson.build >>>>> @@ -36,6 +36,7 @@ headers += files( >>>>> 'rte_per_lcore.h', >>>>> 'rte_random.h', >>>>> 'rte_reciprocal.h', >>>>> + 'rte_seqlock.h', >>>>> 'rte_service.h', >>>>> 'rte_service_component.h', >>>>> 'rte_string_fns.h', >>>>> diff --git a/lib/eal/include/rte_seqlock.h >>>>> b/lib/eal/include/rte_seqlock.h >>>>> new file mode 100644 >>>>> index 0000000000..b975ca848a >>>>> --- /dev/null >>>>> +++ b/lib/eal/include/rte_seqlock.h >>>>> @@ -0,0 +1,84 @@ >>>>> +/* SPDX-License-Identifier: BSD-3-Clause >>>>> + * Copyright(c) 2022 Ericsson AB >>>>> + */ >>>>> + >>>>> +#ifndef _RTE_SEQLOCK_H_ >>>>> +#define _RTE_SEQLOCK_H_ >>>>> + >>>>> +#include <stdbool.h> >>>>> +#include <stdint.h> >>>>> + >>>>> +#include <rte_atomic.h> >>>>> +#include <rte_branch_prediction.h> >>>>> +#include <rte_spinlock.h> >>>>> + >>>>> +struct rte_seqlock { >>>>> + uint64_t sn; >>>>> + rte_spinlock_t lock; >>>>> +}; >>>>> + >>>>> +typedef struct rte_seqlock rte_seqlock_t; >>>>> + >>>>> +__rte_experimental >>>>> +void >>>>> +rte_seqlock_init(rte_seqlock_t *seqlock); >>>> Probably worth to have static initializer too. >>>> >>> >>> I will add that in the next version, thanks. >>> >>>>> + >>>>> +__rte_experimental >>>>> +static inline uint64_t >>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) >>>>> +{ >>>>> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) >>>>> + * from happening before the sn load. Syncronizes-with the >>>>> + * store release in rte_seqlock_end(). >>>>> + */ >>>>> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); >>>>> +} >>>>> + >>>>> +__rte_experimental >>>>> +static inline bool >>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t >>>>> begin_sn) >>>>> +{ >>>>> + uint64_t end_sn; >>>>> + >>>>> + /* make sure the data loads happens before the sn load */ >>>>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >>>> That's sort of 'read_end' correct? >>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here, >>>> and >>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) >>>> on the line below? >>> >>> A release fence prevents reordering of stores. The reader doesn't do >>> any >>> stores, so I don't understand why you would use a release fence here. >>> Could you elaborate? >> >> From my understanding: >> rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> serves as a hoist barrier here, so it would only prevent later >> instructions >> to be executed before that point. >> But it wouldn't prevent earlier instructions to be executed after >> that point. >> While we do need to guarantee that cpu will finish all previous reads >> before >> progressing further. >> >> Suppose we have something like that: >> >> struct { >> uint64_t shared; >> rte_seqlock_t lock; >> } data; >> >> ... >> sn = ... >> uint64_t x = data.shared; >> /* inside rte_seqlock_read_retry(): */ >> ... 
>> rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED); >> >> Here we need to make sure that read of data.shared will always happen >> before reading of data.lock.sn. >> It is not a problem on IA (as reads are not reordered), but on >> machines with >> relaxed memory ordering (ARM, etc.) it can happen. >> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE) > We can't use store-release since there is no write on the reader-side. > And fence-release orders against later stores, not later loads. > >> >> Honnappa and other ARM & atomics experts, please correct me if I am >> wrong here. > The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to > digest. If we trust Preshing, he has a more accessible description > here: > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-f4f5b1eec2980283&q=1&e=3479ebfa-e18d-4bf8-88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20130922%2Facquire-and-release-fences%2F > "An acquire fence prevents the memory reordering of any read which > precedes it in program order with any read or write which follows it > in program order." > and here: > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-64b0eba450be934b&q=1&e=3479ebfa-e18d-4bf8-88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20131125%2Facquire-and-release-fences-dont-work-the-way-youd-expect%2F > (for C++ but the definition seems to be identical to that of C11). > Essentially a LoadLoad+LoadStore barrier which is what we want to > achieve. > > GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This > waits for all loads preceding (in program order) the memory barrier to > be observed before any memory accesses after (in program order) the > memory barrier. > > I think the key to understanding atomic thread fences is that they are > not associated with a specific memory access (unlike load-acquire and > store-release) so they can't order earlier or later memory accesses > against some specific memory access. Instead the fence orders any/all > earlier loads and/or stores against any/all later loads or stores > (depending on acquire or release). > >> >>>>> + >>>>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >>>>> + >>>>> + return unlikely(begin_sn & 1 || begin_sn != end_sn); >>>>> +} >>>>> + >>>>> +__rte_experimental >>>>> +static inline void >>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock) >>>>> +{ >>>>> + uint64_t sn; >>>>> + >>>>> + /* to synchronize with other writers */ >>>>> + rte_spinlock_lock(&seqlock->lock); >>>>> + >>>>> + sn = seqlock->sn + 1; >>>>> + >>>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >>>>> + >>>>> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >>>>> + * from happening before the sn store. >>>>> + */ >>>>> + rte_atomic_thread_fence(__ATOMIC_RELEASE); >>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of >>>> '__ATOMIC_RELEASE'. >>> >>> Please elaborate on why. >> >> As you said in the comments above, we need to prevent later stores >> to be executed before that point. So we do need a hoist barrier here. >> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required. > An acquire fence wouldn't order an earlier store (the write to > seqlock->sn) from being reordered with some later store (e.g. writes > to the protected data), thus it would allow readers to see updated > data (possibly torn) with a pre-update sequence number. 
We need a > StoreStore barrier for ordering the SN store and data stores => > fence(release). > > Acquire and releases fences can (also) be used to create > synchronize-with relationships (this is how the C standard defines > them). Preshing has a good example on this. Basically > Thread 1: > data = 242; > atomic_thread_fence(atomic_release); > atomic_store_n(&guard, 1, atomic_relaxed); > > Thread 2: > while (atomic_load_n(&guard, atomic_relaxed) != 1) ; > atomic_thread_fence(atomic_acquire); > do_something(data); > > These are obvious analogues to store-release and load-acquire, thus > the acquire & release names of the fences. > > - Ola > >> >>> >>>>> +} >>>>> + >>>>> +__rte_experimental >>>>> +static inline void >>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock) >>>>> +{ >>>>> + uint64_t sn; >>>>> + >>>>> + sn = seqlock->sn + 1; >>>>> + >>>>> + /* synchronizes-with the load acquire in rte_seqlock_begin() */ >>>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >>>>> + >>>>> + rte_spinlock_unlock(&seqlock->lock); >>>>> +} >>>>> + >> I have nothing to add, but Ola's mail seems to have been blocked from the dev list, so I'm posting this again. ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [RFC] eal: add seqlock 2022-03-29 8:32 ` Mattias Rönnblom @ 2022-03-29 13:20 ` Ananyev, Konstantin 2022-03-30 10:07 ` [PATCH] " Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Ananyev, Konstantin @ 2022-03-29 13:20 UTC (permalink / raw) To: mattias.ronnblom, Ola Liljedahl, dev Cc: Thomas Monjalon, David Marchand, Olsen, Onar, Honnappa.Nagarahalli, nd, mb, stephen > >>>>> diff --git a/lib/eal/include/meson.build > >>>>> b/lib/eal/include/meson.build > >>>>> index 9700494816..48df5f1a21 100644 > >>>>> --- a/lib/eal/include/meson.build > >>>>> +++ b/lib/eal/include/meson.build > >>>>> @@ -36,6 +36,7 @@ headers += files( > >>>>> 'rte_per_lcore.h', > >>>>> 'rte_random.h', > >>>>> 'rte_reciprocal.h', > >>>>> + 'rte_seqlock.h', > >>>>> 'rte_service.h', > >>>>> 'rte_service_component.h', > >>>>> 'rte_string_fns.h', > >>>>> diff --git a/lib/eal/include/rte_seqlock.h > >>>>> b/lib/eal/include/rte_seqlock.h > >>>>> new file mode 100644 > >>>>> index 0000000000..b975ca848a > >>>>> --- /dev/null > >>>>> +++ b/lib/eal/include/rte_seqlock.h > >>>>> @@ -0,0 +1,84 @@ > >>>>> +/* SPDX-License-Identifier: BSD-3-Clause > >>>>> + * Copyright(c) 2022 Ericsson AB > >>>>> + */ > >>>>> + > >>>>> +#ifndef _RTE_SEQLOCK_H_ > >>>>> +#define _RTE_SEQLOCK_H_ > >>>>> + > >>>>> +#include <stdbool.h> > >>>>> +#include <stdint.h> > >>>>> + > >>>>> +#include <rte_atomic.h> > >>>>> +#include <rte_branch_prediction.h> > >>>>> +#include <rte_spinlock.h> > >>>>> + > >>>>> +struct rte_seqlock { > >>>>> + uint64_t sn; > >>>>> + rte_spinlock_t lock; > >>>>> +}; > >>>>> + > >>>>> +typedef struct rte_seqlock rte_seqlock_t; > >>>>> + > >>>>> +__rte_experimental > >>>>> +void > >>>>> +rte_seqlock_init(rte_seqlock_t *seqlock); > >>>> Probably worth to have static initializer too. > >>>> > >>> > >>> I will add that in the next version, thanks. > >>> > >>>>> + > >>>>> +__rte_experimental > >>>>> +static inline uint64_t > >>>>> +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) > >>>>> +{ > >>>>> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) > >>>>> + * from happening before the sn load. Syncronizes-with the > >>>>> + * store release in rte_seqlock_end(). > >>>>> + */ > >>>>> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); > >>>>> +} > >>>>> + > >>>>> +__rte_experimental > >>>>> +static inline bool > >>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint64_t > >>>>> begin_sn) > >>>>> +{ > >>>>> + uint64_t end_sn; > >>>>> + > >>>>> + /* make sure the data loads happens before the sn load */ > >>>>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > >>>> That's sort of 'read_end' correct? > >>>> If so, shouldn't it be '__ATOMIC_RELEASE' instead here, > >>>> and > >>>> end_sn = __atomic_load_n(..., (__ATOMIC_ACQUIRE) > >>>> on the line below? > >>> > >>> A release fence prevents reordering of stores. The reader doesn't do > >>> any > >>> stores, so I don't understand why you would use a release fence here. > >>> Could you elaborate? > >> > >> From my understanding: > >> rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > >> serves as a hoist barrier here, so it would only prevent later > >> instructions > >> to be executed before that point. > >> But it wouldn't prevent earlier instructions to be executed after > >> that point. > >> While we do need to guarantee that cpu will finish all previous reads > >> before > >> progressing further. > >> > >> Suppose we have something like that: > >> > >> struct { > >> uint64_t shared; > >> rte_seqlock_t lock; > >> } data; > >> > >> ... 
> >> sn = ... > >> uint64_t x = data.shared; > >> /* inside rte_seqlock_read_retry(): */ > >> ... > >> rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > >> end_sn = __atomic_load_n(&data.lock.sn, __ATOMIC_RELAXED); > >> > >> Here we need to make sure that read of data.shared will always happen > >> before reading of data.lock.sn. > >> It is not a problem on IA (as reads are not reordered), but on > >> machines with > >> relaxed memory ordering (ARM, etc.) it can happen. > >> So to prevent it we do need a sink barrier here first (ATOMIC_RELEASE) > > We can't use store-release since there is no write on the reader-side. > > And fence-release orders against later stores, not later loads. > > > >> > >> Honnappa and other ARM & atomics experts, please correct me if I am > >> wrong here. > > The C standard (chapter 7.17.4 in the C11 (draft)) isn't so easy to > > digest. If we trust Preshing, he has a more accessible description > > here: > > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-f4f5b1eec2980283&q=1&e=3479ebfa-e18d-4bf8- > 88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20130922%2Facquire-and-release-fences%2F > > "An acquire fence prevents the memory reordering of any read which > > precedes it in program order with any read or write which follows it > > in program order." > > and here: > > https://protect2.fireeye.com/v1/url?k=31323334-501d5122-313273af-454445555731-64b0eba450be934b&q=1&e=3479ebfa-e18d-4bf8- > 88fe-76823a531912&u=https%3A%2F%2Fpreshing.com%2F20131125%2Facquire-and-release-fences-dont-work-the-way-youd-expect%2F > > (for C++ but the definition seems to be identical to that of C11). > > Essentially a LoadLoad+LoadStore barrier which is what we want to > > achieve. > > > > GCC 10.3 for AArch64/A64 ISA generates a "DMB ISHLD" instruction. This > > waits for all loads preceding (in program order) the memory barrier to > > be observed before any memory accesses after (in program order) the > > memory barrier. > > > > I think the key to understanding atomic thread fences is that they are > > not associated with a specific memory access (unlike load-acquire and > > store-release) so they can't order earlier or later memory accesses > > against some specific memory access. Instead the fence orders any/all > > earlier loads and/or stores against any/all later loads or stores > > (depending on acquire or release). > > > >> > >>>>> + > >>>>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > >>>>> + > >>>>> + return unlikely(begin_sn & 1 || begin_sn != end_sn); > >>>>> +} > >>>>> + > >>>>> +__rte_experimental > >>>>> +static inline void > >>>>> +rte_seqlock_write_begin(rte_seqlock_t *seqlock) > >>>>> +{ > >>>>> + uint64_t sn; > >>>>> + > >>>>> + /* to synchronize with other writers */ > >>>>> + rte_spinlock_lock(&seqlock->lock); > >>>>> + > >>>>> + sn = seqlock->sn + 1; > >>>>> + > >>>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > >>>>> + > >>>>> + /* __ATOMIC_RELEASE to prevent stores after (in program order) > >>>>> + * from happening before the sn store. > >>>>> + */ > >>>>> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > >>>> I think it needs to be '__ATOMIC_ACQUIRE' here instead of > >>>> '__ATOMIC_RELEASE'. > >>> > >>> Please elaborate on why. > >> > >> As you said in the comments above, we need to prevent later stores > >> to be executed before that point. So we do need a hoist barrier here. > >> AFAIK to guarantee a hoist barrier '__ATOMIC_ACQUIRE' is required. 
> > An acquire fence wouldn't order an earlier store (the write to > > seqlock->sn) from being reordered with some later store (e.g. writes > > to the protected data), thus it would allow readers to see updated > > data (possibly torn) with a pre-update sequence number. We need a > > StoreStore barrier for ordering the SN store and data stores => > > fence(release). > > > > Acquire and releases fences can (also) be used to create > > synchronize-with relationships (this is how the C standard defines > > them). Preshing has a good example on this. Basically > > Thread 1: > > data = 242; > > atomic_thread_fence(atomic_release); > > atomic_store_n(&guard, 1, atomic_relaxed); > > > > Thread 2: > > while (atomic_load_n(&guard, atomic_relaxed) != 1) ; > > atomic_thread_fence(atomic_acquire); > > do_something(data); > > > > These are obvious analogues to store-release and load-acquire, thus > > the acquire & release names of the fences. > > > > - Ola > > > >> > >>> > >>>>> +} > >>>>> + > >>>>> +__rte_experimental > >>>>> +static inline void > >>>>> +rte_seqlock_write_end(rte_seqlock_t *seqlock) > >>>>> +{ > >>>>> + uint64_t sn; > >>>>> + > >>>>> + sn = seqlock->sn + 1; > >>>>> + > >>>>> + /* synchronizes-with the load acquire in rte_seqlock_begin() */ > >>>>> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > >>>>> + > >>>>> + rte_spinlock_unlock(&seqlock->lock); > >>>>> +} > >>>>> + > >> > > I have nothing to add, but Ola's mail seems to have been blocked from > the dev list, so I'm posting this again. Ok, thanks Ola for detailed explanation. Have to admit then that my understanding of atomic_fence() behaviour was incorrect. Please disregard my comments above about rte_seqlock_read_retry() and rte_seqlock_write_begin(). Konstantin ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH] eal: add seqlock 2022-03-29 13:20 ` Ananyev, Konstantin @ 2022-03-30 10:07 ` Mattias Rönnblom 2022-03-30 10:50 ` Morten Brørup 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-30 10:07 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is a synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger than what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform loads and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but where the sn has been updated so many times during the read that it has wrapped. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. * Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. 
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- app/test/meson.build | 2 + app/test/test_seqlock.c | 200 ++++++++++++++++++++++++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 ++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 282 ++++++++++++++++++++++++++++++++++ lib/eal/version.map | 3 + 7 files changed, 501 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..8d094a3c32 --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_start(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_begin(&data->lock); + + data->c = new_value; + + /* These compiler barriers (both on the test reader + * and the test writer side) are here to ensure that + * loads/stores *usually* happen in test program order + * (always on a TSO machine). They are arrange in such + * a way that the writer stores in a different order + * than the reader loads, to emulate an arbitrary + * order. A real application using a seqlock does not + * require any compiler barriers. 
+ */ + rte_compiler_barrier(); + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + rte_compiler_barrier(); + data->a = new_value; + + rte_seqlock_write_end(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return 0; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_start(void *arg) +{ + struct reader *r = arg; + int rc = 0; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) { + struct data *data = r->data; + bool interrupted; + uint64_t a; + uint64_t b; + uint64_t c; + uint32_t sn; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + /* See writer_start() for an explanation why + * these barriers are here. + */ + rte_compiler_barrier(); + + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + + c = data->c; + + rte_compiler_barrier(); + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = -1; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS + 1) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int writer_lcore_ids[NUM_WRITERS] = { 0 }; + unsigned int reader_lcore_ids[MAX_READERS]; + int rc = 0; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) + return -1; + + num_readers = num_lcores - NUM_WRITERS - 1; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i < NUM_WRITERS) { + rte_eal_remote_launch(writer_start, data, lcore_id); + writer_lcore_ids[i] = lcore_id; + } else { + unsigned int reader_idx = i - NUM_WRITERS; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_start, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + for (i = 0; i < NUM_WRITERS; i++) + if (rte_eal_wait_lcore(writer_lcore_ids[i]) != 0) + rc = -1; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = -1; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..a41343bfed 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) 
+{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..03a7da57e9 --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,282 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access the lock-protected data. The RTE + * seqlock permits multiple writers as well. A spinlock is used for + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is again + * allowed to execute. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclaimation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). + * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * // Temporary variables, just to improve readability. + * int tentative_x; + * char tentative_y[MAX_Y_LEN]; + * + * do { + * rte_seqlock_read(&config->lock); + * // Loads may be atomic or non-atomic, as in this example. + * tentative_x = config->param_x; + * strcpy(tentative_y, config->param_y); + * } while (rte_seqlock_read_retry(&config->lock)); + * // An application could skip retrying, and try again later, if + * // it can make progress without the data. + * + * *param_x = tentative_x; + * strcpy(param_y, tentative_y); + * } + * + * // Accessor function for writing config fields. + * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_begin(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_end(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. 
+ */ +typedef struct { + uint32_t sn; /**< A generation number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } + +/** + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_begin() returns a sequence number, which is later + * used in rte_seqlock_read_retry() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_begin() has been called, + * the calling thread may read and copy the protected data. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative). A copy is required since the + * application only may read the data in the read-side criticial + * section (i.e., after rte_seqlock_read_begin() and before + * rte_seqlock_read_retry()), but must not act upon the retrieved data + * while in the critical section, since it does not yet know if it is + * consistent. + * + * The data may be accessed with both atomic and/or non-atomic loads. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_retry() must be called, marking the end + * of the read-side critical section. + * + * If rte_seqlock_read_retry() returns true, the just-read data is + * inconsistent and should be discarded. If rte_seqlock_read_retry() + * returns false, the data was read atomically and the copied data is + * consistent. + * + * If rte_seqlock_read_retry() returns false, the application has the + * option to immediately restart the whole procedure (e.g., calling + * rte_seqlock_read_being() again), or do the same at some later time. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_retry(). + * + * @see rte_seqlock_read_retry() + */ +__rte_experimental +static inline uint32_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_end(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * End a read-side critical section. + * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number returned from the corresponding rte_seqlock_read_begin() + * call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case this function returns false, the just-read data was + * consistent and the set of atomic and non-atomic load operations + * performed between rte_seqlock_read_begin() and + * rte_seqlock_read_retry() were atomic, as a whole. + * + * In case rte_seqlock_read_retry() returns true, the data was + * modified as it was being read and may be inconsistent, and thus + * should be discarded. + * + * @param seqlock + * A pointer to the seqlock. 
+ * @param begin_sn + * The seqlock sequence number that was returned by + * rte_seqlock_read_begin() for this critical section. + * @return + * true or false, if the just-read seqlock-protected data is inconsistent + * or consistent, respectively. + * + * @see rte_seqlock_read_begin() + */ +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) +{ + uint32_t end_sn; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + return unlikely(begin_sn & 1 || begin_sn != end_sn); +} + +/** + * Begin write-side critical section. + * + * A call to this function acquires the write lock associated @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * the protected data, in an atomic or non-atomic manner. + * + * After the nessesary updates have been performed, the application + * calls rte_seqlock_write_end(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_end() + */ +__rte_experimental +static inline void +rte_seqlock_write_begin(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * End write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_begin() + */ +__rte_experimental +static inline void +rte_seqlock_write_end(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH] eal: add seqlock 2022-03-30 10:07 ` [PATCH] " Mattias Rönnblom @ 2022-03-30 10:50 ` Morten Brørup 2022-03-30 11:24 ` Tyler Retzlaff ` (2 more replies) 0 siblings, 3 replies; 104+ messages in thread From: Morten Brørup @ 2022-03-30 10:50 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen, Ola Liljedahl > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > Sent: Wednesday, 30 March 2022 12.07 > + > +/** > + * The RTE seqlock type. > + */ > +typedef struct { > + uint32_t sn; /**< A generation number for the protected data. */ > + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ > +} rte_seqlock_t; > + You refer to 'sn' as the sequence number everywhere else, so please document it as such: "/**< Sequence number for the protected data. */" Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts! Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH] eal: add seqlock 2022-03-30 10:50 ` Morten Brørup @ 2022-03-30 11:24 ` Tyler Retzlaff 2022-03-30 11:25 ` Mattias Rönnblom 2022-03-30 14:26 ` [PATCH v2] " Mattias Rönnblom 2 siblings, 0 replies; 104+ messages in thread From: Tyler Retzlaff @ 2022-03-30 11:24 UTC (permalink / raw) To: Morten Brørup Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen, Ola Liljedahl On Wed, Mar 30, 2022 at 12:50:42PM +0200, Morten Brørup wrote: > > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > > Sent: Wednesday, 30 March 2022 12.07 > > > + > > +/** > > + * The RTE seqlock type. > > + */ > > +typedef struct { > > + uint32_t sn; /**< A generation number for the protected data. */ > > + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ > > +} rte_seqlock_t; > > + > > You refer to 'sn' as the sequence number everywhere else, so please document is as such: > "/**< Sequence number for the protected data. */" > > Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts! i don't think there is value added by cv-volatile qualification. if we want correct/portable behavior for all targets then we should just access with appropriate atomics builtins/intrinsics they will be qualifying volatile and generating correct barriers when necessary. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> > ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH] eal: add seqlock 2022-03-30 10:50 ` Morten Brørup 2022-03-30 11:24 ` Tyler Retzlaff @ 2022-03-30 11:25 ` Mattias Rönnblom 2022-03-30 14:26 ` [PATCH v2] " Mattias Rönnblom 2 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-30 11:25 UTC (permalink / raw) To: Morten Brørup, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen, Ola Liljedahl On 2022-03-30 12:50, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] >> Sent: Wednesday, 30 March 2022 12.07 >> + >> +/** >> + * The RTE seqlock type. >> + */ >> +typedef struct { >> + uint32_t sn; /**< A generation number for the protected data. */ >> + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ >> +} rte_seqlock_t; >> + > You refer to 'sn' as the sequence number everywhere else, so please document is as such: > "/**< Sequence number for the protected data. */" Will do. > > Also, consider making 'sn' volatile, although it is only accessed through the __atomic_load_n() function. I don't know if it makes any difference, so I'm just bringing this to the attention of the experts! It might make a difference, but not for the better. There are almost no valid uses of volatile for core-to-core/thread-to-thread synchronization, in C11. > Acked-by: Morten Brørup <mb@smartsharesystems.com> > Thanks for your comments. ^ permalink raw reply [flat|nested] 104+ messages in thread
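The point made above about volatile can be shown with a short sketch: a field accessed exclusively through the __atomic builtins can be left unqualified, since the builtins themselves force a fresh, untorn access each time and emit whatever barriers the target needs. The type and function names below are invented for the illustration; only the access pattern mirrors the patch.

#include <stdint.h>

struct demo_counter {
	uint32_t sn; /* plain, non-volatile field */
};

static inline uint32_t
demo_counter_read(const struct demo_counter *c)
{
	/* The builtin performs an atomic load on every call, so the
	 * compiler cannot cache the value across calls, volatile or not.
	 */
	return __atomic_load_n(&c->sn, __ATOMIC_ACQUIRE);
}

static inline void
demo_counter_set(struct demo_counter *c, uint32_t val)
{
	__atomic_store_n(&c->sn, val, __ATOMIC_RELEASE);
}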
* [PATCH v2] eal: add seqlock 2022-03-30 10:50 ` Morten Brørup 2022-03-30 11:24 ` Tyler Retzlaff 2022-03-30 11:25 ` Mattias Rönnblom @ 2022-03-30 14:26 ` Mattias Rönnblom 2022-03-31 7:46 ` Mattias Rönnblom ` (2 more replies) 2 siblings, 3 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-30 14:26 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is a synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger than what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform loads and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock PATCH v2: * Skip instead of fail unit test in case too few lcores are available. * Use main lcore for testing, reducing the minimum number of lcores required to run the unit tests to four. * Consistently refer to sn field as the "sequence number" in the documentation. * Fixed spelling mistakes in documentation. Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but where the sn has been updated so many times during the read that it has wrapped. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. * Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. 
Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- app/test/meson.build | 2 + app/test/test_seqlock.c | 202 ++++++++++++++++++++++++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 ++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 282 ++++++++++++++++++++++++++++++++++ lib/eal/version.map | 3 + 7 files changed, 503 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..ba1755d9ad --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,202 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_run(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_begin(&data->lock); + + data->c = new_value; + + /* These compiler barriers (both on the test reader + * and the test writer side) are here to ensure that + * loads/stores *usually* happen in test program order + * (always on a TSO machine). They are arrange in such + * a way that the writer stores in a different order + * than the reader loads, to emulate an arbitrary + * order. A real application using a seqlock does not + * require any compiler barriers. 
+ */ + rte_compiler_barrier(); + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + rte_compiler_barrier(); + data->a = new_value; + + rte_seqlock_write_end(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return 0; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_run(void *arg) +{ + struct reader *r = arg; + int rc = 0; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) { + struct data *data = r->data; + bool interrupted; + uint64_t a; + uint64_t b; + uint64_t c; + uint32_t sn; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + /* See writer_run() for an explanation why + * these barriers are here. + */ + rte_compiler_barrier(); + + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + + c = data->c; + + rte_compiler_barrier(); + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = -1; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) /* master lcore + one worker */ +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int reader_lcore_ids[MAX_READERS]; + unsigned int worker_writer_lcore_id = 0; + int rc = 0; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) { + printf("Too few cores to run test. 
Skipping.\n"); + return 0; + } + + num_readers = num_lcores - NUM_WRITERS; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i == 0) { + rte_eal_remote_launch(writer_run, data, lcore_id); + worker_writer_lcore_id = lcore_id; + } else { + unsigned int reader_idx = i - 1; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_run, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + if (writer_run(data) != 0 || + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) + rc = -1; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = -1; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..a41343bfed 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..12cc3cdcb2 --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,282 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access the lock-protected data. The RTE + * seqlock permits multiple writers as well. A spinlock is used for + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is again + * allowed to execute. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclamation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). 
+ * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * // Temporary variables, just to improve readability. + * int tentative_x; + * char tentative_y[MAX_Y_LEN]; + * + * do { + * rte_seqlock_read(&config->lock); + * // Loads may be atomic or non-atomic, as in this example. + * tentative_x = config->param_x; + * strcpy(tentative_y, config->param_y); + * } while (rte_seqlock_read_retry(&config->lock)); + * // An application could skip retrying, and try again later, if + * // it can make progress without the data. + * + * *param_x = tentative_x; + * strcpy(param_y, tentative_y); + * } + * + * // Accessor function for writing config fields. + * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_begin(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_end(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. + */ +typedef struct { + uint32_t sn; /**< A sequence number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } + +/** + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_begin() returns a sequence number, which is later + * used in rte_seqlock_read_retry() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_begin() has been called, + * the calling thread may read and copy the protected data. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative). A copy is required since the + * application only may read the data in the read-side critical + * section (i.e., after rte_seqlock_read_begin() and before + * rte_seqlock_read_retry()), but must not act upon the retrieved data + * while in the critical section, since it does not yet know if it is + * consistent. + * + * The data may be accessed with both atomic and/or non-atomic loads. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_retry() must be called, marking the end + * of the read-side critical section. + * + * If rte_seqlock_read_retry() returns true, the just-read data is + * inconsistent and should be discarded. If rte_seqlock_read_retry() + * returns false, the data was read atomically and the copied data is + * consistent. 
+ * + * If rte_seqlock_read_retry() returns false, the application has the + * option to immediately restart the whole procedure (e.g., calling + * rte_seqlock_read_being() again), or do the same at some later time. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_retry(). + * + * @see rte_seqlock_read_retry() + */ +__rte_experimental +static inline uint32_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_end(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * End a read-side critical section. + * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number returned from the corresponding rte_seqlock_read_begin() + * call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case this function returns false, the just-read data was + * consistent and the set of atomic and non-atomic load operations + * performed between rte_seqlock_read_begin() and + * rte_seqlock_read_retry() were atomic, as a whole. + * + * In case rte_seqlock_read_retry() returns true, the data was + * modified as it was being read and may be inconsistent, and thus + * should be discarded. + * + * @param seqlock + * A pointer to the seqlock. + * @param begin_sn + * The seqlock sequence number that was returned by + * rte_seqlock_read_begin() for this critical section. + * @return + * true or false, if the just-read seqlock-protected data is inconsistent + * or consistent, respectively. + * + * @see rte_seqlock_read_begin() + */ +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) +{ + uint32_t end_sn; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + return unlikely(begin_sn & 1 || begin_sn != end_sn); +} + +/** + * Begin write-side critical section. + * + * A call to this function acquires the write lock associated @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * the protected data, in an atomic or non-atomic manner. + * + * After the necessary updates have been performed, the application + * calls rte_seqlock_write_end(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_end() + */ +__rte_experimental +static inline void +rte_seqlock_write_begin(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * End write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. 
After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_begin() + */ +__rte_experimental +static inline void +rte_seqlock_write_end(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-30 14:26 ` [PATCH v2] " Mattias Rönnblom @ 2022-03-31 7:46 ` Mattias Rönnblom 2022-03-31 9:04 ` Ola Liljedahl 2022-04-02 0:50 ` Stephen Hemminger 2022-04-05 20:16 ` Stephen Hemminger 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-31 7:46 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl On 2022-03-30 16:26, Mattias Rönnblom wrote: > A sequence lock (seqlock) is synchronization primitive which allows > for data-race free, low-overhead, high-frequency reads, especially for > data structures shared across many cores and which are updated with > relatively infrequently. > > <snip> Some questions I have: Is a variant of the seqlock without the spinlock required? The reason I left such out was that I thought that in most cases where only a single writer is used (or serialization is external to the seqlock), the spinlock overhead is negligible, since updates are relatively infrequent. Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or some third alternative? I wanted to make clear it's not just a "release the lock" function. You could use the __attribute__((warn_unused_result)) annotation to make clear the return value cannot be ignored, although I'm not sure DPDK ever uses that attribute. ^ permalink raw reply [flat|nested] 104+ messages in thread
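For reference, here is a sketch of what the warn_unused_result annotation mentioned above would look like. A stand-in type is used so the snippet is self-contained; this is not the proposed DPDK API, only an illustration of the GCC/clang attribute.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_seqlock_t, to keep the example standalone. */
typedef struct { uint32_t sn; } demo_seqlock_t;

__attribute__((warn_unused_result))
static inline bool
demo_read_retry(const demo_seqlock_t *seqlock, uint32_t begin_sn)
{
	uint32_t end_sn;

	/* Order the protected-data loads before the sequence number reload. */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

	return (begin_sn & 1) || begin_sn != end_sn;
}

/*
 * With the attribute in place, discarding the result:
 *
 *     demo_read_retry(&lock, sn);
 *
 * triggers a -Wunused-result warning at compile time.
 */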
* Re: [PATCH v2] eal: add seqlock 2022-03-31 7:46 ` Mattias Rönnblom @ 2022-03-31 9:04 ` Ola Liljedahl 2022-03-31 9:25 ` Morten Brørup 2022-03-31 13:38 ` Mattias Rönnblom 0 siblings, 2 replies; 104+ messages in thread From: Ola Liljedahl @ 2022-03-31 9:04 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen On 3/31/22 09:46, Mattias Rönnblom wrote: > On 2022-03-30 16:26, Mattias Rönnblom wrote: >> A sequence lock (seqlock) is synchronization primitive which allows >> for data-race free, low-overhead, high-frequency reads, especially for >> data structures shared across many cores and which are updated with >> relatively infrequently. >> >> > > <snip> > > Some questions I have: > > Is a variant of the seqlock without the spinlock required? The reason I > left such out was that I thought that in most cases where only a single > writer is used (or serialization is external to the seqlock), the > spinlock overhead is negligible, since updates are relatively infrequent. You can combine the spinlock and the sequence number. Odd sequence number means the seqlock is busy. That would replace a non-atomic RMW of the sequence number with an atomic RMW CAS and avoid the spin lock atomic RMW operation. Not sure how much it helps. > > Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or > some third alternative? I wanted to make clear it's not just a "release > the lock" function. You could use > the|||__attribute__((warn_unused_result)) annotation to make clear the > return value cannot be ignored, although I'm not sure DPDK ever use that > attribute. We have to decide how to use the seqlock API from the application perspective. Your current proposal: do { sn = rte_seqlock_read_begin(&seqlock) //read protected data } while (rte_seqlock_read_retry(&seqlock, sn)); or perhaps sn = rte_seqlock_read_lock(&seqlock); do { //read protected data } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); Tryunlock should signal to the user that the unlock operation might not succeed and something needs to be repeated. -- Ola ^ permalink raw reply [flat|nested] 104+ messages in thread
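A rough sketch of the combined approach Ola outlines above, where an odd sequence number doubles as the writer lock so that no separate spinlock is needed. The names are hypothetical and the orderings follow the same pattern as the patch (sequence number store followed by a release fence); this is an illustration of the idea, not a drop-in replacement. The reader side would be unchanged.

static inline void
demo_seqcount_write_begin(uint32_t *sn)
{
	uint32_t cur;

	for (;;) {
		cur = __atomic_load_n(sn, __ATOMIC_RELAXED);

		/* Only attempt the even -> odd (unlocked -> locked)
		 * transition; an odd value means a writer is active.
		 */
		if ((cur & 1) == 0 &&
		    __atomic_compare_exchange_n(sn, &cur, cur + 1, false,
						__ATOMIC_ACQUIRE,
						__ATOMIC_RELAXED))
			break;
		/* Another writer holds the lock; spin. A real
		 * implementation would likely add a pause/back-off here.
		 */
	}

	/* Order the odd sn store before the subsequent data stores, like
	 * the release fence in rte_seqlock_write_begin().
	 */
	__atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
demo_seqcount_write_end(uint32_t *sn)
{
	uint32_t cur = __atomic_load_n(sn, __ATOMIC_RELAXED);

	/* Make the data stores visible before the even (unlocked) sn. */
	__atomic_store_n(sn, cur + 1, __ATOMIC_RELEASE);
}

This trades the spinlock's lock/unlock pair for a single compare-and-swap on the sequence number itself; whether that is a measurable win for infrequent writes is the open question raised above.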
* RE: [PATCH v2] eal: add seqlock 2022-03-31 9:04 ` Ola Liljedahl @ 2022-03-31 9:25 ` Morten Brørup 2022-03-31 9:38 ` Ola Liljedahl 2022-03-31 13:51 ` [PATCH v2] " Mattias Rönnblom 2022-03-31 13:38 ` Mattias Rönnblom 1 sibling, 2 replies; 104+ messages in thread From: Morten Brørup @ 2022-03-31 9:25 UTC (permalink / raw) To: Ola Liljedahl, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen > From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] > Sent: Thursday, 31 March 2022 11.05 > > On 3/31/22 09:46, Mattias Rönnblom wrote: > > On 2022-03-30 16:26, Mattias Rönnblom wrote: > >> A sequence lock (seqlock) is synchronization primitive which allows > >> for data-race free, low-overhead, high-frequency reads, especially > for > >> data structures shared across many cores and which are updated with > >> relatively infrequently. > >> > >> > > > > <snip> > > > > Some questions I have: > > > > Is a variant of the seqlock without the spinlock required? The reason > I > > left such out was that I thought that in most cases where only a > single > > writer is used (or serialization is external to the seqlock), the > > spinlock overhead is negligible, since updates are relatively > infrequent. Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you. > You can combine the spinlock and the sequence number. Odd sequence > number means the seqlock is busy. That would replace a non-atomic RMW > of > the sequence number with an atomic RMW CAS and avoid the spin lock > atomic RMW operation. Not sure how much it helps. > > > > > Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), > or > > some third alternative? I wanted to make clear it's not just a > "release > > the lock" function. You could use > > the|||__attribute__((warn_unused_result)) annotation to make clear > the > > return value cannot be ignored, although I'm not sure DPDK ever use > that > > attribute. I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here! > We have to decide how to use the seqlock API from the application > perspective. > Your current proposal: > do { > sn = rte_seqlock_read_begin(&seqlock) > //read protected data > } while (rte_seqlock_read_retry(&seqlock, sn)); > > or perhaps > sn = rte_seqlock_read_lock(&seqlock); > do { > //read protected data > } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); > > Tryunlock should signal to the user that the unlock operation might not > succeed and something needs to be repeated. Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient. I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names. Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this. 
Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-31 9:25 ` Morten Brørup @ 2022-03-31 9:38 ` Ola Liljedahl 2022-03-31 10:03 ` Morten Brørup 2022-03-31 13:51 ` [PATCH v2] " Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Ola Liljedahl @ 2022-03-31 9:38 UTC (permalink / raw) To: Morten Brørup, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen On 3/31/22 11:25, Morten Brørup wrote: >> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] >> Sent: Thursday, 31 March 2022 11.05 >> >> On 3/31/22 09:46, Mattias Rönnblom wrote: >>> On 2022-03-30 16:26, Mattias Rönnblom wrote: >>>> A sequence lock (seqlock) is synchronization primitive which allows >>>> for data-race free, low-overhead, high-frequency reads, especially >> for >>>> data structures shared across many cores and which are updated with >>>> relatively infrequently. >>>> >>>> >>> >>> <snip> >>> >>> Some questions I have: >>> >>> Is a variant of the seqlock without the spinlock required? The reason >> I >>> left such out was that I thought that in most cases where only a >> single >>> writer is used (or serialization is external to the seqlock), the >>> spinlock overhead is negligible, since updates are relatively >> infrequent. > > Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you. > >> You can combine the spinlock and the sequence number. Odd sequence >> number means the seqlock is busy. That would replace a non-atomic RMW >> of >> the sequence number with an atomic RMW CAS and avoid the spin lock >> atomic RMW operation. Not sure how much it helps. >> >>> >>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), >> or >>> some third alternative? I wanted to make clear it's not just a >> "release >>> the lock" function. You could use >>> the|||__attribute__((warn_unused_result)) annotation to make clear >> the >>> return value cannot be ignored, although I'm not sure DPDK ever use >> that >>> attribute. > > I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here! > >> We have to decide how to use the seqlock API from the application >> perspective. >> Your current proposal: >> do { >> sn = rte_seqlock_read_begin(&seqlock) >> //read protected data >> } while (rte_seqlock_read_retry(&seqlock, sn)); >> >> or perhaps >> sn = rte_seqlock_read_lock(&seqlock); >> do { >> //read protected data >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); >> >> Tryunlock should signal to the user that the unlock operation might not >> succeed and something needs to be repeated. > > Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient. > > I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names. I was following the naming convention of rte_rwlock. Isn't the seqlock just a more scalable implementation of a reader/writer lock? > > Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this. 
> > Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation. That's why my rte_seqlock_read_tryunlock() function takes the sequence number as a parameter passed by reference. Then the sequence number can be updated if necessary. I didn't want to force a new call to rte_seqlock_read_lock() because there should be a one-to-one match between rte_seqlock_read_lock() and a successful call to rte_seqlock_read_tryunlock(). - Ola > ^ permalink raw reply [flat|nested] 104+ messages in thread
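A sketch of the alternative reader API Ola describes, with the sequence number handed back by reference on failure so the caller can retry without a separate begin call. The function names and details are hypothetical and not part of the posted patch; the body reuses the same orderings as rte_seqlock_read_begin()/rte_seqlock_read_retry().

static inline uint32_t
demo_seqlock_read_lock(const rte_seqlock_t *seqlock)
{
	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
}

/* Returns true if the just-read data was consistent (the "unlock"
 * succeeded); otherwise refreshes *begin_sn and returns false so the
 * caller retries the critical section.
 */
static inline bool
demo_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
{
	uint32_t end_sn;

	/* Order the protected-data loads before the sn reload. */
	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

	if (likely((*begin_sn & 1) == 0 && end_sn == *begin_sn))
		return true;

	/* A writer was active; pick up a fresh sequence number for the
	 * next attempt. The acquire load starts the new read section.
	 */
	*begin_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

	return false;
}

/* Caller pattern:
 *
 *     uint32_t sn = demo_seqlock_read_lock(&lock);
 *     do {
 *             // read and copy the protected data
 *     } while (!demo_seqlock_read_tryunlock(&lock, &sn));
 */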
* RE: [PATCH v2] eal: add seqlock 2022-03-31 9:38 ` Ola Liljedahl @ 2022-03-31 10:03 ` Morten Brørup 2022-03-31 11:44 ` Ola Liljedahl 0 siblings, 1 reply; 104+ messages in thread From: Morten Brørup @ 2022-03-31 10:03 UTC (permalink / raw) To: Ola Liljedahl, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen > From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] > Sent: Thursday, 31 March 2022 11.39 > > On 3/31/22 11:25, Morten Brørup wrote: > >> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] > >> Sent: Thursday, 31 March 2022 11.05 > >> > >> On 3/31/22 09:46, Mattias Rönnblom wrote: > >>> On 2022-03-30 16:26, Mattias Rönnblom wrote: > >>>> A sequence lock (seqlock) is synchronization primitive which > >>>> allows > >>>> for data-race free, low-overhead, high-frequency reads, especially > >>>> for > >>>> data structures shared across many cores and which are updated > >>>> with relatively infrequently. > >>>> > >>>> > >>> > >>> <snip> > >>> > >>> Some questions I have: > >>> > >>> Is a variant of the seqlock without the spinlock required? The > >>> reason I > >>> left such out was that I thought that in most cases where only a > >>> single > >>> writer is used (or serialization is external to the seqlock), the > >>> spinlock overhead is negligible, since updates are relatively > >>> infrequent. > > > > Mattias, when you suggested adding the seqlock, I considered this > > too, and came to the same conclusion as you. > > > >> You can combine the spinlock and the sequence number. Odd sequence > >> number means the seqlock is busy. That would replace a non-atomic > >> RMW of > >> the sequence number with an atomic RMW CAS and avoid the spin lock > >> atomic RMW operation. Not sure how much it helps. > >> > >>> > >>> Should the rte_seqlock_read_retry() be called > >>> rte_seqlock_read_end(), or > >>> some third alternative? I wanted to make clear it's not just a > >>> "release the lock" function. You could use > >>> the|||__attribute__((warn_unused_result)) annotation to make clear > >>> the > >>> return value cannot be ignored, although I'm not sure DPDK ever use > >>> that attribute. > > > > I strongly support adding __attribute__((warn_unused_result)) to the > > function. There's a first time for everything, and this attribute is > > very relevant here! > > > >> We have to decide how to use the seqlock API from the application > >> perspective. > >> Your current proposal: > >> do { > >> sn = rte_seqlock_read_begin(&seqlock) > >> //read protected data > >> } while (rte_seqlock_read_retry(&seqlock, sn)); > >> > >> or perhaps > >> sn = rte_seqlock_read_lock(&seqlock); > >> do { > >> //read protected data > >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); > >> > >> Tryunlock should signal to the user that the unlock operation might > >> not succeed and something needs to be repeated. > > > > Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? > > As Ola mentions, this also inverses the boolean result value. If you > > consider this, please check that the resulting assembly output remains > > efficient. > > > > I think lock()/unlock() should be avoided in the read operation > > names, because no lock is taken during read. I like the critical region > > begin()/end() names. > I was following the naming convention of rte_rwlock. Isn't the seqlock > just a more scalable implementation of a reader/writer lock? I see your point. 
However, no lock is taken, so using lock()/unlock() is somewhat misleading. I have no strong opinion about this, so I'll leave it up to Mattias. > > Regarding naming, you should also consider renaming > > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), > > following the naming convention of the other locks. This could prepare > > for future extensions, such as rte_seqlock_write_trylock(). Just a > > thought; I don't feel strongly about this. > > > > Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, > > because retries can be triggered by a write operation happening between > > the read_begin() and read_tryend(), and then the new sn must be used by > > the read operation. > That's why my rte_seqlock_read_tryunlock() function takes the sequence > number as a parameter passed by reference. Then the sequence number can > be updated if necessary. I didn't want to force a new call to > rte_seqlock_read_lock() because there should be a one-to-one match > between rte_seqlock_read_lock() and a successful call to > rte_seqlock_read_tryunlock(). Uhh... I missed that point. In that case, consider passing sn as output parameter to read_begin() by reference too, like the Linux kernel's spin_lock_irqsave() takes the flags as output parameter. I don't have a strong opinion here; just mentioning the possibility. Performance wise, the resulting assembly output is probably the same, regardless if sn is the return value or an output parameter passed by reference. ^ permalink raw reply [flat|nested] 104+ messages in thread
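For reference, a rough sketch of the "combine the spinlock and the sequence number" idea mentioned above, where an odd sequence number doubles as the writer lock. This is not part of any posted patch; the names are invented, the memory-ordering idiom mirrors the one in Mattias' patch, and it would need the same level of review:

#include <stdint.h>

#include <rte_atomic.h>
#include <rte_pause.h>

/* Illustrative only: an odd sequence number means a writer is active,
 * so no separate spinlock field is needed.
 */
typedef struct {
        uint32_t sn;
} combined_seqlock_t;

static inline void
combined_seqlock_write_lock(combined_seqlock_t *seqlock)
{
        uint32_t sn;

        for (;;) {
                sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

                /* Attempt the atomic RMW only when the sn is even (no
                 * writer active). A successful CAS both takes the
                 * writer "lock" and publishes the odd sn that readers
                 * use to detect an update in progress.
                 */
                if ((sn & 1) == 0 &&
                    __atomic_compare_exchange_n(&seqlock->sn, &sn, sn + 1,
                                                0, __ATOMIC_RELAXED,
                                                __ATOMIC_RELAXED))
                        break;

                rte_pause();
        }

        /* As in the proposed rte_seqlock_write_lock(): prevent the data
         * stores that follow from being reordered before the sn store.
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
combined_seqlock_write_unlock(combined_seqlock_t *seqlock)
{
        uint32_t sn;

        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;

        /* Makes the data stores visible no later than the even sn;
         * synchronizes-with the reader's load-acquire of the sn.
         */
        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
}

Whether saving the spinlock word and one atomic RMW is worth the CAS retry loop is the open question raised above.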
* Re: [PATCH v2] eal: add seqlock 2022-03-31 10:03 ` Morten Brørup @ 2022-03-31 11:44 ` Ola Liljedahl 2022-03-31 11:50 ` Morten Brørup 2022-03-31 14:02 ` Mattias Rönnblom 0 siblings, 2 replies; 104+ messages in thread From: Ola Liljedahl @ 2022-03-31 11:44 UTC (permalink / raw) To: Morten Brørup, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen <snip> >>> I think lock()/unlock() should be avoided in the read operation >>> names, because no lock is taken during read. I like the critical region >>> begin()/end() names. >> I was following the naming convention of rte_rwlock. Isn't the seqlock >> just a more scalable implementation of a reader/writer lock? > > I see your point. However, no lock is taken, so using lock()/unlock() is somewhat misleading. Conceptually, a reader lock is acquired and should be released. Now there wouldn't be any negative effects of skipping the unlock operation but then you wouldn't know if the data was properly read so you would have to ignore any read data as well. Why even call rte_seqlock_read_lock() in such a situation? In the only meaningful case, the lock is acquired, the protected data is read and the lock is released. The only difference compared to a more vanilla lock implementation is that the release operation may fail and the operation must restart. <snip> - Ola ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v2] eal: add seqlock 2022-03-31 11:44 ` Ola Liljedahl @ 2022-03-31 11:50 ` Morten Brørup 2022-03-31 14:02 ` Mattias Rönnblom 1 sibling, 0 replies; 104+ messages in thread From: Morten Brørup @ 2022-03-31 11:50 UTC (permalink / raw) To: Ola Liljedahl, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen > From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] > Sent: Thursday, 31 March 2022 13.44 > > <snip> > >>> I think lock()/unlock() should be avoided in the read operation > >>> names, because no lock is taken during read. I like the critical > region > >>> begin()/end() names. > >> I was following the naming convention of rte_rwlock. Isn't the > seqlock > >> just a more scalable implementation of a reader/writer lock? > > > > I see your point. However, no lock is taken, so using lock()/unlock() > is somewhat misleading. > Conceptually, a reader lock is acquired and should be released. Now > there wouldn't be any negative effects of skipping the unlock operation > but then you wouldn't know if the data was properly read so you would > have to ignore any read data as well. Why even call > rte_seqlock_read_lock() in such a situation? > > In the only meaningful case, the lock is acquired, the protected data > is > read and the lock is released. The only difference compared to a more > vanilla lock implementation is that the release operation may fail and > the operation must restart. Thank you for taking the time to correct me on this... I was stuck on the "lock" variable not being touched, but you are right: The serial number is a lock conceptually taken. Then I agree about the lock()/unlock() names for the read operation too. -Morten ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-31 11:44 ` Ola Liljedahl 2022-03-31 11:50 ` Morten Brørup @ 2022-03-31 14:02 ` Mattias Rönnblom 2022-04-01 15:07 ` [PATCH v3] " Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-31 14:02 UTC (permalink / raw) To: Ola Liljedahl, Morten Brørup, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen On 2022-03-31 13:44, Ola Liljedahl wrote: > <snip> >>>> I think lock()/unlock() should be avoided in the read operation >>>> names, because no lock is taken during read. I like the critical region >>>> begin()/end() names. >>> I was following the naming convention of rte_rwlock. Isn't the seqlock >>> just a more scalable implementation of a reader/writer lock? >> >> I see your point. However, no lock is taken, so using lock()/unlock() >> is somewhat misleading. > Conceptually, a reader lock is acquired and should be released. Now > there wouldn't be any negative effects of skipping the unlock operation > but then you wouldn't know if the data was properly read so you would > have to ignore any read data as well. Why even call > rte_seqlock_read_lock() in such a situation? > > In the only meaningful case, the lock is acquired, the protected data is > read and the lock is released. The only difference compared to a more > vanilla lock implementation is that the release operation may fail and > the operation must restart. > > <snip> > > - Ola The RCU library also use the terms "lock" and "unlock" for the reader side. ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH v3] eal: add seqlock 2022-03-31 14:02 ` Mattias Rönnblom @ 2022-04-01 15:07 ` Mattias Rönnblom 2022-04-02 0:21 ` Honnappa Nagarahalli 2022-04-02 18:15 ` Ola Liljedahl 0 siblings, 2 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-01 15:07 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated with relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform load and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock PATCH v3: * Renamed both read and write-side critical section begin/end functions to better match rwlock naming, per Ola Liljedahl's suggestion. * Added 'extern "C"' guards for C++ compatibility. * Refer to the main lcore as the main, and not the master. PATCH v2: * Skip instead of fail unit test in case too few lcores are available. * Use main lcore for testing, reducing the minimum number of lcores required to run the unit tests to four. * Consistently refer to sn field as the "sequence number" in the documentation. * Fixed spelling mistakes in documentation. Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but the sn has been updated to many times during the read, so it has wrapped. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. 
* Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- app/test/meson.build | 2 + app/test/test_seqlock.c | 202 +++++++++++++++++++++++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 ++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++ lib/eal/version.map | 3 + 7 files changed, 523 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..54fadf8025 --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,202 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_run(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_lock(&data->lock); + + data->c = new_value; + + /* These compiler barriers (both on the test reader + * and the test writer side) are here to ensure that + * loads/stores *usually* happen in test program order + * (always on a TSO machine). They are arrange in such + * a way that the writer stores in a different order + * than the reader loads, to emulate an arbitrary + * order. A real application using a seqlock does not + * require any compiler barriers. 
+ */ + rte_compiler_barrier(); + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + rte_compiler_barrier(); + data->a = new_value; + + rte_seqlock_write_unlock(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return 0; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_run(void *arg) +{ + struct reader *r = arg; + int rc = 0; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) { + struct data *data = r->data; + bool interrupted; + uint64_t a; + uint64_t b; + uint64_t c; + uint32_t sn; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + sn = rte_seqlock_read_lock(&data->lock); + + do { + a = data->a; + /* See writer_run() for an explanation why + * these barriers are here. + */ + rte_compiler_barrier(); + + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + + c = data->c; + + rte_compiler_barrier(); + b = data->b; + + } while (!rte_seqlock_read_tryunlock(&data->lock, &sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = -1; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) /* main lcore + one worker */ +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int reader_lcore_ids[MAX_READERS]; + unsigned int worker_writer_lcore_id = 0; + int rc = 0; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) { + printf("Too few cores to run test. 
Skipping.\n"); + return 0; + } + + num_readers = num_lcores - NUM_WRITERS; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i == 0) { + rte_eal_remote_launch(writer_run, data, lcore_id); + worker_writer_lcore_id = lcore_id; + } else { + unsigned int reader_idx = i - 1; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_run, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + if (writer_run(data) != 0 || + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) + rc = -1; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = -1; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..a41343bfed 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..44eacd66e8 --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,302 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access the lock-protected data. The RTE + * seqlock permits multiple writers as well. A spinlock is used to + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is again + * allowed to execute. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclamation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). 
+ * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * // Temporary variables, just to improve readability. + * int tentative_x; + * char tentative_y[MAX_Y_LEN]; + * uint32_t sn; + * + * sn = rte_seqlock_read_lock(&config->lock); + * do { + * // Loads may be atomic or non-atomic, as in this example. + * tentative_x = config->param_x; + * strcpy(tentative_y, config->param_y); + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); + * // An application could skip retrying, and try again later, if + * // progress is possible without the data. + * + * *param_x = tentative_x; + * strcpy(param_y, tentative_y); + * } + * + * // Accessor function for writing config fields. + * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_lock(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_unlock(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. + */ +typedef struct { + uint32_t sn; /**< A sequence number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } + +/** + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_lock() returns a sequence number, which is later + * used in rte_seqlock_read_tryunlock() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_lock() has been called, + * the calling thread reads the protected data, for later use. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative), since the caller may only read the + * data from within the read-side critical section (i.e., after + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()), + * but must not act upon the retrieved data while in the critical + * section, since it does not yet know if it is consistent. + * + * The protected data may be read using atomic and/or non-atomic + * operations. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_tryunlock() should be called, marking + * the end of the read-side critical section. + * + * If rte_seqlock_read_tryunlock() returns true, the data was read + * atomically and the copied data is consistent. + * + * If rte_seqlock_read_tryunlock() returns false, the just-read data + * is inconsistent and should be discarded. 
The caller has the option + * to either re-read the data and call rte_seqlock_read_tryunlock() + * again, or to restart the whole procedure (i.e., from + * rte_seqlock_read_lock()) at some later time. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_tryunlock(). + * + * @see rte_seqlock_read_tryunlock() + */ +__rte_experimental +static inline uint32_t +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_write_unlock(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * End a read-side critical section. + * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number produced by the corresponding rte_seqlock_read_lock() (or, + * in case of a retry, the rte_seqlock_tryunlock()) call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case this function returns true, the just-read data was + * consistent and the set of atomic and non-atomic load operations + * performed between rte_seqlock_read_lock() and + * rte_seqlock_read_tryunlock() were atomic, as a whole. + * + * In case rte_seqlock_read_tryunlock() returns false, the data was + * modified as it was being read and may be inconsistent, and thus + * should be discarded. The @p begin_sn is updated with the + * now-current sequence number. + * + * @param seqlock + * A pointer to the seqlock. + * @param begin_sn + * The seqlock sequence number returned by + * rte_seqlock_read_lock() (potentially updated in subsequent + * rte_seqlock_read_tryunlock() calls) for this critical section. + * @return + * true or false, if the just-read seqlock-protected data was consistent + * or inconsistent, respectively, at the time it was read. + * + * @see rte_seqlock_read_lock() + */ +__rte_experimental +static inline bool +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn) +{ + uint32_t end_sn; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { + *begin_sn = end_sn; + return false; + } + + return true; +} + +/** + * Begin a write-side critical section. + * + * A call to this function acquires the write lock associated @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * (both read and write) the protected data, in an atomic or + * non-atomic manner. + * + * After the necessary updates have been performed, the application + * calls rte_seqlock_write_unlock(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * Unlike rte_seqlock_read_lock(), each call made to + * rte_seqlock_write_lock() must be matched with an unlock call. + * + * @param seqlock + * A pointer to the seqlock. 
+ * + * @see rte_seqlock_write_unlock() + */ +__rte_experimental +static inline void +rte_seqlock_write_lock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * End a write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_lock() + */ +__rte_experimental +static inline void +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_read_lock() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
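As a usage illustration of the API in the patch above (the struct and helper functions are invented for this example, not part of the patch): a seqlock lets a pair of 64-bit values be read as one consistent unit even on targets without 128-bit atomic loads, which is the "larger than what the native atomics allow" case from the commit message.

#include <stdint.h>

#include <rte_seqlock.h>

/* A TSC-to-nanosecond conversion updated occasionally by a control
 * thread and read at high frequency on the data path.
 */
struct time_conv {
        rte_seqlock_t lock;
        uint64_t offset_ns;
        uint64_t ns_per_tick_fp; /* fixed-point, e.g. Q32.32 */
};

/* Writer side: infrequent, e.g. once per calibration interval. */
static void
time_conv_update(struct time_conv *tc, uint64_t offset_ns,
                 uint64_t ns_per_tick_fp)
{
        rte_seqlock_write_lock(&tc->lock);
        tc->offset_ns = offset_ns;
        tc->ns_per_tick_fp = ns_per_tick_fp;
        rte_seqlock_write_unlock(&tc->lock);
}

/* Reader side: both fields are guaranteed to stem from the same update. */
static uint64_t
time_conv_to_ns(const struct time_conv *tc, uint64_t tsc)
{
        uint64_t offset_ns;
        uint64_t ns_per_tick_fp;
        uint32_t sn;

        sn = rte_seqlock_read_lock(&tc->lock);
        do {
                offset_ns = tc->offset_ns;
                ns_per_tick_fp = tc->ns_per_tick_fp;
        } while (!rte_seqlock_read_tryunlock(&tc->lock, &sn));

        return offset_ns + ((tsc * ns_per_tick_fp) >> 32);
}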
* RE: [PATCH v3] eal: add seqlock 2022-04-01 15:07 ` [PATCH v3] " Mattias Rönnblom @ 2022-04-02 0:21 ` Honnappa Nagarahalli 2022-04-02 11:01 ` Morten Brørup ` (2 more replies) 2022-04-02 18:15 ` Ola Liljedahl 1 sibling, 3 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-02 0:21 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl, nd Hi Mattias, Few comments inline. > -----Original Message----- > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > Sent: Friday, April 1, 2022 10:08 AM > To: dev@dpdk.org > Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>; > onar.olsen@ericsson.com; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; > konstantin.ananyev@intel.com; mb@smartsharesystems.com; > stephen@networkplumber.org; Mattias Rönnblom > <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com> > Subject: [PATCH v3] eal: add seqlock > > A sequence lock (seqlock) is synchronization primitive which allows for data- > race free, low-overhead, high-frequency reads, especially for data structures > shared across many cores and which are updated with relatively infrequently. > > A seqlock permits multiple parallel readers. The variant of seqlock implemented > in this patch supports multiple writers as well. A spinlock is used for writer- > writer serialization. > > To avoid resource reclamation and other issues, the data protected by a seqlock > is best off being self-contained (i.e., no pointers [except to constant data]). > > One way to think about seqlocks is that they provide means to perform atomic > operations on data objects larger what the native atomic machine instructions > allow for. > > DPDK seqlocks are not preemption safe on the writer side. A thread preemption > affects performance, not correctness. > > A seqlock contains a sequence number, which can be thought of as the > generation of the data it protects. > > A reader will > 1. Load the sequence number (sn). > 2. Load, in arbitrary order, the seqlock-protected data. > 3. Load the sn again. > 4. Check if the first and second sn are equal, and even numbered. > If they are not, discard the loaded data, and restart from 1. > > The first three steps need to be ordered using suitable memory fences. > > A writer will > 1. Take the spinlock, to serialize writer access. > 2. Load the sn. > 3. Store the original sn + 1 as the new sn. > 4. Perform load and stores to the seqlock-protected data. > 5. Store the original sn + 2 as the new sn. > 6. Release the spinlock. > > Proper memory fencing is required to make sure the first sn store, the data > stores, and the second sn store appear to the reader in the mentioned order. > > The sn loads and stores must be atomic, but the data loads and stores need not > be. > > The original seqlock design and implementation was done by Stephen > Hemminger. This is an independent implementation, using C11 atomics. > > For more information on seqlocks, see > https://en.wikipedia.org/wiki/Seqlock > > PATCH v3: > * Renamed both read and write-side critical section begin/end functions > to better match rwlock naming, per Ola Liljedahl's suggestion. > * Added 'extern "C"' guards for C++ compatibility. > * Refer to the main lcore as the main, and not the master. > > PATCH v2: > * Skip instead of fail unit test in case too few lcores are available. 
> * Use main lcore for testing, reducing the minimum number of lcores > required to run the unit tests to four. > * Consistently refer to sn field as the "sequence number" in the > documentation. > * Fixed spelling mistakes in documentation. > > Updates since RFC: > * Added API documentation. > * Added link to Wikipedia article in the commit message. > * Changed seqlock sequence number field from uint64_t (which was > overkill) to uint32_t. The sn type needs to be sufficiently large > to assure no reader will read a sn, access the data, and then read > the same sn, but the sn has been updated to many times during the > read, so it has wrapped. > * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. > * Removed the rte_seqlock struct + separate rte_seqlock_t typedef > with an anonymous struct typedef:ed to rte_seqlock_t. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- > app/test/meson.build | 2 + > app/test/test_seqlock.c | 202 +++++++++++++++++++++++ > lib/eal/common/meson.build | 1 + > lib/eal/common/rte_seqlock.c | 12 ++ > lib/eal/include/meson.build | 1 + > lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++ > lib/eal/version.map | 3 + > 7 files changed, 523 insertions(+) > create mode 100644 app/test/test_seqlock.c create mode 100644 > lib/eal/common/rte_seqlock.c create mode 100644 > lib/eal/include/rte_seqlock.h > > diff --git a/app/test/meson.build b/app/test/meson.build index > 5fc1dd1b7b..5e418e8766 100644 > --- a/app/test/meson.build > +++ b/app/test/meson.build > @@ -125,6 +125,7 @@ test_sources = files( > 'test_rwlock.c', > 'test_sched.c', > 'test_security.c', > + 'test_seqlock.c', > 'test_service_cores.c', > 'test_spinlock.c', > 'test_stack.c', > @@ -214,6 +215,7 @@ fast_tests = [ > ['rwlock_rde_wro_autotest', true], > ['sched_autotest', true], > ['security_autotest', false], > + ['seqlock_autotest', true], > ['spinlock_autotest', true], > ['stack_autotest', false], > ['stack_lf_autotest', false], > diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode > 100644 index 0000000000..54fadf8025 > --- /dev/null > +++ b/app/test/test_seqlock.c > @@ -0,0 +1,202 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +#include <rte_cycles.h> > +#include <rte_malloc.h> > +#include <rte_random.h> > + > +#include <inttypes.h> > + > +#include "test.h" > + > +struct data { > + rte_seqlock_t lock; > + > + uint64_t a; > + uint64_t b __rte_cache_aligned; > + uint64_t c __rte_cache_aligned; > +} __rte_cache_aligned; > + > +struct reader { > + struct data *data; > + uint8_t stop; > +}; > + > +#define WRITER_RUNTIME (2.0) /* s */ > + > +#define WRITER_MAX_DELAY (100) /* us */ > + > +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define > +WRITER_INTERRUPT_TIME (1) /* us */ > + > +static int > +writer_run(void *arg) > +{ > + struct data *data = arg; > + uint64_t deadline; > + > + deadline = rte_get_timer_cycles() + > + WRITER_RUNTIME * rte_get_timer_hz(); > + > + while (rte_get_timer_cycles() < deadline) { > + bool interrupted; > + uint64_t new_value; > + unsigned int delay; > + > + new_value = rte_rand(); > + > + interrupted = > rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; > + > + rte_seqlock_write_lock(&data->lock); > + > + data->c = new_value; > + > + /* These compiler barriers (both on the test reader > + * and the test writer side) are 
here to ensure that > + * loads/stores *usually* happen in test program order > + * (always on a TSO machine). They are arrange in such > + * a way that the writer stores in a different order > + * than the reader loads, to emulate an arbitrary > + * order. A real application using a seqlock does not > + * require any compiler barriers. > + */ > + rte_compiler_barrier(); The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order). > + data->b = new_value; > + > + if (interrupted) > + rte_delay_us_block(WRITER_INTERRUPT_TIME); > + > + rte_compiler_barrier(); > + data->a = new_value; > + > + rte_seqlock_write_unlock(&data->lock); > + > + delay = rte_rand_max(WRITER_MAX_DELAY); > + > + rte_delay_us_block(delay); > + } > + > + return 0; > +} > + > +#define INTERRUPTED_READER_FREQUENCY (1000) #define > +READER_INTERRUPT_TIME (1000) /* us */ > + > +static int > +reader_run(void *arg) > +{ > + struct reader *r = arg; > + int rc = 0; > + > + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == > 0) { > + struct data *data = r->data; > + bool interrupted; > + uint64_t a; > + uint64_t b; > + uint64_t c; > + uint32_t sn; > + > + interrupted = > rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; > + > + sn = rte_seqlock_read_lock(&data->lock); > + > + do { > + a = data->a; > + /* See writer_run() for an explanation why > + * these barriers are here. > + */ > + rte_compiler_barrier(); > + > + if (interrupted) > + > rte_delay_us_block(READER_INTERRUPT_TIME); > + > + c = data->c; > + > + rte_compiler_barrier(); > + b = data->b; > + > + } while (!rte_seqlock_read_tryunlock(&data->lock, &sn)); > + > + if (a != b || b != c) { > + printf("Reader observed inconsistent data values " > + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", > + a, b, c); > + rc = -1; > + } > + } > + > + return rc; > +} > + > +static void > +reader_stop(struct reader *reader) > +{ > + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); } > + > +#define NUM_WRITERS (2) /* main lcore + one worker */ #define > +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE - > NUM_WRITERS - > +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) > + > +/* Only a compile-time test */ > +static rte_seqlock_t __rte_unused static_init_lock = > +RTE_SEQLOCK_INITIALIZER; > + > +static int > +test_seqlock(void) > +{ > + struct reader readers[MAX_READERS]; > + unsigned int num_readers; > + unsigned int num_lcores; > + unsigned int i; > + unsigned int lcore_id; > + unsigned int reader_lcore_ids[MAX_READERS]; > + unsigned int worker_writer_lcore_id = 0; > + int rc = 0; > + > + num_lcores = rte_lcore_count(); > + > + if (num_lcores < MIN_LCORE_COUNT) { > + printf("Too few cores to run test. 
Skipping.\n"); > + return 0; > + } > + > + num_readers = num_lcores - NUM_WRITERS; > + > + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); > + > + i = 0; > + RTE_LCORE_FOREACH_WORKER(lcore_id) { > + if (i == 0) { > + rte_eal_remote_launch(writer_run, data, lcore_id); > + worker_writer_lcore_id = lcore_id; > + } else { > + unsigned int reader_idx = i - 1; > + struct reader *reader = &readers[reader_idx]; > + > + reader->data = data; > + reader->stop = 0; > + > + rte_eal_remote_launch(reader_run, reader, lcore_id); > + reader_lcore_ids[reader_idx] = lcore_id; > + } > + i++; > + } > + > + if (writer_run(data) != 0 || > + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) > + rc = -1; > + > + for (i = 0; i < num_readers; i++) { > + reader_stop(&readers[i]); > + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) > + rc = -1; > + } > + > + return rc; > +} > + > +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index > 917758cc65..a41343bfed 100644 > --- a/lib/eal/common/meson.build > +++ b/lib/eal/common/meson.build > @@ -35,6 +35,7 @@ sources += files( > 'rte_malloc.c', > 'rte_random.c', > 'rte_reciprocal.c', > + 'rte_seqlock.c', > 'rte_service.c', > 'rte_version.c', > ) > diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new > file mode 100644 index 0000000000..d4fe648799 > --- /dev/null > +++ b/lib/eal/common/rte_seqlock.c > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +void > +rte_seqlock_init(rte_seqlock_t *seqlock) { > + seqlock->sn = 0; > + rte_spinlock_init(&seqlock->lock); > +} > diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index > 9700494816..48df5f1a21 100644 > --- a/lib/eal/include/meson.build > +++ b/lib/eal/include/meson.build > @@ -36,6 +36,7 @@ headers += files( > 'rte_per_lcore.h', > 'rte_random.h', > 'rte_reciprocal.h', > + 'rte_seqlock.h', > 'rte_service.h', > 'rte_service_component.h', > 'rte_string_fns.h', > diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file Other lock implementations are in lib/eal/include/generic. > mode 100644 index 0000000000..44eacd66e8 > --- /dev/null > +++ b/lib/eal/include/rte_seqlock.h > @@ -0,0 +1,302 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#ifndef _RTE_SEQLOCK_H_ > +#define _RTE_SEQLOCK_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/** > + * @file > + * RTE Seqlock > + * > + * A sequence lock (seqlock) is a synchronization primitive allowing > + * multiple, parallel, readers to efficiently and safely (i.e., in a > + * data-race free manner) access the lock-protected data. The RTE > + * seqlock permits multiple writers as well. A spinlock is used to > + * writer-writer synchronization. > + * > + * A reader never blocks a writer. Very high frequency writes may > + * prevent readers from making progress. > + * > + * A seqlock is not preemption-safe on the writer side. If a writer is > + * preempted, it may block readers until the writer thread is again > + * allowed to execute. Heavy computations should be kept out of the > + * writer-side critical section, to avoid delaying readers. > + * > + * Seqlocks are useful for data which are read by many cores, at a > + * high frequency, and relatively infrequently written to. 
> + * > + * One way to think about seqlocks is that they provide means to > + * perform atomic operations on objects larger than what the native > + * machine instructions allow for. > + * > + * To avoid resource reclamation issues, the data protected by a > + * seqlock should typically be kept self-contained (e.g., no pointers > + * to mutable, dynamically allocated data). > + * > + * Example usage: > + * @code{.c} > + * #define MAX_Y_LEN (16) > + * // Application-defined example data structure, protected by a seqlock. > + * struct config { > + * rte_seqlock_t lock; > + * int param_x; > + * char param_y[MAX_Y_LEN]; > + * }; > + * > + * // Accessor function for reading config fields. > + * void > + * config_read(const struct config *config, int *param_x, char > +*param_y) > + * { > + * // Temporary variables, just to improve readability. I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small. > + * int tentative_x; > + * char tentative_y[MAX_Y_LEN]; > + * uint32_t sn; > + * > + * sn = rte_seqlock_read_lock(&config->lock); > + * do { > + * // Loads may be atomic or non-atomic, as in this example. > + * tentative_x = config->param_x; > + * strcpy(tentative_y, config->param_y); > + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); > + * // An application could skip retrying, and try again later, if > + * // progress is possible without the data. > + * > + * *param_x = tentative_x; > + * strcpy(param_y, tentative_y); > + * } > + * > + * // Accessor function for writing config fields. > + * void > + * config_update(struct config *config, int param_x, const char > +*param_y) > + * { > + * rte_seqlock_write_lock(&config->lock); > + * // Stores may be atomic or non-atomic, as in this example. > + * config->param_x = param_x; > + * strcpy(config->param_y, param_y); > + * rte_seqlock_write_unlock(&config->lock); > + * } > + * @endcode > + * > + * @see > + * https://en.wikipedia.org/wiki/Seqlock. > + */ > + > +#include <stdbool.h> > +#include <stdint.h> > + > +#include <rte_atomic.h> > +#include <rte_branch_prediction.h> > +#include <rte_spinlock.h> > + > +/** > + * The RTE seqlock type. > + */ > +typedef struct { > + uint32_t sn; /**< A sequence number for the protected data. */ > + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. > +rte_seqlock_t; > + > +/** > + * A static seqlock initializer. > + */ > +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } > + > +/** > + * Initialize the seqlock. > + * > + * This function initializes the seqlock, and leaves the writer-side > + * spinlock unlocked. > + * > + * @param seqlock > + * A pointer to the seqlock. > + */ > +__rte_experimental > +void > +rte_seqlock_init(rte_seqlock_t *seqlock); > + > +/** > + * Begin a read-side critical section. > + * > + * A call to this function marks the beginning of a read-side critical > + * section, for @p seqlock. > + * > + * rte_seqlock_read_lock() returns a sequence number, which is later > + * used in rte_seqlock_read_tryunlock() to check if the protected data > + * underwent any modifications during the read transaction. > + * > + * After (in program order) rte_seqlock_read_lock() has been called, > + * the calling thread reads the protected data, for later use. 
The > + * protected data read *must* be copied (either in pristine form, or > + * in the form of some derivative), since the caller may only read the > + * data from within the read-side critical section (i.e., after > + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()), > + * but must not act upon the retrieved data while in the critical > + * section, since it does not yet know if it is consistent. > + * > + * The protected data may be read using atomic and/or non-atomic > + * operations. > + * > + * After (in program order) all required data loads have been > + * performed, rte_seqlock_read_tryunlock() should be called, marking > + * the end of the read-side critical section. > + * > + * If rte_seqlock_read_tryunlock() returns true, the data was read > + * atomically and the copied data is consistent. > + * > + * If rte_seqlock_read_tryunlock() returns false, the just-read data > + * is inconsistent and should be discarded. The caller has the option > + * to either re-read the data and call rte_seqlock_read_tryunlock() > + * again, or to restart the whole procedure (i.e., from > + * rte_seqlock_read_lock()) at some later time. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * @return > + * The seqlock sequence number for this critical section, to > + * later be passed to rte_seqlock_read_tryunlock(). > + * > + * @see rte_seqlock_read_tryunlock() > + */ > +__rte_experimental > +static inline uint32_t > +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) { > + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) > + * from happening before the sn load. Synchronizes-with the > + * store release in rte_seqlock_write_unlock(). > + */ > + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); } > + > +/** > + * End a read-side critical section. > + * > + * A call to this function marks the end of a read-side critical Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()? > + * section, for @p seqlock. The application must supply the sequence > + * number produced by the corresponding rte_seqlock_read_lock() (or, > + * in case of a retry, the rte_seqlock_tryunlock()) call. > + * > + * After this function has been called, the caller should not access > + * the protected data. I understand what you mean here. But, I think this needs clarity. In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data. May be this should be changed to: " After this function returns true, the caller should not access the protected data."? Or may be combine it with the following para. > + * > + * In case this function returns true, the just-read data was > + * consistent and the set of atomic and non-atomic load operations > + * performed between rte_seqlock_read_lock() and > + * rte_seqlock_read_tryunlock() were atomic, as a whole. > + * > + * In case rte_seqlock_read_tryunlock() returns false, the data was > + * modified as it was being read and may be inconsistent, and thus > + * should be discarded. The @p begin_sn is updated with the > + * now-current sequence number. May be " The @p begin_sn is updated with the sequence number for the next critical section." > + * > + * @param seqlock > + * A pointer to the seqlock. > + * @param begin_sn > + * The seqlock sequence number returned by > + * rte_seqlock_read_lock() (potentially updated in subsequent > + * rte_seqlock_read_tryunlock() calls) for this critical section. 
> + * @return > + * true or false, if the just-read seqlock-protected data was consistent > + * or inconsistent, respectively, at the time it was read. true - just read protected data was consistent false - just read protected data was inconsistent > + * > + * @see rte_seqlock_read_lock() > + */ > +__rte_experimental > +static inline bool > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > +*begin_sn) { > + uint32_t end_sn; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > + > + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { > + *begin_sn = end_sn; > + return false; > + } > + > + return true; > +} > + > +/** > + * Begin a write-side critical section. > + * > + * A call to this function acquires the write lock associated @p > + * seqlock, and marks the beginning of a write-side critical section. > + * > + * After having called this function, the caller may go on to modify > + * (both read and write) the protected data, in an atomic or > + * non-atomic manner. > + * > + * After the necessary updates have been performed, the application > + * calls rte_seqlock_write_unlock(). > + * > + * This function is not preemption-safe in the sense that preemption > + * of the calling thread may block reader progress until the writer > + * thread is rescheduled. > + * > + * Unlike rte_seqlock_read_lock(), each call made to > + * rte_seqlock_write_lock() must be matched with an unlock call. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_unlock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { > + uint32_t sn; > + > + /* to synchronize with other writers */ > + rte_spinlock_lock(&seqlock->lock); > + > + sn = seqlock->sn + 1; The load of seqlock->sn could use __atomic_load_n to be consistent. > + > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); > +} > + > +/** > + * End a write-side critical section. > + * > + * A call to this function marks the end of the write-side critical > + * section, for @p seqlock. After this call has been made, the > +protected > + * data may no longer be modified. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_lock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) { > + uint32_t sn; > + > + sn = seqlock->sn + 1; Same here, the load of seqlock->sn could use __atomic_load_n > + > + /* synchronizes-with the load acquire in rte_seqlock_read_lock() */ > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > + > + rte_spinlock_unlock(&seqlock->lock); > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_SEQLOCK_H_ */ > diff --git a/lib/eal/version.map b/lib/eal/version.map index > b53eeb30d7..4a9d0ed899 100644 > --- a/lib/eal/version.map > +++ b/lib/eal/version.map > @@ -420,6 +420,9 @@ EXPERIMENTAL { > rte_intr_instance_free; > rte_intr_type_get; > rte_intr_type_set; > + > + # added in 22.07 > + rte_seqlock_init; > }; > > INTERNAL { > -- > 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
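A rough sketch of how two of the review suggestions above might look, purely to anchor the discussion: the writer-side sn loads done through __atomic_load_n(), and writer serialization done with DPDK's ticket lock instead of a spinlock. The type and function names are invented; this is not an updated patch, and the ordering idiom is kept as in the v3 patch.

#include <stdint.h>

#include <rte_atomic.h>
#include <rte_ticketlock.h>

typedef struct {
        uint32_t sn;
        rte_ticketlock_t lock; /* grants writers access in FIFO order */
} seqlock_alt_t;

static inline void
seqlock_alt_write_lock(seqlock_alt_t *seqlock)
{
        uint32_t sn;

        rte_ticketlock_lock(&seqlock->lock);

        /* Atomic load for consistency with the store below. */
        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;
        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

        /* As in the v3 patch: keep the data stores that follow from
         * being reordered before the sn store.
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
seqlock_alt_write_unlock(seqlock_alt_t *seqlock)
{
        uint32_t sn;

        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;

        /* Synchronizes-with the load-acquire on the reader side. */
        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);

        rte_ticketlock_unlock(&seqlock->lock);
}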
* RE: [PATCH v3] eal: add seqlock 2022-04-02 0:21 ` Honnappa Nagarahalli @ 2022-04-02 11:01 ` Morten Brørup 2022-04-02 19:38 ` Honnappa Nagarahalli 2022-04-03 6:10 ` [PATCH v3] eal: add seqlock Mattias Rönnblom 2022-04-03 6:33 ` Mattias Rönnblom 2 siblings, 1 reply; 104+ messages in thread From: Morten Brørup @ 2022-04-02 11:01 UTC (permalink / raw) To: Honnappa Nagarahalli, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, stephen, Ola Liljedahl, nd > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com] > Sent: Saturday, 2 April 2022 02.22 > > > From: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > > Sent: Friday, April 1, 2022 10:08 AM > > > > diff --git a/lib/eal/include/rte_seqlock.h > > b/lib/eal/include/rte_seqlock.h new file > Other lock implementations are in lib/eal/include/generic. I'm not sure why what goes where... e.g. rte_branch_prediction.h and rte_bitops.h are not in include/generic. But I agree that keeping lock implementations in the same location makes sense. Also, other lock implementations have their init() function in their header file, so you could consider getting rid of the C file. I don't care, just mentioning it. > > +/** > > + * The RTE seqlock type. > > + */ > > +typedef struct { > > + uint32_t sn; /**< A sequence number for the protected data. */ > > + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ > } > Suggest using ticket lock for the writer side. It should have low > overhead when there is a single writer, but provides better > functionality when there are multiple writers. A spinlock and a ticket lock have the same size, so there is no memory cost either. Unless using a ticket lock stands in the way for future extensions to the seqlock library, then it seems like a good idea. > > +__rte_experimental > > +static inline bool > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > > +*begin_sn) { Did anyone object to adding the __attribute__((warn_unused_result))? Otherwise, I think you should add it. ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v3] eal: add seqlock 2022-04-02 11:01 ` Morten Brørup @ 2022-04-02 19:38 ` Honnappa Nagarahalli 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-02 19:38 UTC (permalink / raw) To: Morten Brørup, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, stephen, Ola Liljedahl, nd, nd <snip> > > > > +__rte_experimental > > > +static inline bool > > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > > > +*begin_sn) { > > Did anyone object to adding the __attribute__((warn_unused_result))? > > Otherwise, I think you should add it. +1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-02 19:38 ` Honnappa Nagarahalli @ 2022-04-10 13:51 ` Mattias Rönnblom 2022-04-10 13:51 ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom ` (4 more replies) 0 siblings, 5 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Mattias Rönnblom This patch adds a wrapper macro __rte_warn_unused_result for the warn_unused_result function attribute. Marking a function __rte_warn_unused_result will make the compiler emit a warning in case the caller does not use the function's return value. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- lib/eal/include/rte_common.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 4a399cc7c8..544e7de2e7 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -222,6 +222,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) */ #define __rte_noreturn __attribute__((noreturn)) +/** + * Issue warning in case the function's return value is ignore + */ +#define __rte_warn_unused_result __attribute__((warn_unused_result)) + /** * Force a function to be inlined */ -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
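A small, hypothetical illustration of what the macro added above buys; the functions are made up and the exact diagnostic text depends on the compiler. With GCC's -Wunused-result (enabled by default), the first call below is flagged:

#include <rte_common.h>

__rte_warn_unused_result
static int
resource_trylock(void)
{
        return 0; /* e.g. 1 if taken, 0 if busy, like the trylock functions */
}

void
caller(void)
{
        resource_trylock(); /* warning: ignoring return value of
                             * 'resource_trylock', declared with
                             * attribute warn_unused_result
                             */

        if (resource_trylock() == 1) {
                /* lock taken, do the work, then release */
        }
}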
* [RFC 2/3] eal: emit warning for unused trylock return value 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom @ 2022-04-10 13:51 ` Mattias Rönnblom 2022-04-10 13:51 ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom ` (3 subsequent siblings) 4 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Mattias Rönnblom Mark the trylock family of spinlock functions with __rte_warn_unused_result. Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- lib/eal/include/generic/rte_spinlock.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h index 40fe49d5ad..73ed4bfbdc 100644 --- a/lib/eal/include/generic/rte_spinlock.h +++ b/lib/eal/include/generic/rte_spinlock.h @@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl) * @return * 1 if the lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_trylock (rte_spinlock_t *sl); @@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl); * 1 if the hardware memory transaction is successfully started * or lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_trylock_tm(rte_spinlock_t *sl); @@ -243,6 +245,7 @@ static inline void rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr) * @return * 1 if the lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr) { int id = rte_gettid(); @@ -299,6 +302,7 @@ static inline void rte_spinlock_recursive_unlock_tm( * 1 if the hardware memory transaction is successfully started * or lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_recursive_trylock_tm( rte_spinlock_recursive_t *slr); -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* [RFC 3/3] examples/bond: fix invalid use of trylock 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom 2022-04-10 13:51 ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom @ 2022-04-10 13:51 ` Mattias Rönnblom 2022-04-11 1:01 ` Min Hu (Connor) 2022-04-11 11:25 ` David Marchand 2022-04-10 18:02 ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger ` (2 subsequent siblings) 4 siblings, 2 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-10 13:51 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Mattias Rönnblom, michalx.k.jastrzebski The conditional rte_spinlock_trylock() was used as if it is an unconditional lock operation in a number of places. Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") Cc: michalx.k.jastrzebski@intel.com Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- examples/bond/main.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/examples/bond/main.c b/examples/bond/main.c index 335bde5c8d..4efebb3902 100644 --- a/examples/bond/main.c +++ b/examples/bond/main.c @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1) bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) | (BOND_IP_3 << 16) | (BOND_IP_4 << 24); - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); while (global_flag_stru_p->LcoreMainIsRunning) { rte_spinlock_unlock(&global_flag_stru_p->lock); @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1) if (is_free == 0) rte_pktmbuf_free(pkts[i]); } - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); } rte_spinlock_unlock(&global_flag_stru_p->lock); printf("BYE lcore_main\n"); @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, { int worker_core_id = rte_lcore_id(); - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore) != WAIT) { @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0)) return; - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); global_flag_stru_p->LcoreMainIsRunning = 1; rte_spinlock_unlock(&global_flag_stru_p->lock); cmdline_printf(cl, @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result, struct cmdline *cl, __rte_unused void *data) { - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { cmdline_printf(cl, "lcore_main not running on core:%d\n", @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result, struct cmdline *cl, __rte_unused void *data) { - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { cmdline_printf(cl, "lcore_main not running on core:%d\n", @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result, printf("\n"); } - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); cmdline_printf(cl, "Active_slaves:%d " "packets received:Tot:%d Arp:%d 
IPv4:%d\n", -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 3/3] examples/bond: fix invalid use of trylock 2022-04-10 13:51 ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom @ 2022-04-11 1:01 ` Min Hu (Connor) 2022-04-11 14:32 ` Mattias Rönnblom 2022-04-11 11:25 ` David Marchand 1 sibling, 1 reply; 104+ messages in thread From: Min Hu (Connor) @ 2022-04-11 1:01 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, michalx.k.jastrzebski Acked-by: Min Hu (Connor) <humin29@huawei.com> 在 2022/4/10 21:51, Mattias Rönnblom 写道: > The conditional rte_spinlock_trylock() was used as if it is an > unconditional lock operation in a number of places. > > Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") > Cc: michalx.k.jastrzebski@intel.com > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- > examples/bond/main.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/examples/bond/main.c b/examples/bond/main.c > index 335bde5c8d..4efebb3902 100644 > --- a/examples/bond/main.c > +++ b/examples/bond/main.c > @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1) > bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) | > (BOND_IP_3 << 16) | (BOND_IP_4 << 24); > > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > > while (global_flag_stru_p->LcoreMainIsRunning) { > rte_spinlock_unlock(&global_flag_stru_p->lock); > @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1) > if (is_free == 0) > rte_pktmbuf_free(pkts[i]); > } > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > } > rte_spinlock_unlock(&global_flag_stru_p->lock); > printf("BYE lcore_main\n"); > @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, > { > int worker_core_id = rte_lcore_id(); > > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > if (global_flag_stru_p->LcoreMainIsRunning == 0) { > if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore) > != WAIT) { > @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, > if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0)) > return; > > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > global_flag_stru_p->LcoreMainIsRunning = 1; > rte_spinlock_unlock(&global_flag_stru_p->lock); > cmdline_printf(cl, > @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result, > struct cmdline *cl, > __rte_unused void *data) > { > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > if (global_flag_stru_p->LcoreMainIsRunning == 0) { > cmdline_printf(cl, > "lcore_main not running on core:%d\n", > @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result, > struct cmdline *cl, > __rte_unused void *data) > { > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > if (global_flag_stru_p->LcoreMainIsRunning == 0) { > cmdline_printf(cl, > "lcore_main not running on core:%d\n", > @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result, > printf("\n"); > } > > - rte_spinlock_trylock(&global_flag_stru_p->lock); > + rte_spinlock_lock(&global_flag_stru_p->lock); > cmdline_printf(cl, > "Active_slaves:%d " > "packets received:Tot:%d Arp:%d 
IPv4:%d\n", > ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 3/3] examples/bond: fix invalid use of trylock 2022-04-11 1:01 ` Min Hu (Connor) @ 2022-04-11 14:32 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 14:32 UTC (permalink / raw) To: Min Hu (Connor), dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, michalx.k.jastrzebski On 2022-04-11 03:01, Min Hu (Connor) wrote: > Acked-by: Min Hu (Connor) <humin29@huawei.com> > Thanks. It was pretty obvious that something was wrong with this example's use of the spinlock, but after the brief look I had it was a little less obvious if this patch would fix the problem or not. > 在 2022/4/10 21:51, Mattias Rönnblom 写道: >> The conditional rte_spinlock_trylock() was used as if it is an >> unconditional lock operation in a number of places. >> >> Fixes: cc7e8ae84faa ("examples/bond: add example application for link >> bonding mode 6") >> Cc: michalx.k.jastrzebski@intel.com >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >> --- >> examples/bond/main.c | 14 +++++++------- >> 1 file changed, 7 insertions(+), 7 deletions(-) >> >> diff --git a/examples/bond/main.c b/examples/bond/main.c >> index 335bde5c8d..4efebb3902 100644 >> --- a/examples/bond/main.c >> +++ b/examples/bond/main.c >> @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1) >> bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) | >> (BOND_IP_3 << 16) | (BOND_IP_4 << 24); >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> while (global_flag_stru_p->LcoreMainIsRunning) { >> rte_spinlock_unlock(&global_flag_stru_p->lock); >> @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1) >> if (is_free == 0) >> rte_pktmbuf_free(pkts[i]); >> } >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> } >> rte_spinlock_unlock(&global_flag_stru_p->lock); >> printf("BYE lcore_main\n"); >> @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void >> *parsed_result, >> { >> int worker_core_id = rte_lcore_id(); >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> if (global_flag_stru_p->LcoreMainIsRunning == 0) { >> if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore) >> != WAIT) { >> @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void >> *parsed_result, >> if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0)) >> return; >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> global_flag_stru_p->LcoreMainIsRunning = 1; >> rte_spinlock_unlock(&global_flag_stru_p->lock); >> cmdline_printf(cl, >> @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void >> *parsed_result, >> struct cmdline *cl, >> __rte_unused void *data) >> { >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> if (global_flag_stru_p->LcoreMainIsRunning == 0) { >> cmdline_printf(cl, >> "lcore_main not running on core:%d\n", >> @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void >> *parsed_result, >> struct cmdline *cl, >> __rte_unused void *data) >> { >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> if (global_flag_stru_p->LcoreMainIsRunning == 0) { >> cmdline_printf(cl, >> "lcore_main not running on core:%d\n", >> @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void >> *parsed_result, 
>> printf("\n"); >> } >> - rte_spinlock_trylock(&global_flag_stru_p->lock); >> + rte_spinlock_lock(&global_flag_stru_p->lock); >> cmdline_printf(cl, >> "Active_slaves:%d " >> "packets received:Tot:%d Arp:%d IPv4:%d\n", >> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 3/3] examples/bond: fix invalid use of trylock 2022-04-10 13:51 ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom 2022-04-11 1:01 ` Min Hu (Connor) @ 2022-04-11 11:25 ` David Marchand 2022-04-11 14:33 ` Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: David Marchand @ 2022-04-11 11:25 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, Honnappa Nagarahalli, Morten Brørup, hofors, michalx.k.jastrzebski On Sun, Apr 10, 2022 at 3:53 PM Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > The conditional rte_spinlock_trylock() was used as if it is an > unconditional lock operation in a number of places. > > Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") > Cc: michalx.k.jastrzebski@intel.com Any reason not to ask for backport in stable branches? > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Otherwise, this series looks good, thanks Mattias. -- David Marchand ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 3/3] examples/bond: fix invalid use of trylock 2022-04-11 11:25 ` David Marchand @ 2022-04-11 14:33 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 14:33 UTC (permalink / raw) To: David Marchand Cc: dev, Thomas Monjalon, Honnappa Nagarahalli, Morten Brørup, hofors, michalx.k.jastrzebski On 2022-04-11 13:25, David Marchand wrote: > On Sun, Apr 10, 2022 at 3:53 PM Mattias Rönnblom > <mattias.ronnblom@ericsson.com> wrote: >> >> The conditional rte_spinlock_trylock() was used as if it is an >> unconditional lock operation in a number of places. >> >> Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") >> Cc: michalx.k.jastrzebski@intel.com > > Any reason not to ask for backport in stable branches? > No. Will do. >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > > Otherwise, this series looks good, thanks Mattias. > > ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom 2022-04-10 13:51 ` [RFC 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom 2022-04-10 13:51 ` [RFC 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom @ 2022-04-10 18:02 ` Stephen Hemminger 2022-04-10 18:50 ` Mattias Rönnblom 2022-04-11 7:17 ` Morten Brørup 2022-04-11 9:16 ` Bruce Richardson 4 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-10 18:02 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors On Sun, 10 Apr 2022 15:51:38 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > This patch adds a wrapper macro __rte_warn_unused_result for the > warn_unused_result function attribute. > > Marking a function __rte_warn_unused_result will make the compiler > emit a warning in case the caller does not use the function's return > value. > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Looks good, but are these attributes compiler specific? ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-10 18:02 ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger @ 2022-04-10 18:50 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-10 18:50 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors On 2022-04-10 20:02, Stephen Hemminger wrote: > On Sun, 10 Apr 2022 15:51:38 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> This patch adds a wrapper macro __rte_warn_unused_result for the >> warn_unused_result function attribute. >> >> Marking a function __rte_warn_unused_result will make the compiler >> emit a warning in case the caller does not use the function's return >> value. >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > > Looks good, but are these attributes compiler specific? GCC and LLVM clang support this and many other attributes (some of which are already wrapped by __rte_* macros). The whole attribute machinery is compiler (or rather, "implementation") specific, as suggested by the double-underscore prefix (__attribute__). I don't know about icc. ^ permalink raw reply [flat|nested] 104+ messages in thread
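If a compiler that lacks the attribute ever had to be supported, the wrapper could presumably be made conditional, in the same spirit as the existing __rte_* attribute macros; this is only a sketch of that idea, not something the patch (or current DPDK) does:

#if defined(__GNUC__) || defined(__clang__)
#define __rte_warn_unused_result __attribute__((warn_unused_result))
#else
/* Unknown compiler: expand to nothing and silently lose the diagnostic. */
#define __rte_warn_unused_result
#endif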
* RE: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom ` (2 preceding siblings ...) 2022-04-10 18:02 ` [RFC 1/3] eal: add macro to warn for unused function return values Stephen Hemminger @ 2022-04-11 7:17 ` Morten Brørup 2022-04-11 14:29 ` Mattias Rönnblom 2022-04-11 9:16 ` Bruce Richardson 4 siblings, 1 reply; 104+ messages in thread From: Morten Brørup @ 2022-04-11 7:17 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > Sent: Sunday, 10 April 2022 15.52 > > This patch adds a wrapper macro __rte_warn_unused_result for the > warn_unused_result function attribute. > > Marking a function __rte_warn_unused_result will make the compiler > emit a warning in case the caller does not use the function's return > value. > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- > lib/eal/include/rte_common.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/lib/eal/include/rte_common.h > b/lib/eal/include/rte_common.h > index 4a399cc7c8..544e7de2e7 100644 > --- a/lib/eal/include/rte_common.h > +++ b/lib/eal/include/rte_common.h > @@ -222,6 +222,11 @@ static void > __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) > */ > #define __rte_noreturn __attribute__((noreturn)) > > +/** > + * Issue warning in case the function's return value is ignore Typo: ignore -> ignored Consider: warning -> a warning > + */ > +#define __rte_warn_unused_result __attribute__((warn_unused_result)) > + > /** > * Force a function to be inlined > */ > -- > 2.25.1 > Reviewed-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-11 7:17 ` Morten Brørup @ 2022-04-11 14:29 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 14:29 UTC (permalink / raw) To: Morten Brørup, dev Cc: Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors On 2022-04-11 09:17, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] >> Sent: Sunday, 10 April 2022 15.52 >> >> This patch adds a wrapper macro __rte_warn_unused_result for the >> warn_unused_result function attribute. >> >> Marking a function __rte_warn_unused_result will make the compiler >> emit a warning in case the caller does not use the function's return >> value. >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >> --- >> lib/eal/include/rte_common.h | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/lib/eal/include/rte_common.h >> b/lib/eal/include/rte_common.h >> index 4a399cc7c8..544e7de2e7 100644 >> --- a/lib/eal/include/rte_common.h >> +++ b/lib/eal/include/rte_common.h >> @@ -222,6 +222,11 @@ static void >> __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) >> */ >> #define __rte_noreturn __attribute__((noreturn)) >> >> +/** >> + * Issue warning in case the function's return value is ignore > > Typo: ignore -> ignored > > Consider: warning -> a warning > OK. >> + */ >> +#define __rte_warn_unused_result __attribute__((warn_unused_result)) >> + >> /** >> * Force a function to be inlined >> */ >> -- >> 2.25.1 >> > > Reviewed-by: Morten Brørup <mb@smartsharesystems.com> > Thanks! ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-10 13:51 ` [RFC 1/3] eal: add macro to warn for unused function return values Mattias Rönnblom ` (3 preceding siblings ...) 2022-04-11 7:17 ` Morten Brørup @ 2022-04-11 9:16 ` Bruce Richardson 2022-04-11 14:27 ` Mattias Rönnblom ` (2 more replies) 4 siblings, 3 replies; 104+ messages in thread From: Bruce Richardson @ 2022-04-11 9:16 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote: > This patch adds a wrapper macro __rte_warn_unused_result for the > warn_unused_result function attribute. > > Marking a function __rte_warn_unused_result will make the compiler > emit a warning in case the caller does not use the function's return > value. > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- This is good to have, thanks. Series-acked-by: Bruce Richardson <bruce.richardson@intel.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-11 9:16 ` Bruce Richardson @ 2022-04-11 14:27 ` Mattias Rönnblom 2022-04-11 15:15 ` [PATCH " Mattias Rönnblom 2022-04-11 18:24 ` [RFC " Tyler Retzlaff 2 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 14:27 UTC (permalink / raw) To: Bruce Richardson Cc: dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors On 2022-04-11 11:16, Bruce Richardson wrote: > On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote: >> This patch adds a wrapper macro __rte_warn_unused_result for the >> warn_unused_result function attribute. >> >> Marking a function __rte_warn_unused_result will make the compiler >> emit a warning in case the caller does not use the function's return >> value. >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >> --- > > This is good to have, thanks. > > Series-acked-by: Bruce Richardson <bruce.richardson@intel.com> There is one issue with this attribute in combination with GCC: a warn_unused_result warning cannot easily be suppressed in the source code of the caller. The usual cast-to-void trick doesn't work with this compiler (but does work with clang). This behavior limits the usefulness of this attribute to functions where it's pretty much always a bug if you ignore the return value. I will update the macro doc string with some details around this. ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 1/3] eal: add macro to warn for unused function return values 2022-04-11 9:16 ` Bruce Richardson 2022-04-11 14:27 ` Mattias Rönnblom @ 2022-04-11 15:15 ` Mattias Rönnblom 2022-04-11 15:15 ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom ` (2 more replies) 2022-04-11 18:24 ` [RFC " Tyler Retzlaff 2 siblings, 3 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw) To: dev Cc: Bruce Richardson, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger, Mattias Rönnblom This patch adds a wrapper macro __rte_warn_unused_result for the warn_unused_result function attribute. Marking a function __rte_warn_unused_result will make the compiler emit a warning in case the caller does not use the function's return value. Changes since RFC: * Include usage recommendation and GCC peculiarities in the macro documentation. Acked-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- lib/eal/include/rte_common.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 4a399cc7c8..67587025ab 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -222,6 +222,31 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) */ #define __rte_noreturn __attribute__((noreturn)) +/** + * Issue a warning in case the function's return value is ignored. + * + * The use of this attribute should be restricted to cases where + * ignoring the marked function's return value is almost always a + * bug. With GCC, some effort is required to make clear that ignoring + * the return value is intentional. The usual void-casting method to + * mark something unused as used does not suppress the warning with + * this compiler. + * + * @code{.c} + * __rte_warn_unused_result int foo(); + * + * void ignore_foo_result(void) { + * foo(); // generates a warning with all compilers + * + * (void)foo(); // still generates the warning with GCC (but not clang) + * + * int unused __rte_unused; + * unused = foo(); // does the trick with all compilers + * } + * @endcode + */ +#define __rte_warn_unused_result __attribute__((warn_unused_result)) + /** * Force a function to be inlined */ -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 2/3] eal: emit warning for unused trylock return value 2022-04-11 15:15 ` [PATCH " Mattias Rönnblom @ 2022-04-11 15:15 ` Mattias Rönnblom 2022-04-11 15:29 ` Morten Brørup 2022-04-11 15:15 ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom 2022-04-11 15:25 ` [PATCH 1/3] eal: add macro to warn for unused function return values Morten Brørup 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw) To: dev Cc: Bruce Richardson, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger, Mattias Rönnblom Mark the trylock family of spinlock functions with __rte_warn_unused_result. Acked-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- lib/eal/include/generic/rte_spinlock.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h index 40fe49d5ad..73ed4bfbdc 100644 --- a/lib/eal/include/generic/rte_spinlock.h +++ b/lib/eal/include/generic/rte_spinlock.h @@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl) * @return * 1 if the lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_trylock (rte_spinlock_t *sl); @@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl); * 1 if the hardware memory transaction is successfully started * or lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_trylock_tm(rte_spinlock_t *sl); @@ -243,6 +245,7 @@ static inline void rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr) * @return * 1 if the lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr) { int id = rte_gettid(); @@ -299,6 +302,7 @@ static inline void rte_spinlock_recursive_unlock_tm( * 1 if the hardware memory transaction is successfully started * or lock is successfully taken; 0 otherwise. */ +__rte_warn_unused_result static inline int rte_spinlock_recursive_trylock_tm( rte_spinlock_recursive_t *slr); -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH 2/3] eal: emit warning for unused trylock return value 2022-04-11 15:15 ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom @ 2022-04-11 15:29 ` Morten Brørup 0 siblings, 0 replies; 104+ messages in thread From: Morten Brørup @ 2022-04-11 15:29 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Bruce Richardson, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors, Stephen Hemminger > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > Sent: Monday, 11 April 2022 17.16 > > Mark the trylock family of spinlock functions with > __rte_warn_unused_result. > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- > lib/eal/include/generic/rte_spinlock.h | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/lib/eal/include/generic/rte_spinlock.h > b/lib/eal/include/generic/rte_spinlock.h > index 40fe49d5ad..73ed4bfbdc 100644 > --- a/lib/eal/include/generic/rte_spinlock.h > +++ b/lib/eal/include/generic/rte_spinlock.h > @@ -97,6 +97,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl) > * @return > * 1 if the lock is successfully taken; 0 otherwise. > */ > +__rte_warn_unused_result > static inline int > rte_spinlock_trylock (rte_spinlock_t *sl); > > @@ -174,6 +175,7 @@ rte_spinlock_unlock_tm(rte_spinlock_t *sl); > * 1 if the hardware memory transaction is successfully started > * or lock is successfully taken; 0 otherwise. > */ > +__rte_warn_unused_result > static inline int > rte_spinlock_trylock_tm(rte_spinlock_t *sl); > > @@ -243,6 +245,7 @@ static inline void > rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr) > * @return > * 1 if the lock is successfully taken; 0 otherwise. > */ > +__rte_warn_unused_result > static inline int > rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr) > { > int id = rte_gettid(); > @@ -299,6 +302,7 @@ static inline void > rte_spinlock_recursive_unlock_tm( > * 1 if the hardware memory transaction is successfully started > * or lock is successfully taken; 0 otherwise. > */ > +__rte_warn_unused_result > static inline int rte_spinlock_recursive_trylock_tm( > rte_spinlock_recursive_t *slr); > > -- > 2.25.1 > Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 3/3] examples/bond: fix invalid use of trylock 2022-04-11 15:15 ` [PATCH " Mattias Rönnblom 2022-04-11 15:15 ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom @ 2022-04-11 15:15 ` Mattias Rönnblom 2022-04-14 12:06 ` David Marchand 2022-04-11 15:25 ` [PATCH 1/3] eal: add macro to warn for unused function return values Morten Brørup 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-11 15:15 UTC (permalink / raw) To: dev Cc: Bruce Richardson, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors, Stephen Hemminger, Mattias Rönnblom, michalx.k.jastrzebski, stable, Min Hu The conditional rte_spinlock_trylock() was used as if it is an unconditional lock operation in a number of places. Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") Cc: michalx.k.jastrzebski@intel.com Cc: stable@dpdk.org Acked-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Min Hu (Connor) <humin29@huawei.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- examples/bond/main.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/examples/bond/main.c b/examples/bond/main.c index 335bde5c8d..4efebb3902 100644 --- a/examples/bond/main.c +++ b/examples/bond/main.c @@ -373,7 +373,7 @@ static int lcore_main(__rte_unused void *arg1) bond_ip = BOND_IP_1 | (BOND_IP_2 << 8) | (BOND_IP_3 << 16) | (BOND_IP_4 << 24); - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); while (global_flag_stru_p->LcoreMainIsRunning) { rte_spinlock_unlock(&global_flag_stru_p->lock); @@ -456,7 +456,7 @@ static int lcore_main(__rte_unused void *arg1) if (is_free == 0) rte_pktmbuf_free(pkts[i]); } - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); } rte_spinlock_unlock(&global_flag_stru_p->lock); printf("BYE lcore_main\n"); @@ -571,7 +571,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, { int worker_core_id = rte_lcore_id(); - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { if (rte_eal_get_lcore_state(global_flag_stru_p->LcoreMainCore) != WAIT) { @@ -591,7 +591,7 @@ static void cmd_start_parsed(__rte_unused void *parsed_result, if ((worker_core_id >= RTE_MAX_LCORE) || (worker_core_id == 0)) return; - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); global_flag_stru_p->LcoreMainIsRunning = 1; rte_spinlock_unlock(&global_flag_stru_p->lock); cmdline_printf(cl, @@ -659,7 +659,7 @@ static void cmd_stop_parsed(__rte_unused void *parsed_result, struct cmdline *cl, __rte_unused void *data) { - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { cmdline_printf(cl, "lcore_main not running on core:%d\n", @@ -700,7 +700,7 @@ static void cmd_quit_parsed(__rte_unused void *parsed_result, struct cmdline *cl, __rte_unused void *data) { - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); if (global_flag_stru_p->LcoreMainIsRunning == 0) { cmdline_printf(cl, "lcore_main not running on core:%d\n", @@ -762,7 +762,7 @@ static void cmd_show_parsed(__rte_unused void *parsed_result, printf("\n"); } - rte_spinlock_trylock(&global_flag_stru_p->lock); + rte_spinlock_lock(&global_flag_stru_p->lock); cmdline_printf(cl, 
"Active_slaves:%d " "packets received:Tot:%d Arp:%d IPv4:%d\n", -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 3/3] examples/bond: fix invalid use of trylock 2022-04-11 15:15 ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom @ 2022-04-14 12:06 ` David Marchand 0 siblings, 0 replies; 104+ messages in thread From: David Marchand @ 2022-04-14 12:06 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Bruce Richardson, Thomas Monjalon, Honnappa Nagarahalli, Morten Brørup, hofors, Stephen Hemminger, michalx.k.jastrzebski, dpdk stable, Min Hu, Tyler Retzlaff On Mon, Apr 11, 2022 at 5:17 PM Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > The conditional rte_spinlock_trylock() was used as if it is an > unconditional lock operation in a number of places. > > Fixes: cc7e8ae84faa ("examples/bond: add example application for link bonding mode 6") > Cc: michalx.k.jastrzebski@intel.com > Cc: stable@dpdk.org > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> > Acked-by: Min Hu (Connor) <humin29@huawei.com> > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> Series applied, thanks Mattias. -- David Marchand ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH 1/3] eal: add macro to warn for unused function return values 2022-04-11 15:15 ` [PATCH " Mattias Rönnblom 2022-04-11 15:15 ` [PATCH 2/3] eal: emit warning for unused trylock return value Mattias Rönnblom 2022-04-11 15:15 ` [PATCH 3/3] examples/bond: fix invalid use of trylock Mattias Rönnblom @ 2022-04-11 15:25 ` Morten Brørup 2 siblings, 0 replies; 104+ messages in thread From: Morten Brørup @ 2022-04-11 15:25 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Bruce Richardson, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, hofors, Stephen Hemminger > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > Sent: Monday, 11 April 2022 17.16 > > This patch adds a wrapper macro __rte_warn_unused_result for the > warn_unused_result function attribute. > > Marking a function __rte_warn_unused_result will make the compiler > emit a warning in case the caller does not use the function's return > value. > > Changes since RFC: > * Include usage recommendation and GCC peculiarities in the macro > documentation. > > Acked-by: Bruce Richardson <bruce.richardson@intel.com> > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- > lib/eal/include/rte_common.h | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/lib/eal/include/rte_common.h > b/lib/eal/include/rte_common.h > index 4a399cc7c8..67587025ab 100644 > --- a/lib/eal/include/rte_common.h > +++ b/lib/eal/include/rte_common.h > @@ -222,6 +222,31 @@ static void > __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) > */ > #define __rte_noreturn __attribute__((noreturn)) > > +/** > + * Issue a warning in case the function's return value is ignored. > + * > + * The use of this attribute should be restricted to cases where > + * ignoring the marked function's return value is almost always a > + * bug. With GCC, some effort is required to make clear that ignoring > + * the return value is intentional. The usual void-casting method to > + * mark something unused as used does not suppress the warning with > + * this compiler. > + * > + * @code{.c} > + * __rte_warn_unused_result int foo(); > + * > + * void ignore_foo_result(void) { > + * foo(); // generates a warning with all compilers > + * > + * (void)foo(); // still generates the warning with GCC (but > not clang) > + * > + * int unused __rte_unused; > + * unused = foo(); // does the trick with all compilers > + * } > + * @endcode > + */ > +#define __rte_warn_unused_result __attribute__((warn_unused_result)) > + > /** > * Force a function to be inlined > */ > -- > 2.25.1 > Nice! If only all functions were that well documented. :-) Reviewed-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [RFC 1/3] eal: add macro to warn for unused function return values 2022-04-11 9:16 ` Bruce Richardson 2022-04-11 14:27 ` Mattias Rönnblom 2022-04-11 15:15 ` [PATCH " Mattias Rönnblom @ 2022-04-11 18:24 ` Tyler Retzlaff 2 siblings, 0 replies; 104+ messages in thread From: Tyler Retzlaff @ 2022-04-11 18:24 UTC (permalink / raw) To: Bruce Richardson Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand, Honnappa.Nagarahalli, mb, hofors On Mon, Apr 11, 2022 at 10:16:35AM +0100, Bruce Richardson wrote: > On Sun, Apr 10, 2022 at 03:51:38PM +0200, Mattias Rönnblom wrote: > > This patch adds a wrapper macro __rte_warn_unused_result for the > > warn_unused_result function attribute. > > > > Marking a function __rte_warn_unused_result will make the compiler > > emit a warning in case the caller does not use the function's return > > value. > > > > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > > --- > > This is good to have, thanks. > > Series-acked-by: Bruce Richardson <bruce.richardson@intel.com> +1 Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v3] eal: add seqlock 2022-04-02 0:21 ` Honnappa Nagarahalli 2022-04-02 11:01 ` Morten Brørup @ 2022-04-03 6:10 ` Mattias Rönnblom 2022-04-03 17:27 ` Honnappa Nagarahalli 2022-04-03 6:33 ` Mattias Rönnblom 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-03 6:10 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl On 2022-04-02 02:21, Honnappa Nagarahalli wrote: > Hi Mattias, > Few comments inline. > >> -----Original Message----- >> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >> Sent: Friday, April 1, 2022 10:08 AM >> To: dev@dpdk.org >> Cc: thomas@monjalon.net; David Marchand <david.marchand@redhat.com>; >> onar.olsen@ericsson.com; Honnappa Nagarahalli >> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; >> konstantin.ananyev@intel.com; mb@smartsharesystems.com; >> stephen@networkplumber.org; Mattias Rönnblom >> <mattias.ronnblom@ericsson.com>; Ola Liljedahl <Ola.Liljedahl@arm.com> >> Subject: [PATCH v3] eal: add seqlock >> >> A sequence lock (seqlock) is synchronization primitive which allows for data- >> race free, low-overhead, high-frequency reads, especially for data structures >> shared across many cores and which are updated with relatively infrequently. >> >> A seqlock permits multiple parallel readers. The variant of seqlock implemented >> in this patch supports multiple writers as well. A spinlock is used for writer- >> writer serialization. >> >> To avoid resource reclamation and other issues, the data protected by a seqlock >> is best off being self-contained (i.e., no pointers [except to constant data]). >> >> One way to think about seqlocks is that they provide means to perform atomic >> operations on data objects larger what the native atomic machine instructions >> allow for. >> >> DPDK seqlocks are not preemption safe on the writer side. A thread preemption >> affects performance, not correctness. >> >> A seqlock contains a sequence number, which can be thought of as the >> generation of the data it protects. >> >> A reader will >> 1. Load the sequence number (sn). >> 2. Load, in arbitrary order, the seqlock-protected data. >> 3. Load the sn again. >> 4. Check if the first and second sn are equal, and even numbered. >> If they are not, discard the loaded data, and restart from 1. >> >> The first three steps need to be ordered using suitable memory fences. >> >> A writer will >> 1. Take the spinlock, to serialize writer access. >> 2. Load the sn. >> 3. Store the original sn + 1 as the new sn. >> 4. Perform load and stores to the seqlock-protected data. >> 5. Store the original sn + 2 as the new sn. >> 6. Release the spinlock. >> >> Proper memory fencing is required to make sure the first sn store, the data >> stores, and the second sn store appear to the reader in the mentioned order. >> >> The sn loads and stores must be atomic, but the data loads and stores need not >> be. >> >> The original seqlock design and implementation was done by Stephen >> Hemminger. This is an independent implementation, using C11 atomics. >> >> For more information on seqlocks, see >> https://en.wikipedia.org/wiki/Seqlock >> >> PATCH v3: >> * Renamed both read and write-side critical section begin/end functions >> to better match rwlock naming, per Ola Liljedahl's suggestion. >> * Added 'extern "C"' guards for C++ compatibility. >> * Refer to the main lcore as the main, and not the master. 
>> >> PATCH v2: >> * Skip instead of fail unit test in case too few lcores are available. >> * Use main lcore for testing, reducing the minimum number of lcores >> required to run the unit tests to four. >> * Consistently refer to sn field as the "sequence number" in the >> documentation. >> * Fixed spelling mistakes in documentation. >> >> Updates since RFC: >> * Added API documentation. >> * Added link to Wikipedia article in the commit message. >> * Changed seqlock sequence number field from uint64_t (which was >> overkill) to uint32_t. The sn type needs to be sufficiently large >> to assure no reader will read a sn, access the data, and then read >> the same sn, but the sn has been updated to many times during the >> read, so it has wrapped. >> * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. >> * Removed the rte_seqlock struct + separate rte_seqlock_t typedef >> with an anonymous struct typedef:ed to rte_seqlock_t. >> >> Acked-by: Morten Brørup <mb@smartsharesystems.com> >> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> >> --- >> app/test/meson.build | 2 + >> app/test/test_seqlock.c | 202 +++++++++++++++++++++++ >> lib/eal/common/meson.build | 1 + >> lib/eal/common/rte_seqlock.c | 12 ++ >> lib/eal/include/meson.build | 1 + >> lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++ >> lib/eal/version.map | 3 + >> 7 files changed, 523 insertions(+) >> create mode 100644 app/test/test_seqlock.c create mode 100644 >> lib/eal/common/rte_seqlock.c create mode 100644 >> lib/eal/include/rte_seqlock.h >> >> diff --git a/app/test/meson.build b/app/test/meson.build index >> 5fc1dd1b7b..5e418e8766 100644 >> --- a/app/test/meson.build >> +++ b/app/test/meson.build >> @@ -125,6 +125,7 @@ test_sources = files( >> 'test_rwlock.c', >> 'test_sched.c', >> 'test_security.c', >> + 'test_seqlock.c', >> 'test_service_cores.c', >> 'test_spinlock.c', >> 'test_stack.c', >> @@ -214,6 +215,7 @@ fast_tests = [ >> ['rwlock_rde_wro_autotest', true], >> ['sched_autotest', true], >> ['security_autotest', false], >> + ['seqlock_autotest', true], >> ['spinlock_autotest', true], >> ['stack_autotest', false], >> ['stack_lf_autotest', false], >> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode >> 100644 index 0000000000..54fadf8025 >> --- /dev/null >> +++ b/app/test/test_seqlock.c >> @@ -0,0 +1,202 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#include <rte_seqlock.h> >> + >> +#include <rte_cycles.h> >> +#include <rte_malloc.h> >> +#include <rte_random.h> >> + >> +#include <inttypes.h> >> + >> +#include "test.h" >> + >> +struct data { >> + rte_seqlock_t lock; >> + >> + uint64_t a; >> + uint64_t b __rte_cache_aligned; >> + uint64_t c __rte_cache_aligned; >> +} __rte_cache_aligned; >> + >> +struct reader { >> + struct data *data; >> + uint8_t stop; >> +}; >> + >> +#define WRITER_RUNTIME (2.0) /* s */ >> + >> +#define WRITER_MAX_DELAY (100) /* us */ >> + >> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define >> +WRITER_INTERRUPT_TIME (1) /* us */ >> + >> +static int >> +writer_run(void *arg) >> +{ >> + struct data *data = arg; >> + uint64_t deadline; >> + >> + deadline = rte_get_timer_cycles() + >> + WRITER_RUNTIME * rte_get_timer_hz(); >> + >> + while (rte_get_timer_cycles() < deadline) { >> + bool interrupted; >> + uint64_t new_value; >> + unsigned int delay; >> + >> + new_value = rte_rand(); >> + >> + interrupted = >> 
rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; >> + >> + rte_seqlock_write_lock(&data->lock); >> + >> + data->c = new_value; >> + >> + /* These compiler barriers (both on the test reader >> + * and the test writer side) are here to ensure that >> + * loads/stores *usually* happen in test program order >> + * (always on a TSO machine). They are arrange in such >> + * a way that the writer stores in a different order >> + * than the reader loads, to emulate an arbitrary >> + * order. A real application using a seqlock does not >> + * require any compiler barriers. >> + */ >> + rte_compiler_barrier(); > The compiler barriers are not sufficient on all architectures (if the intention is to maintain the program order). > The intention is what is described in the comment (i.e., to make it likely, but no guaranteed, that the stores will be globally visible in the program order). The reason I didn't put in a release memory barrier, was that it seems a little intrusive. Maybe I should remove these compiler barriers. They are also intrusive in the way may prevent some compiler optimizations, that could expose a seqlock bug. Or, I could have two variants of the tests. I don't know. >> + data->b = new_value; >> + >> + if (interrupted) >> + rte_delay_us_block(WRITER_INTERRUPT_TIME); >> + >> + rte_compiler_barrier(); >> + data->a = new_value; >> + >> + rte_seqlock_write_unlock(&data->lock); >> + >> + delay = rte_rand_max(WRITER_MAX_DELAY); >> + >> + rte_delay_us_block(delay); >> + } >> + >> + return 0; >> +} >> + >> +#define INTERRUPTED_READER_FREQUENCY (1000) #define >> +READER_INTERRUPT_TIME (1000) /* us */ >> + >> +static int >> +reader_run(void *arg) >> +{ >> + struct reader *r = arg; >> + int rc = 0; >> + >> + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == >> 0) { >> + struct data *data = r->data; >> + bool interrupted; >> + uint64_t a; >> + uint64_t b; >> + uint64_t c; >> + uint32_t sn; >> + >> + interrupted = >> rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; >> + >> + sn = rte_seqlock_read_lock(&data->lock); >> + >> + do { >> + a = data->a; >> + /* See writer_run() for an explanation why >> + * these barriers are here. >> + */ >> + rte_compiler_barrier(); >> + >> + if (interrupted) >> + >> rte_delay_us_block(READER_INTERRUPT_TIME); >> + >> + c = data->c; >> + >> + rte_compiler_barrier(); >> + b = data->b; >> + >> + } while (!rte_seqlock_read_tryunlock(&data->lock, &sn)); >> + >> + if (a != b || b != c) { >> + printf("Reader observed inconsistent data values " >> + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", >> + a, b, c); >> + rc = -1; >> + } >> + } >> + >> + return rc; >> +} >> + >> +static void >> +reader_stop(struct reader *reader) >> +{ >> + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); } >> + >> +#define NUM_WRITERS (2) /* main lcore + one worker */ #define >> +MIN_NUM_READERS (2) #define MAX_READERS (RTE_MAX_LCORE - >> NUM_WRITERS - >> +1) #define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) >> + >> +/* Only a compile-time test */ >> +static rte_seqlock_t __rte_unused static_init_lock = >> +RTE_SEQLOCK_INITIALIZER; >> + >> +static int >> +test_seqlock(void) >> +{ >> + struct reader readers[MAX_READERS]; >> + unsigned int num_readers; >> + unsigned int num_lcores; >> + unsigned int i; >> + unsigned int lcore_id; >> + unsigned int reader_lcore_ids[MAX_READERS]; >> + unsigned int worker_writer_lcore_id = 0; >> + int rc = 0; >> + >> + num_lcores = rte_lcore_count(); >> + >> + if (num_lcores < MIN_LCORE_COUNT) { >> + printf("Too few cores to run test. 
Skipping.\n"); >> + return 0; >> + } >> + >> + num_readers = num_lcores - NUM_WRITERS; >> + >> + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); >> + >> + i = 0; >> + RTE_LCORE_FOREACH_WORKER(lcore_id) { >> + if (i == 0) { >> + rte_eal_remote_launch(writer_run, data, lcore_id); >> + worker_writer_lcore_id = lcore_id; >> + } else { >> + unsigned int reader_idx = i - 1; >> + struct reader *reader = &readers[reader_idx]; >> + >> + reader->data = data; >> + reader->stop = 0; >> + >> + rte_eal_remote_launch(reader_run, reader, lcore_id); >> + reader_lcore_ids[reader_idx] = lcore_id; >> + } >> + i++; >> + } >> + >> + if (writer_run(data) != 0 || >> + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) >> + rc = -1; >> + >> + for (i = 0; i < num_readers; i++) { >> + reader_stop(&readers[i]); >> + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) >> + rc = -1; >> + } >> + >> + return rc; >> +} >> + >> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); >> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index >> 917758cc65..a41343bfed 100644 >> --- a/lib/eal/common/meson.build >> +++ b/lib/eal/common/meson.build >> @@ -35,6 +35,7 @@ sources += files( >> 'rte_malloc.c', >> 'rte_random.c', >> 'rte_reciprocal.c', >> + 'rte_seqlock.c', >> 'rte_service.c', >> 'rte_version.c', >> ) >> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new >> file mode 100644 index 0000000000..d4fe648799 >> --- /dev/null >> +++ b/lib/eal/common/rte_seqlock.c >> @@ -0,0 +1,12 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#include <rte_seqlock.h> >> + >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock) { >> + seqlock->sn = 0; >> + rte_spinlock_init(&seqlock->lock); >> +} >> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index >> 9700494816..48df5f1a21 100644 >> --- a/lib/eal/include/meson.build >> +++ b/lib/eal/include/meson.build >> @@ -36,6 +36,7 @@ headers += files( >> 'rte_per_lcore.h', >> 'rte_random.h', >> 'rte_reciprocal.h', >> + 'rte_seqlock.h', >> 'rte_service.h', >> 'rte_service_component.h', >> 'rte_string_fns.h', >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file > Other lock implementations are in lib/eal/include/generic. > >> mode 100644 index 0000000000..44eacd66e8 >> --- /dev/null >> +++ b/lib/eal/include/rte_seqlock.h >> @@ -0,0 +1,302 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#ifndef _RTE_SEQLOCK_H_ >> +#define _RTE_SEQLOCK_H_ >> + >> +#ifdef __cplusplus >> +extern "C" { >> +#endif >> + >> +/** >> + * @file >> + * RTE Seqlock >> + * >> + * A sequence lock (seqlock) is a synchronization primitive allowing >> + * multiple, parallel, readers to efficiently and safely (i.e., in a >> + * data-race free manner) access the lock-protected data. The RTE >> + * seqlock permits multiple writers as well. A spinlock is used to >> + * writer-writer synchronization. >> + * >> + * A reader never blocks a writer. Very high frequency writes may >> + * prevent readers from making progress. >> + * >> + * A seqlock is not preemption-safe on the writer side. If a writer is >> + * preempted, it may block readers until the writer thread is again >> + * allowed to execute. Heavy computations should be kept out of the >> + * writer-side critical section, to avoid delaying readers. 
>> + * >> + * Seqlocks are useful for data which are read by many cores, at a >> + * high frequency, and relatively infrequently written to. >> + * >> + * One way to think about seqlocks is that they provide means to >> + * perform atomic operations on objects larger than what the native >> + * machine instructions allow for. >> + * >> + * To avoid resource reclamation issues, the data protected by a >> + * seqlock should typically be kept self-contained (e.g., no pointers >> + * to mutable, dynamically allocated data). >> + * >> + * Example usage: >> + * @code{.c} >> + * #define MAX_Y_LEN (16) >> + * // Application-defined example data structure, protected by a seqlock. >> + * struct config { >> + * rte_seqlock_t lock; >> + * int param_x; >> + * char param_y[MAX_Y_LEN]; >> + * }; >> + * >> + * // Accessor function for reading config fields. >> + * void >> + * config_read(const struct config *config, int *param_x, char >> +*param_y) >> + * { >> + * // Temporary variables, just to improve readability. > I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small. > >> + * int tentative_x; >> + * char tentative_y[MAX_Y_LEN]; >> + * uint32_t sn; >> + * >> + * sn = rte_seqlock_read_lock(&config->lock); >> + * do { >> + * // Loads may be atomic or non-atomic, as in this example. >> + * tentative_x = config->param_x; >> + * strcpy(tentative_y, config->param_y); >> + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); >> + * // An application could skip retrying, and try again later, if >> + * // progress is possible without the data. >> + * >> + * *param_x = tentative_x; >> + * strcpy(param_y, tentative_y); >> + * } >> + * >> + * // Accessor function for writing config fields. >> + * void >> + * config_update(struct config *config, int param_x, const char >> +*param_y) >> + * { >> + * rte_seqlock_write_lock(&config->lock); >> + * // Stores may be atomic or non-atomic, as in this example. >> + * config->param_x = param_x; >> + * strcpy(config->param_y, param_y); >> + * rte_seqlock_write_unlock(&config->lock); >> + * } >> + * @endcode >> + * >> + * @see >> + * https://en.wikipedia.org/wiki/Seqlock. >> + */ >> + >> +#include <stdbool.h> >> +#include <stdint.h> >> + >> +#include <rte_atomic.h> >> +#include <rte_branch_prediction.h> >> +#include <rte_spinlock.h> >> + >> +/** >> + * The RTE seqlock type. >> + */ >> +typedef struct { >> + uint32_t sn; /**< A sequence number for the protected data. */ >> + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } > Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. > >> +rte_seqlock_t; >> + >> +/** >> + * A static seqlock initializer. >> + */ >> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } >> + >> +/** >> + * Initialize the seqlock. >> + * >> + * This function initializes the seqlock, and leaves the writer-side >> + * spinlock unlocked. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + */ >> +__rte_experimental >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock); >> + >> +/** >> + * Begin a read-side critical section. >> + * >> + * A call to this function marks the beginning of a read-side critical >> + * section, for @p seqlock. 
>> + * >> + * rte_seqlock_read_lock() returns a sequence number, which is later >> + * used in rte_seqlock_read_tryunlock() to check if the protected data >> + * underwent any modifications during the read transaction. >> + * >> + * After (in program order) rte_seqlock_read_lock() has been called, >> + * the calling thread reads the protected data, for later use. The >> + * protected data read *must* be copied (either in pristine form, or >> + * in the form of some derivative), since the caller may only read the >> + * data from within the read-side critical section (i.e., after >> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()), >> + * but must not act upon the retrieved data while in the critical >> + * section, since it does not yet know if it is consistent. >> + * >> + * The protected data may be read using atomic and/or non-atomic >> + * operations. >> + * >> + * After (in program order) all required data loads have been >> + * performed, rte_seqlock_read_tryunlock() should be called, marking >> + * the end of the read-side critical section. >> + * >> + * If rte_seqlock_read_tryunlock() returns true, the data was read >> + * atomically and the copied data is consistent. >> + * >> + * If rte_seqlock_read_tryunlock() returns false, the just-read data >> + * is inconsistent and should be discarded. The caller has the option >> + * to either re-read the data and call rte_seqlock_read_tryunlock() >> + * again, or to restart the whole procedure (i.e., from >> + * rte_seqlock_read_lock()) at some later time. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + * @return >> + * The seqlock sequence number for this critical section, to >> + * later be passed to rte_seqlock_read_tryunlock(). >> + * >> + * @see rte_seqlock_read_tryunlock() >> + */ >> +__rte_experimental >> +static inline uint32_t >> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock) { >> + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) >> + * from happening before the sn load. Synchronizes-with the >> + * store release in rte_seqlock_write_unlock(). >> + */ >> + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); } >> + >> +/** >> + * End a read-side critical section. >> + * >> + * A call to this function marks the end of a read-side critical > Should we capture that it also begins a new critical-section for the subsequent calls to rte_seqlock_tryunlock()? > >> + * section, for @p seqlock. The application must supply the sequence >> + * number produced by the corresponding rte_seqlock_read_lock() (or, >> + * in case of a retry, the rte_seqlock_tryunlock()) call. >> + * >> + * After this function has been called, the caller should not access >> + * the protected data. > I understand what you mean here. But, I think this needs clarity. > In the documentation for rte_seqlock_read_lock() you have mentioned, if rte_seqlock_read_tryunlock() returns false, one could re-read the data. > May be this should be changed to: > " After this function returns true, the caller should not access the protected data."? > Or may be combine it with the following para. > >> + * >> + * In case this function returns true, the just-read data was >> + * consistent and the set of atomic and non-atomic load operations >> + * performed between rte_seqlock_read_lock() and >> + * rte_seqlock_read_tryunlock() were atomic, as a whole. 
>> + * >> + * In case rte_seqlock_read_tryunlock() returns false, the data was >> + * modified as it was being read and may be inconsistent, and thus >> + * should be discarded. The @p begin_sn is updated with the >> + * now-current sequence number. > May be > " The @p begin_sn is updated with the sequence number for the next critical section." > Sounds good. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + * @param begin_sn >> + * The seqlock sequence number returned by >> + * rte_seqlock_read_lock() (potentially updated in subsequent >> + * rte_seqlock_read_tryunlock() calls) for this critical section. >> + * @return >> + * true or false, if the just-read seqlock-protected data was consistent >> + * or inconsistent, respectively, at the time it was read. > true - just read protected data was consistent > false - just read protected data was inconsistent > >> + * >> + * @see rte_seqlock_read_lock() >> + */ >> +__rte_experimental >> +static inline bool >> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t >> +*begin_sn) { >> + uint32_t end_sn; >> + >> + /* make sure the data loads happens before the sn load */ >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> + >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> + >> + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { >> + *begin_sn = end_sn; >> + return false; >> + } >> + >> + return true; >> +} >> + >> +/** >> + * Begin a write-side critical section. >> + * >> + * A call to this function acquires the write lock associated @p >> + * seqlock, and marks the beginning of a write-side critical section. >> + * >> + * After having called this function, the caller may go on to modify >> + * (both read and write) the protected data, in an atomic or >> + * non-atomic manner. >> + * >> + * After the necessary updates have been performed, the application >> + * calls rte_seqlock_write_unlock(). >> + * >> + * This function is not preemption-safe in the sense that preemption >> + * of the calling thread may block reader progress until the writer >> + * thread is rescheduled. >> + * >> + * Unlike rte_seqlock_read_lock(), each call made to >> + * rte_seqlock_write_lock() must be matched with an unlock call. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + * >> + * @see rte_seqlock_write_unlock() >> + */ >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { >> + uint32_t sn; >> + >> + /* to synchronize with other writers */ >> + rte_spinlock_lock(&seqlock->lock); >> + >> + sn = seqlock->sn + 1; > The load of seqlock->sn could use __atomic_load_n to be consistent. > But why? I know it doesn't have any cost (these loads are going to be atomic anyways), but why use a construct with stronger guarantees than you have to? >> + >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >> + >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >> + * from happening before the sn store. >> + */ >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); >> +} >> + >> +/** >> + * End a write-side critical section. >> + * >> + * A call to this function marks the end of the write-side critical >> + * section, for @p seqlock. After this call has been made, the >> +protected >> + * data may no longer be modified. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. 
>> + * >> + * @see rte_seqlock_write_lock() >> + */ >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) { >> + uint32_t sn; >> + >> + sn = seqlock->sn + 1; > Same here, the load of seqlock->sn could use __atomic_load_n > >> + >> + /* synchronizes-with the load acquire in rte_seqlock_read_lock() */ >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >> + >> + rte_spinlock_unlock(&seqlock->lock); >> +} >> + >> +#ifdef __cplusplus >> +} >> +#endif >> + >> +#endif /* _RTE_SEQLOCK_H_ */ >> diff --git a/lib/eal/version.map b/lib/eal/version.map index >> b53eeb30d7..4a9d0ed899 100644 >> --- a/lib/eal/version.map >> +++ b/lib/eal/version.map >> @@ -420,6 +420,9 @@ EXPERIMENTAL { >> rte_intr_instance_free; >> rte_intr_type_get; >> rte_intr_type_set; >> + >> + # added in 22.07 >> + rte_seqlock_init; >> }; >> >> INTERNAL { >> -- >> 2.25.1 > ^ permalink raw reply [flat|nested] 104+ messages in thread
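The rte_seqlock.c body behind the rte_seqlock_init() declaration is not part of the hunks quoted above. Given the struct and RTE_SEQLOCK_INITIALIZER definitions, it presumably amounts to something like the following sketch (assumed, not copied from the patch):

#include <rte_seqlock.h>

void
rte_seqlock_init(rte_seqlock_t *seqlock)
{
        /* start with an even (no writer active) sequence number */
        seqlock->sn = 0;
        rte_spinlock_init(&seqlock->lock);
}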
* RE: [PATCH v3] eal: add seqlock 2022-04-03 6:10 ` [PATCH v3] eal: add seqlock Mattias Rönnblom @ 2022-04-03 17:27 ` Honnappa Nagarahalli 2022-04-03 18:37 ` Ola Liljedahl 0 siblings, 1 reply; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-03 17:27 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl, nd <snip> > >> a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode > >> 100644 index 0000000000..54fadf8025 > >> --- /dev/null > >> +++ b/app/test/test_seqlock.c > >> @@ -0,0 +1,202 @@ > >> +/* SPDX-License-Identifier: BSD-3-Clause > >> + * Copyright(c) 2022 Ericsson AB > >> + */ > >> + > >> +#include <rte_seqlock.h> > >> + > >> +#include <rte_cycles.h> > >> +#include <rte_malloc.h> > >> +#include <rte_random.h> > >> + > >> +#include <inttypes.h> > >> + > >> +#include "test.h" > >> + > >> +struct data { > >> + rte_seqlock_t lock; > >> + > >> + uint64_t a; > >> + uint64_t b __rte_cache_aligned; > >> + uint64_t c __rte_cache_aligned; > >> +} __rte_cache_aligned; > >> + > >> +struct reader { > >> + struct data *data; > >> + uint8_t stop; > >> +}; > >> + > >> +#define WRITER_RUNTIME (2.0) /* s */ > >> + > >> +#define WRITER_MAX_DELAY (100) /* us */ > >> + > >> +#define INTERRUPTED_WRITER_FREQUENCY (1000) #define > >> +WRITER_INTERRUPT_TIME (1) /* us */ > >> + > >> +static int > >> +writer_run(void *arg) > >> +{ > >> + struct data *data = arg; > >> + uint64_t deadline; > >> + > >> + deadline = rte_get_timer_cycles() + > >> + WRITER_RUNTIME * rte_get_timer_hz(); > >> + > >> + while (rte_get_timer_cycles() < deadline) { > >> + bool interrupted; > >> + uint64_t new_value; > >> + unsigned int delay; > >> + > >> + new_value = rte_rand(); > >> + > >> + interrupted = > >> rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; > >> + > >> + rte_seqlock_write_lock(&data->lock); > >> + > >> + data->c = new_value; > >> + > >> + /* These compiler barriers (both on the test reader > >> + * and the test writer side) are here to ensure that > >> + * loads/stores *usually* happen in test program order > >> + * (always on a TSO machine). They are arrange in such > >> + * a way that the writer stores in a different order > >> + * than the reader loads, to emulate an arbitrary > >> + * order. A real application using a seqlock does not > >> + * require any compiler barriers. > >> + */ > >> + rte_compiler_barrier(); > > The compiler barriers are not sufficient on all architectures (if the intention > is to maintain the program order). > > > > The intention is what is described in the comment (i.e., to make it likely, but > no guaranteed, that the stores will be globally visible in the program order). > > The reason I didn't put in a release memory barrier, was that it seems a little > intrusive. > > Maybe I should remove these compiler barriers. They are also intrusive in the > way may prevent some compiler optimizations, that could expose a seqlock > bug. Or, I could have two variants of the tests. I don't know. I would suggest removing the compiler barriers, leave it to the CPU to do what it can do. 
> > >> + data->b = new_value; > >> + > >> + if (interrupted) > >> + rte_delay_us_block(WRITER_INTERRUPT_TIME); > >> + > >> + rte_compiler_barrier(); > >> + data->a = new_value; > >> + > >> + rte_seqlock_write_unlock(&data->lock); > >> + > >> + delay = rte_rand_max(WRITER_MAX_DELAY); > >> + > >> + rte_delay_us_block(delay); > >> + } > >> + > >> + return 0; > >> +} > >> + <snip> > >> + > >> +/** > >> + * Begin a write-side critical section. > >> + * > >> + * A call to this function acquires the write lock associated @p > >> + * seqlock, and marks the beginning of a write-side critical section. > >> + * > >> + * After having called this function, the caller may go on to modify > >> + * (both read and write) the protected data, in an atomic or > >> + * non-atomic manner. > >> + * > >> + * After the necessary updates have been performed, the application > >> + * calls rte_seqlock_write_unlock(). > >> + * > >> + * This function is not preemption-safe in the sense that preemption > >> + * of the calling thread may block reader progress until the writer > >> + * thread is rescheduled. > >> + * > >> + * Unlike rte_seqlock_read_lock(), each call made to > >> + * rte_seqlock_write_lock() must be matched with an unlock call. > >> + * > >> + * @param seqlock > >> + * A pointer to the seqlock. > >> + * > >> + * @see rte_seqlock_write_unlock() > >> + */ > >> +__rte_experimental > >> +static inline void > >> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { > >> + uint32_t sn; > >> + > >> + /* to synchronize with other writers */ > >> + rte_spinlock_lock(&seqlock->lock); > >> + > >> + sn = seqlock->sn + 1; > > The load of seqlock->sn could use __atomic_load_n to be consistent. > > > > But why? I know it doesn't have any cost (these loads are going to be atomic > anyways), but why use a construct with stronger guarantees than you have > to? Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations. The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not. > > >> + > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > >> + > >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) > >> + * from happening before the sn store. > >> + */ > >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > >> +} > >> + > >> +/** > >> + * End a write-side critical section. > >> + * > >> + * A call to this function marks the end of the write-side critical > >> + * section, for @p seqlock. After this call has been made, the > >> +protected > >> + * data may no longer be modified. > >> + * > >> + * @param seqlock > >> + * A pointer to the seqlock. > >> + * > >> + * @see rte_seqlock_write_lock() > >> + */ > >> +__rte_experimental > >> +static inline void > >> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) { > >> + uint32_t sn; > >> + > >> + sn = seqlock->sn + 1; > > Same here, the load of seqlock->sn could use __atomic_load_n > > > >> + > >> + /* synchronizes-with the load acquire in rte_seqlock_read_lock() */ > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > >> + > >> + rte_spinlock_unlock(&seqlock->lock); > >> +} > >> + > >> +#ifdef __cplusplus > >> +} > >> +#endif > >> + > >> +#endif /* _RTE_SEQLOCK_H_ */ <snip> ^ permalink raw reply [flat|nested] 104+ messages in thread
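The reader side of the test is not part of the quoted hunks. Based on the struct data and struct reader definitions above, it presumably looks roughly like the sketch below: each reader copies the three fields inside a read-side critical section and fails if it ever observes a torn update. The field order, the compiler barriers and the error handling are assumptions, not quoted code.

static int
reader_run(void *arg)
{
        struct reader *r = arg;

        while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0) {
                uint64_t a, b, c;
                uint32_t sn;

                sn = rte_seqlock_read_lock(&r->data->lock);

                do {
                        /* Load in the opposite order of the writer's
                         * stores, to increase the chance of observing
                         * a torn read if the seqlock is broken.
                         */
                        a = r->data->a;
                        rte_compiler_barrier();
                        b = r->data->b;
                        rte_compiler_barrier();
                        c = r->data->c;
                } while (!rte_seqlock_read_tryunlock(&r->data->lock, &sn));

                if (a != b || b != c)
                        return -1; /* inconsistent data observed */
        }

        return 0;
}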
* Re: [PATCH v3] eal: add seqlock 2022-04-03 17:27 ` Honnappa Nagarahalli @ 2022-04-03 18:37 ` Ola Liljedahl 2022-04-04 21:56 ` Honnappa Nagarahalli 0 siblings, 1 reply; 104+ messages in thread From: Ola Liljedahl @ 2022-04-03 18:37 UTC (permalink / raw) To: Honnappa Nagarahalli Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen >>>> +__rte_experimental >>>> +static inline void >>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { >>>> + uint32_t sn; >>>> + >>>> + /* to synchronize with other writers */ >>>> + rte_spinlock_lock(&seqlock->lock); >>>> + >>>> + sn = seqlock->sn + 1; >>> The load of seqlock->sn could use __atomic_load_n to be consistent. >>> >> >> But why? I know it doesn't have any cost (these loads are going to be atomic >> anyways), but why use a construct with stronger guarantees than you have >> to? > Using __atomic_xxx ensures that the operation is atomic always. I believe (I am not sure) that, when not using __atomic_xxx, the compiler is allowed to use non-atomic operations. > The other reason is we are not qualifying 'sn' as volatile. Use of __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a register. I do not know the compiler behavior if some operations on 'sn' use __atomic_xxx and some do not. We don’t need an atomic read here as the seqlock->lock protects (serialises) writer-side accesses to seqlock->sn. There is no other thread which could update seqlock->sn while this thread owns the lock. The seqlock owner could read seqlock->sn byte for byte without any problems. Only writes to seqlock->sn need to be atomic as there might be readers who read seqlock->sn and in such multi-access scenarios, all accesses need to be atomic in order to avoid data races. If seqlock->sn was a C11 _Atomic type, all accesses would automatically be atomic. - Ola ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v3] eal: add seqlock 2022-04-03 18:37 ` Ola Liljedahl @ 2022-04-04 21:56 ` Honnappa Nagarahalli 0 siblings, 0 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-04 21:56 UTC (permalink / raw) To: Ola Liljedahl Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, nd <snip> > > > >>>> +__rte_experimental > >>>> +static inline void > >>>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) { > >>>> + uint32_t sn; > >>>> + > >>>> + /* to synchronize with other writers */ > >>>> + rte_spinlock_lock(&seqlock->lock); > >>>> + > >>>> + sn = seqlock->sn + 1; > >>> The load of seqlock->sn could use __atomic_load_n to be consistent. > >>> > >> > >> But why? I know it doesn't have any cost (these loads are going to be > >> atomic anyways), but why use a construct with stronger guarantees > >> than you have to? > > Using __atomic_xxx ensures that the operation is atomic always. I believe (I > am not sure) that, when not using __atomic_xxx, the compiler is allowed to > use non-atomic operations. > > The other reason is we are not qualifying 'sn' as volatile. Use of > __atomic_xxx inherently indicate to the compiler not to cache 'sn' in a > register. I do not know the compiler behavior if some operations on 'sn' use > __atomic_xxx and some do not. > We don’t need an atomic read here as the seqlock->lock protects (serialises) > writer-side accesses to seqlock->sn. There is no other thread which could > update seqlock->sn while this thread owns the lock. The seqlock owner could > read seqlock->sn byte for byte without any problems. > Only writes to seqlock->sn need to be atomic as there might be readers who > read seqlock->sn and in such multi-access scenarios, all accesses need to be > atomic in order to avoid data races. How does the compiler interpret a mix of __atomic_xxx and non-atomic access to a memory location? What are the guarantees the compiler provides in such cases? Do you see any harm in making this an atomic operation? > > If seqlock->sn was a C11 _Atomic type, all accesses would automatically be > atomic. Exactly, we do not have that currently. > > - Ola > ^ permalink raw reply [flat|nested] 104+ messages in thread
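For reference, the change being debated is a single line; a sketch of rte_seqlock_write_lock() with the suggested atomic load of the sequence number (everything else as in the posted code):

static inline void
rte_seqlock_write_lock(rte_seqlock_t *seqlock)
{
        uint32_t sn;

        /* to synchronize with other writers */
        rte_spinlock_lock(&seqlock->lock);

        /* relaxed atomic load instead of a plain load of sn */
        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED) + 1;

        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

        /* __ATOMIC_RELEASE to prevent stores after (in program order)
         * from happening before the sn store.
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);
}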
* Re: [PATCH v3] eal: add seqlock 2022-04-02 0:21 ` Honnappa Nagarahalli 2022-04-02 11:01 ` Morten Brørup 2022-04-03 6:10 ` [PATCH v3] eal: add seqlock Mattias Rönnblom @ 2022-04-03 6:33 ` Mattias Rönnblom 2022-04-03 17:37 ` Honnappa Nagarahalli 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-03 6:33 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl I missed some of your comments. On 2022-04-02 02:21, Honnappa Nagarahalli wrote: <snip> >> + * Example usage: >> + * @code{.c} >> + * #define MAX_Y_LEN (16) >> + * // Application-defined example data structure, protected by a seqlock. >> + * struct config { >> + * rte_seqlock_t lock; >> + * int param_x; >> + * char param_y[MAX_Y_LEN]; >> + * }; >> + * >> + * // Accessor function for reading config fields. >> + * void >> + * config_read(const struct config *config, int *param_x, char >> +*param_y) >> + * { >> + * // Temporary variables, just to improve readability. > I think the above comment is not necessary. It is beneficial to copy the protected data to keep the read side critical section small. > The data here would be copied into the buffers supplied by config_read() anyways, so it's a copy regardless. >> + * int tentative_x; >> + * char tentative_y[MAX_Y_LEN]; >> + * uint32_t sn; >> + * >> + * sn = rte_seqlock_read_lock(&config->lock); >> + * do { >> + * // Loads may be atomic or non-atomic, as in this example. >> + * tentative_x = config->param_x; >> + * strcpy(tentative_y, config->param_y); >> + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); >> + * // An application could skip retrying, and try again later, if >> + * // progress is possible without the data. >> + * >> + * *param_x = tentative_x; >> + * strcpy(param_y, tentative_y); >> + * } >> + * >> + * // Accessor function for writing config fields. >> + * void >> + * config_update(struct config *config, int param_x, const char >> +*param_y) >> + * { >> + * rte_seqlock_write_lock(&config->lock); >> + * // Stores may be atomic or non-atomic, as in this example. >> + * config->param_x = param_x; >> + * strcpy(config->param_y, param_y); >> + * rte_seqlock_write_unlock(&config->lock); >> + * } >> + * @endcode >> + * >> + * @see >> + * https://en.wikipedia.org/wiki/Seqlock. >> + */ >> + >> +#include <stdbool.h> >> +#include <stdint.h> >> + >> +#include <rte_atomic.h> >> +#include <rte_branch_prediction.h> >> +#include <rte_spinlock.h> >> + >> +/** >> + * The RTE seqlock type. >> + */ >> +typedef struct { >> + uint32_t sn; /**< A sequence number for the protected data. */ >> + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } > Suggest using ticket lock for the writer side. It should have low overhead when there is a single writer, but provides better functionality when there are multiple writers. > Is a seqlock the synchronization primitive of choice for high-contention cases? I would say no, but I'm not sure what you would use instead. <snip> ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v3] eal: add seqlock 2022-04-03 6:33 ` Mattias Rönnblom @ 2022-04-03 17:37 ` Honnappa Nagarahalli 2022-04-08 13:45 ` Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-03 17:37 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl, nd <snip> > > >> + * Example usage: > >> + * @code{.c} > >> + * #define MAX_Y_LEN (16) > >> + * // Application-defined example data structure, protected by a seqlock. > >> + * struct config { > >> + * rte_seqlock_t lock; > >> + * int param_x; > >> + * char param_y[MAX_Y_LEN]; > >> + * }; > >> + * > >> + * // Accessor function for reading config fields. > >> + * void > >> + * config_read(const struct config *config, int *param_x, char > >> +*param_y) > >> + * { > >> + * // Temporary variables, just to improve readability. > > I think the above comment is not necessary. It is beneficial to copy the > protected data to keep the read side critical section small. > > > > The data here would be copied into the buffers supplied by config_read() > anyways, so it's a copy regardless. I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide. > > >> + * int tentative_x; > >> + * char tentative_y[MAX_Y_LEN]; > >> + * uint32_t sn; > >> + * > >> + * sn = rte_seqlock_read_lock(&config->lock); > >> + * do { > >> + * // Loads may be atomic or non-atomic, as in this example. > >> + * tentative_x = config->param_x; > >> + * strcpy(tentative_y, config->param_y); > >> + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); > >> + * // An application could skip retrying, and try again later, if > >> + * // progress is possible without the data. > >> + * > >> + * *param_x = tentative_x; > >> + * strcpy(param_y, tentative_y); > >> + * } > >> + * > >> + * // Accessor function for writing config fields. > >> + * void > >> + * config_update(struct config *config, int param_x, const char > >> +*param_y) > >> + * { > >> + * rte_seqlock_write_lock(&config->lock); > >> + * // Stores may be atomic or non-atomic, as in this example. > >> + * config->param_x = param_x; > >> + * strcpy(config->param_y, param_y); > >> + * rte_seqlock_write_unlock(&config->lock); > >> + * } > >> + * @endcode > >> + * > >> + * @see > >> + * https://en.wikipedia.org/wiki/Seqlock. > >> + */ > >> + > >> +#include <stdbool.h> > >> +#include <stdint.h> > >> + > >> +#include <rte_atomic.h> > >> +#include <rte_branch_prediction.h> > >> +#include <rte_spinlock.h> > >> + > >> +/** > >> + * The RTE seqlock type. > >> + */ > >> +typedef struct { > >> + uint32_t sn; /**< A sequence number for the protected data. */ > >> + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } > > Suggest using ticket lock for the writer side. It should have low overhead > when there is a single writer, but provides better functionality when there are > multiple writers. > > > > Is a seqlock the synchronization primitive of choice for high-contention cases? > I would say no, but I'm not sure what you would use instead. I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input. IMO, there is no harm/perf issues in using ticket lock. > > <snip> ^ permalink raw reply [flat|nested] 104+ messages in thread
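A sketch of what the suggested ticket lock variant could look like (hypothetical; the posted patch keeps rte_spinlock_t). rte_ticketlock_t, from <rte_ticketlock.h>, grants the lock to writers in arrival order, which bounds writer starvation when there is more than one writer:

#include <rte_ticketlock.h>

typedef struct {
        uint32_t sn; /* sequence number for the protected data */
        rte_ticketlock_t lock; /* ticket lock used to serialize writers */
} rte_seqlock_t;

static inline void
rte_seqlock_write_lock(rte_seqlock_t *seqlock)
{
        uint32_t sn;

        /* writers are admitted in FIFO order */
        rte_ticketlock_lock(&seqlock->lock);

        sn = seqlock->sn + 1;

        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

        rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
{
        uint32_t sn;

        sn = seqlock->sn + 1;

        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);

        rte_ticketlock_unlock(&seqlock->lock);
}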
* Re: [PATCH v3] eal: add seqlock 2022-04-03 17:37 ` Honnappa Nagarahalli @ 2022-04-08 13:45 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 13:45 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl, Mattias Rönnblom On 2022-04-03 19:37, Honnappa Nagarahalli wrote: > <snip> > >> >>>> + * Example usage: >>>> + * @code{.c} >>>> + * #define MAX_Y_LEN (16) >>>> + * // Application-defined example data structure, protected by a seqlock. >>>> + * struct config { >>>> + * rte_seqlock_t lock; >>>> + * int param_x; >>>> + * char param_y[MAX_Y_LEN]; >>>> + * }; >>>> + * >>>> + * // Accessor function for reading config fields. >>>> + * void >>>> + * config_read(const struct config *config, int *param_x, char >>>> +*param_y) >>>> + * { >>>> + * // Temporary variables, just to improve readability. >>> I think the above comment is not necessary. It is beneficial to copy the >> protected data to keep the read side critical section small. >>> >> >> The data here would be copied into the buffers supplied by config_read() >> anyways, so it's a copy regardless. > I see what you mean here. I would think the local variables add confusion, the copy can happen to the passed parameters directly. I will leave it to you to decide. > I'll remove the temp variables. >> >>>> + * int tentative_x; >>>> + * char tentative_y[MAX_Y_LEN]; >>>> + * uint32_t sn; >>>> + * >>>> + * sn = rte_seqlock_read_lock(&config->lock); >>>> + * do { >>>> + * // Loads may be atomic or non-atomic, as in this example. >>>> + * tentative_x = config->param_x; >>>> + * strcpy(tentative_y, config->param_y); >>>> + * } while (!rte_seqlock_read_tryunlock(&config->lock, &sn)); >>>> + * // An application could skip retrying, and try again later, if >>>> + * // progress is possible without the data. >>>> + * >>>> + * *param_x = tentative_x; >>>> + * strcpy(param_y, tentative_y); >>>> + * } >>>> + * >>>> + * // Accessor function for writing config fields. >>>> + * void >>>> + * config_update(struct config *config, int param_x, const char >>>> +*param_y) >>>> + * { >>>> + * rte_seqlock_write_lock(&config->lock); >>>> + * // Stores may be atomic or non-atomic, as in this example. >>>> + * config->param_x = param_x; >>>> + * strcpy(config->param_y, param_y); >>>> + * rte_seqlock_write_unlock(&config->lock); >>>> + * } >>>> + * @endcode >>>> + * >>>> + * @see >>>> + * https://en.wikipedia.org/wiki/Seqlock. >>>> + */ >>>> + >>>> +#include <stdbool.h> >>>> +#include <stdint.h> >>>> + >>>> +#include <rte_atomic.h> >>>> +#include <rte_branch_prediction.h> >>>> +#include <rte_spinlock.h> >>>> + >>>> +/** >>>> + * The RTE seqlock type. >>>> + */ >>>> +typedef struct { >>>> + uint32_t sn; /**< A sequence number for the protected data. */ >>>> + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ } >>> Suggest using ticket lock for the writer side. It should have low overhead >> when there is a single writer, but provides better functionality when there are >> multiple writers. >>> >> >> Is a seqlock the synchronization primitive of choice for high-contention cases? >> I would say no, but I'm not sure what you would use instead. > I think Stephen has come across some use cases of high contention writers with readers, maybe Stephen can provide some input. > > IMO, there is no harm/perf issues in using ticket lock. > OK. I will leave at as spinlock for now (PATCH v4). 
^ permalink raw reply [flat|nested] 104+ messages in thread
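With the temporary variables removed, as agreed above, the example accessor would read roughly as follows (a sketch of the expected v4 wording, not quoted from it); the protected fields are copied straight into the caller-supplied buffers inside the retry loop:

// Accessor function for reading config fields.
void
config_read(const struct config *config, int *param_x, char *param_y)
{
        uint32_t sn;

        sn = rte_seqlock_read_lock(&config->lock);
        do {
                // Loads may be atomic or non-atomic, as in this example.
                *param_x = config->param_x;
                strcpy(param_y, config->param_y);
        } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
}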
* Re: [PATCH v3] eal: add seqlock 2022-04-01 15:07 ` [PATCH v3] " Mattias Rönnblom 2022-04-02 0:21 ` Honnappa Nagarahalli @ 2022-04-02 18:15 ` Ola Liljedahl 2022-04-02 19:31 ` Honnappa Nagarahalli 2022-04-03 6:51 ` Mattias Rönnblom 1 sibling, 2 replies; 104+ messages in thread From: Ola Liljedahl @ 2022-04-02 18:15 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen On 4/1/22 17:07, Mattias Rönnblom wrote: > + > +/** > + * End a read-side critical section. > + * > + * A call to this function marks the end of a read-side critical > + * section, for @p seqlock. The application must supply the sequence > + * number produced by the corresponding rte_seqlock_read_lock() (or, > + * in case of a retry, the rte_seqlock_tryunlock()) call. > + * > + * After this function has been called, the caller should not access > + * the protected data. > + * > + * In case this function returns true, the just-read data was > + * consistent and the set of atomic and non-atomic load operations > + * performed between rte_seqlock_read_lock() and > + * rte_seqlock_read_tryunlock() were atomic, as a whole. > + * > + * In case rte_seqlock_read_tryunlock() returns false, the data was > + * modified as it was being read and may be inconsistent, and thus > + * should be discarded. The @p begin_sn is updated with the > + * now-current sequence number. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * @param begin_sn > + * The seqlock sequence number returned by > + * rte_seqlock_read_lock() (potentially updated in subsequent > + * rte_seqlock_read_tryunlock() calls) for this critical section. > + * @return > + * true or false, if the just-read seqlock-protected data was consistent > + * or inconsistent, respectively, at the time it was read. > + * > + * @see rte_seqlock_read_lock() > + */ > +__rte_experimental > +static inline bool > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn) > +{ > + uint32_t end_sn; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); Since we are reading and potentially returning the sequence number here (repeating the read of the protected data), we need to use load-acquire. I assume it is not expected that the user will call rte_seqlock_read_lock() again. Seeing this implementation, I might actually prefer the original implementation, I think it is cleaner. But I would like for the begin function also to wait for an even sequence number, the end function would only have to check for same sequence number, this might improve performance a little bit as readers won't perform one or several broken reads while a write is in progress. The function names are a different thing though. The writer side behaves much more like a lock with mutual exclusion so write_lock/write_unlock makes sense. > + > + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { > + *begin_sn = end_sn; > + return false; > + } > + > + return true; > +} > + ^ permalink raw reply [flat|nested] 104+ messages in thread
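The fix being asked for here is one memory order change; a sketch of rte_seqlock_read_tryunlock() with a load-acquire on the end-of-section sequence number, so that on failure the returned sequence number can safely serve as the begin-of-section number for the retried read:

__rte_experimental
static inline bool
rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
{
        uint32_t end_sn;

        /* make sure the data loads happen before the sn load */
        rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

        /* load-acquire, since on failure the returned sn doubles as
         * the begin-of-section sn for the next read attempt
         */
        end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

        if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
                *begin_sn = end_sn;
                return false;
        }

        return true;
}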
* RE: [PATCH v3] eal: add seqlock 2022-04-02 18:15 ` Ola Liljedahl @ 2022-04-02 19:31 ` Honnappa Nagarahalli 2022-04-02 20:36 ` Morten Brørup 2022-04-03 18:11 ` Ola Liljedahl 2022-04-03 6:51 ` Mattias Rönnblom 1 sibling, 2 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-02 19:31 UTC (permalink / raw) To: Ola Liljedahl, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, nd <snip> > > +__rte_experimental > > +static inline bool > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > > +*begin_sn) { > > + uint32_t end_sn; > > + > > + /* make sure the data loads happens before the sn load */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > + > > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > > Since we are reading and potentially returning the sequence number here > (repeating the read of the protected data), we need to use load-acquire. > I assume it is not expected that the user will call > rte_seqlock_read_lock() again. Good point, we need a load-acquire (due to changes done in v3). > > Seeing this implementation, I might actually prefer the original > implementation, I think it is cleaner. But I would like for the begin function > also to wait for an even sequence number, the end function would only have > to check for same sequence number, this might improve performance a little > bit as readers won't perform one or several broken reads while a write is in > progress. The function names are a different thing though. I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would reduce one 'if' statement. Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case. > > The writer side behaves much more like a lock with mutual exclusion so > write_lock/write_unlock makes sense. > > > + > > + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { > > + *begin_sn = end_sn; > > + return false; > > + } > > + > > + return true; > > +} > > + ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v3] eal: add seqlock 2022-04-02 19:31 ` Honnappa Nagarahalli @ 2022-04-02 20:36 ` Morten Brørup 2022-04-02 22:01 ` Honnappa Nagarahalli 2022-04-03 18:11 ` Ola Liljedahl 1 sibling, 1 reply; 104+ messages in thread From: Morten Brørup @ 2022-04-02 20:36 UTC (permalink / raw) To: Honnappa Nagarahalli, Ola Liljedahl, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, stephen, nd > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com] > Sent: Saturday, 2 April 2022 21.31 > > <snip> > > > > +__rte_experimental > > > +static inline bool > > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > > > +*begin_sn) { > > > + uint32_t end_sn; > > > + > > > + /* make sure the data loads happens before the sn load */ > > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > > + > > > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > > > > Since we are reading and potentially returning the sequence number > here > > (repeating the read of the protected data), we need to use load- > acquire. > > I assume it is not expected that the user will call > > rte_seqlock_read_lock() again. > Good point, we need a load-acquire (due to changes done in v3). > > > > > Seeing this implementation, I might actually prefer the original > > implementation, I think it is cleaner. But I would like for the begin > function > > also to wait for an even sequence number, the end function would only > have > > to check for same sequence number, this might improve performance a > little > > bit as readers won't perform one or several broken reads while a > write is in > > progress. The function names are a different thing though. > I think we need to be optimizing for the case where there is no > contention between readers and writers (as that happens most of the > time). From this perspective, not checking for an even seq number in > the begin function would reduce one 'if' statement. I might be siding with Ola on this, but with a twist: The read_lock() should not wait, but test. (Or both variants could be available. Or all three, including the variant without checking for an even sequence number.) My argument for this is: The write operation could take a long time to complete, and while this goes on, it is good for the reading threads to know at entry of their critical read section that the read operation will fail, so they can take the alternative code path instead of proceeding into the critical read section. Otherwise, the reading threads have to waste time reading the protected data, only to discard them at the end. It's an optimization based on the assumption that reading the protected data has some small cost, because this small cost adds up if done many times during a longwinded write operation. And, although checking for the sequence number in read_trylock() adds an 'if' statement to it, that 'if' statement should be surrounded by likely() to reduce its cost in the case we are optimizing for, i.e. when no write operation is ongoing. This means that read_trylock() returns a boolean, and the sequence number is returned in an output parameter. Please note that it doesn't change the fact that read_tryunlock() can still fail, even though read_trylock() gave the go-ahead. I'm trying to highlight that while we all agree to optimize for the case of reading while no writing is ongoing, there might be opportunity for optimizing for the opposite case (i.e. trying to read while writing is ongoing) at the same time. 
I only hope it can be done with negligible performance cost for the primary case. I'll respectfully leave the hardcore implementation details and performance considerations to you experts in this area. :-) ^ permalink raw reply [flat|nested] 104+ messages in thread
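A sketch of the kind of test-only entry point proposed above (name and signature are hypothetical, not part of the patch): the caller gets an immediate no-go if a write is in progress, and the sequence number is passed back through an output parameter. The __attribute__((warn_unused_result)) annotation discussed elsewhere in the thread would fit such a function:

/* Hypothetical variant, not part of the posted patch. */
__rte_experimental
__attribute__((warn_unused_result))
static inline bool
rte_seqlock_read_trylock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
{
        uint32_t sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);

        if (unlikely(sn & 1))
                return false; /* a write is in progress; try again later */

        *begin_sn = sn;
        return true;
}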
* RE: [PATCH v3] eal: add seqlock 2022-04-02 20:36 ` Morten Brørup @ 2022-04-02 22:01 ` Honnappa Nagarahalli 0 siblings, 0 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-02 22:01 UTC (permalink / raw) To: Morten Brørup, Ola Liljedahl, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, stephen, nd, nd <snip> > > > > > > +__rte_experimental > > > > +static inline bool > > > > +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t > > > > +*begin_sn) { > > > > + uint32_t end_sn; > > > > + > > > > + /* make sure the data loads happens before the sn load */ > > > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > > > + > > > > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > > > > > > Since we are reading and potentially returning the sequence number > > here > > > (repeating the read of the protected data), we need to use load- > > acquire. > > > I assume it is not expected that the user will call > > > rte_seqlock_read_lock() again. > > Good point, we need a load-acquire (due to changes done in v3). > > > > > > > > Seeing this implementation, I might actually prefer the original > > > implementation, I think it is cleaner. But I would like for the > > > begin > > function > > > also to wait for an even sequence number, the end function would > > > only > > have > > > to check for same sequence number, this might improve performance a > > little > > > bit as readers won't perform one or several broken reads while a > > write is in > > > progress. The function names are a different thing though. > > I think we need to be optimizing for the case where there is no > > contention between readers and writers (as that happens most of the > > time). From this perspective, not checking for an even seq number in > > the begin function would reduce one 'if' statement. > > I might be siding with Ola on this, but with a twist: The read_lock() should not > wait, but test. (Or both variants could be available. Or all three, including the > variant without checking for an even sequence number.) > > My argument for this is: The write operation could take a long time to > complete, and while this goes on, it is good for the reading threads to know at > entry of their critical read section that the read operation will fail, so they can > take the alternative code path instead of proceeding into the critical read > section. Otherwise, the reading threads have to waste time reading the > protected data, only to discard them at the end. It's an optimization based on > the assumption that reading the protected data has some small cost, because > this small cost adds up if done many times during a longwinded write > operation. > > And, although checking for the sequence number in read_trylock() adds an 'if' > statement to it, that 'if' statement should be surrounded by likely() to reduce > its cost in the case we are optimizing for, i.e. when no write operation is > ongoing. This 'if' statement can be part of the application code as well. This would allow for multiple models to exist. > > This means that read_trylock() returns a boolean, and the sequence number is > returned in an output parameter. > > Please note that it doesn't change the fact that read_tryunlock() can still fail, > even though read_trylock() gave the go-ahead. > > I'm trying to highlight that while we all agree to optimize for the case of > reading while no writing is ongoing, there might be opportunity for optimizing > for the opposite case (i.e. 
trying to read while writing is ongoing) at the same > time. > > I only hope it can be done with negligent performance cost for the primary > case. > > I'll respectfully leave the hardcore implementation details and performance > considerations to you experts in this area. :-) ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v3] eal: add seqlock 2022-04-02 19:31 ` Honnappa Nagarahalli 2022-04-02 20:36 ` Morten Brørup @ 2022-04-03 18:11 ` Ola Liljedahl 1 sibling, 0 replies; 104+ messages in thread From: Ola Liljedahl @ 2022-04-03 18:11 UTC (permalink / raw) To: Honnappa Nagarahalli Cc: Mattias Rönnblom, dev, thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen (Now using macOS Mail program in plain text mode, hope this works) > On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> wrote: > > <snip> > >>> +__rte_experimental >>> +static inline bool >>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t >>> +*begin_sn) { >>> + uint32_t end_sn; >>> + >>> + /* make sure the data loads happens before the sn load */ >>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >>> + >>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> >> Since we are reading and potentially returning the sequence number here >> (repeating the read of the protected data), we need to use load-acquire. >> I assume it is not expected that the user will call >> rte_seqlock_read_lock() again. > Good point, we need a load-acquire (due to changes done in v3). > >> >> Seeing this implementation, I might actually prefer the original >> implementation, I think it is cleaner. But I would like for the begin function >> also to wait for an even sequence number, the end function would only have >> to check for same sequence number, this might improve performance a little >> bit as readers won't perform one or several broken reads while a write is in >> progress. The function names are a different thing though. > I think we need to be optimizing for the case where there is no contention between readers and writers (as that happens most of the time). From this perspective, not checking for an even seq number in the begin function would reduce one 'if' statement. The number of statements in C is not relevant, instead we need to look at the generated code. On x86, I would assume an if-statement like “if ((sn & 1) || (sn == sl->sn))” to generate two separate evaluations with their own conditional jump instructions. On AArch64, the two evaluations could probably be combined using a CCMP instruction and need only one conditional branch instruction. With branch prediction, it is doubtful we will see any difference in performance. > > Going back to the earlier model is better as well, because of the load-acquire required in the 'rte_seqlock_read_tryunlock' function. The earlier model would not introduce the load-acquire for the no contention case. The earlier model still had load-acquire in the read_begin function which would have to invoked again. There is no difference in the number or type of memory accesses. We just need to copy the implementation of read_begin into the read_tryunlock function if we decide that the user should not have to re-invoke read_begin on a failed read_tryunlock. > >> >> The writer side behaves much more like a lock with mutual exclusion so >> write_lock/write_unlock makes sense. >> >>> + >>> + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { >>> + *begin_sn = end_sn; >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> + ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v3] eal: add seqlock 2022-04-02 18:15 ` Ola Liljedahl 2022-04-02 19:31 ` Honnappa Nagarahalli @ 2022-04-03 6:51 ` Mattias Rönnblom 1 sibling, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-03 6:51 UTC (permalink / raw) To: Ola Liljedahl, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen On 2022-04-02 20:15, Ola Liljedahl wrote: > On 4/1/22 17:07, Mattias Rönnblom wrote: >> + >> +/** >> + * End a read-side critical section. >> + * >> + * A call to this function marks the end of a read-side critical >> + * section, for @p seqlock. The application must supply the sequence >> + * number produced by the corresponding rte_seqlock_read_lock() (or, >> + * in case of a retry, the rte_seqlock_tryunlock()) call. >> + * >> + * After this function has been called, the caller should not access >> + * the protected data. >> + * >> + * In case this function returns true, the just-read data was >> + * consistent and the set of atomic and non-atomic load operations >> + * performed between rte_seqlock_read_lock() and >> + * rte_seqlock_read_tryunlock() were atomic, as a whole. >> + * >> + * In case rte_seqlock_read_tryunlock() returns false, the data was >> + * modified as it was being read and may be inconsistent, and thus >> + * should be discarded. The @p begin_sn is updated with the >> + * now-current sequence number. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + * @param begin_sn >> + * The seqlock sequence number returned by >> + * rte_seqlock_read_lock() (potentially updated in subsequent >> + * rte_seqlock_read_tryunlock() calls) for this critical section. >> + * @return >> + * true or false, if the just-read seqlock-protected data was >> consistent >> + * or inconsistent, respectively, at the time it was read. >> + * >> + * @see rte_seqlock_read_lock() >> + */ >> +__rte_experimental >> +static inline bool >> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t >> *begin_sn) >> +{ >> + uint32_t end_sn; >> + >> + /* make sure the data loads happens before the sn load */ >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> + >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > > Since we are reading and potentially returning the sequence number here > (repeating the read of the protected data), we need to use load-acquire. > I assume it is not expected that the user will call > rte_seqlock_read_lock() again. > Good point. > Seeing this implementation, I might actually prefer the original > implementation, I think it is cleaner. Me too. > But I would like for the begin > function also to wait for an even sequence number, the end function > would only have to check for same sequence number, this might improve > performance a little bit as readers won't perform one or several broken > reads while a write is in progress. The function names are a different > thing though. > The odd sn should be a rare case, if the seqlock is used for relatively low frequency update scenarios, which is what I think it should be designed for. Waiting for an even sn in read_begin() would exclude the option for the caller to defer reading the new data to same later time, in case it's being written. That in turn would force even a single writer to make sure its thread is not preempted, or risk blocking all lcore worker cores attempting to read the protected data. 
You could complete the above API with a read_trybegin() function to address that issue, for those who care, but that would force some extra complexity on the user. > The writer side behaves much more like a lock with mutual exclusion so > write_lock/write_unlock makes sense. > >> + >> + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { >> + *begin_sn = end_sn; >> + return false; >> + } >> + >> + return true; >> +} >> + ^ permalink raw reply [flat|nested] 104+ messages in thread
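For comparison, the "wait for an even sequence number" begin() variant discussed above would look roughly like the sketch below (using rte_pause() from <rte_pause.h>). It avoids broken reads while a write is in progress, at the cost of the blocking behavior described above:

__rte_experimental
static inline uint32_t
rte_seqlock_read_lock(const rte_seqlock_t *seqlock)
{
        uint32_t sn;

        /* spin until no write is in progress, i.e. until sn is even */
        while ((sn = __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE)) & 1)
                rte_pause();

        return sn;
}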
* Re: [PATCH v2] eal: add seqlock 2022-03-31 9:25 ` Morten Brørup 2022-03-31 9:38 ` Ola Liljedahl @ 2022-03-31 13:51 ` Mattias Rönnblom 2022-04-02 0:54 ` Stephen Hemminger 1 sibling, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-31 13:51 UTC (permalink / raw) To: Morten Brørup, Ola Liljedahl, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, stephen On 2022-03-31 11:25, Morten Brørup wrote: >> From: Ola Liljedahl [mailto:ola.liljedahl@arm.com] >> Sent: Thursday, 31 March 2022 11.05 >> >> On 3/31/22 09:46, Mattias Rönnblom wrote: >>> On 2022-03-30 16:26, Mattias Rönnblom wrote: >>>> A sequence lock (seqlock) is synchronization primitive which allows >>>> for data-race free, low-overhead, high-frequency reads, especially >> for >>>> data structures shared across many cores and which are updated with >>>> relatively infrequently. >>>> >>>> >>> >>> <snip> >>> >>> Some questions I have: >>> >>> Is a variant of the seqlock without the spinlock required? The reason >> I >>> left such out was that I thought that in most cases where only a >> single >>> writer is used (or serialization is external to the seqlock), the >>> spinlock overhead is negligible, since updates are relatively >> infrequent. > > Mattias, when you suggested adding the seqlock, I considered this too, and came to the same conclusion as you. > >> You can combine the spinlock and the sequence number. Odd sequence >> number means the seqlock is busy. That would replace a non-atomic RMW >> of >> the sequence number with an atomic RMW CAS and avoid the spin lock >> atomic RMW operation. Not sure how much it helps. >> >>> >>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), >> or >>> some third alternative? I wanted to make clear it's not just a >> "release >>> the lock" function. You could use >>> the|||__attribute__((warn_unused_result)) annotation to make clear >> the >>> return value cannot be ignored, although I'm not sure DPDK ever use >> that >>> attribute. > > I strongly support adding __attribute__((warn_unused_result)) to the function. There's a first time for everything, and this attribute is very relevant here! > That would be a separate patch, I assume. Does anyone know if this attribute is available in all supported compilers? >> We have to decide how to use the seqlock API from the application >> perspective. >> Your current proposal: >> do { >> sn = rte_seqlock_read_begin(&seqlock) >> //read protected data >> } while (rte_seqlock_read_retry(&seqlock, sn)); >> >> or perhaps >> sn = rte_seqlock_read_lock(&seqlock); >> do { >> //read protected data >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); >> >> Tryunlock should signal to the user that the unlock operation might not >> succeed and something needs to be repeated. > > Perhaps rename rte_seqlock_read_retry() to rte_seqlock_read_tryend()? As Ola mentions, this also inverses the boolean result value. If you consider this, please check that the resulting assembly output remains efficient. > > I think lock()/unlock() should be avoided in the read operation names, because no lock is taken during read. I like the critical region begin()/end() names. > > Regarding naming, you should also consider renaming rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), following the naming convention of the other locks. This could prepare for future extensions, such as rte_seqlock_write_trylock(). Just a thought; I don't feel strongly about this. 
> > Ola, the rte_seqlock_read_lock(&seqlock) must remain inside the loop, because retries can be triggered by a write operation happening between the read_begin() and read_tryend(), and then the new sn must be used by the read operation. > ^ permalink raw reply [flat|nested] 104+ messages in thread
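A sketch of the "combined" alternative mentioned above, in which the sequence number itself acts as the writer lock (an odd value meaning a writer is active) and the separate spinlock is replaced by a CAS loop. Hypothetical code, not part of any posted version of the patch; the type and function names are made up for illustration:

struct combined_seqlock {
        uint32_t sn; /* odd value means a writer is active */
};

static inline void
combined_seqlock_write_lock(struct combined_seqlock *seqlock)
{
        uint32_t sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);

        do {
                while (sn & 1) { /* another writer holds the lock */
                        rte_pause();
                        sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
                }
                /* acquire the lock by making the sequence number odd */
        } while (!__atomic_compare_exchange_n(&seqlock->sn, &sn, sn + 1,
                                              true, __ATOMIC_ACQUIRE,
                                              __ATOMIC_RELAXED));

        /* as in the posted patch: prevent stores to the protected data
         * from happening before the sn store
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
combined_seqlock_write_unlock(struct combined_seqlock *seqlock)
{
        /* make the sequence number even again; release the data stores */
        __atomic_store_n(&seqlock->sn, seqlock->sn + 1, __ATOMIC_RELEASE);
}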
* Re: [PATCH v2] eal: add seqlock 2022-03-31 13:51 ` [PATCH v2] " Mattias Rönnblom @ 2022-04-02 0:54 ` Stephen Hemminger 2022-04-02 10:25 ` Morten Brørup 0 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-02 0:54 UTC (permalink / raw) To: Mattias Rönnblom Cc: Morten Brørup, Ola Liljedahl, dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev On Thu, 31 Mar 2022 13:51:32 +0000 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > > > Regarding naming, you should also consider renaming > > rte_seqlock_write_begin/end() to rte_seqlock_write_lock/unlock(), > > following the naming convention of the other locks. This could > > prepare for future extensions, such as rte_seqlock_write_trylock(). > > Just a thought; I don't feel strongly about this. Semantics and naming should be the same as Linux kernel or you risk having to reeducate too many people. ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v2] eal: add seqlock 2022-04-02 0:54 ` Stephen Hemminger @ 2022-04-02 10:25 ` Morten Brørup 2022-04-02 17:43 ` Ola Liljedahl 0 siblings, 1 reply; 104+ messages in thread From: Morten Brørup @ 2022-04-02 10:25 UTC (permalink / raw) To: Stephen Hemminger, Mattias Rönnblom Cc: Ola Liljedahl, dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Saturday, 2 April 2022 02.54 > > Semantics and naming should be the same as Linux kernel or you risk > having to reeducate too many people. Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should. However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-04-02 10:25 ` Morten Brørup @ 2022-04-02 17:43 ` Ola Liljedahl 0 siblings, 0 replies; 104+ messages in thread From: Ola Liljedahl @ 2022-04-02 17:43 UTC (permalink / raw) To: Morten Brørup, Stephen Hemminger, Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev On 4/2/22 12:25, Morten Brørup wrote: >> From: Stephen Hemminger [mailto:stephen@networkplumber.org] >> Sent: Saturday, 2 April 2022 02.54 >> >> Semantics and naming should be the same as Linux kernel or you risk >> having to reeducate too many people. > Although I do see significant value in that point, I don't consider the Linux kernel API the definitive golden standard in all regards. If DPDK can do better, it should. > > However, if different naming/semantics does a better job for DPDK, then we should take care to avoid similar function names with different behavior than Linux, to reduce the risk of incorrect use by seasoned Linux kernel developers. Couldn't have said it better myself. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-31 9:04 ` Ola Liljedahl 2022-03-31 9:25 ` Morten Brørup @ 2022-03-31 13:38 ` Mattias Rönnblom 2022-03-31 14:53 ` Ola Liljedahl 1 sibling, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-03-31 13:38 UTC (permalink / raw) To: Ola Liljedahl, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen On 2022-03-31 11:04, Ola Liljedahl wrote: > > On 3/31/22 09:46, Mattias Rönnblom wrote: >> On 2022-03-30 16:26, Mattias Rönnblom wrote: >>> A sequence lock (seqlock) is synchronization primitive which allows >>> for data-race free, low-overhead, high-frequency reads, especially for >>> data structures shared across many cores and which are updated with >>> relatively infrequently. >>> >>> >> >> <snip> >> >> Some questions I have: >> >> Is a variant of the seqlock without the spinlock required? The reason I >> left such out was that I thought that in most cases where only a single >> writer is used (or serialization is external to the seqlock), the >> spinlock overhead is negligible, since updates are relatively infrequent. > You can combine the spinlock and the sequence number. Odd sequence > number means the seqlock is busy. That would replace a non-atomic RMW of > the sequence number with an atomic RMW CAS and avoid the spin lock > atomic RMW operation. Not sure how much it helps. > >> >> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or >> some third alternative? I wanted to make clear it's not just a "release >> the lock" function. You could use >> the|||__attribute__((warn_unused_result)) annotation to make clear the >> return value cannot be ignored, although I'm not sure DPDK ever use that >> attribute. > We have to decide how to use the seqlock API from the application > perspective. > Your current proposal: > do { > sn = rte_seqlock_read_begin(&seqlock) > //read protected data > } while (rte_seqlock_read_retry(&seqlock, sn)); > > or perhaps > sn = rte_seqlock_read_lock(&seqlock); > do { > //read protected data > } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); > > Tryunlock should signal to the user that the unlock operation might not > succeed and something needs to be repeated. > I like that your proposal is consistent with rwlock API, although I tend to think about a seqlock more like an arbitrary-size atomic load/store, where begin() is the beginning of the read transaction. What I don't like so much with "tryunlock" is that it's not obvious what return type and values it should have. I seem not to be the only one which suffers from a lack of intuition here, since the DPDK spinlock trylock() function returns '1' in case lock is taken (using an int, but treating it like a bool), while the rwlock equivalent returns '0' (also int, but treating it as an error code). "lock" also suggests you prevent something from occurring, which is not the case on the reader side. A calling application also need not call the reader unlock (or retry) function for all seqlocks it has locked, although I don't see a point why it wouldn't. (I don't see why a read-side critical section should contain much logic at all, since you can't act on the just-read data.) ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-31 13:38 ` Mattias Rönnblom @ 2022-03-31 14:53 ` Ola Liljedahl 2022-04-02 0:52 ` Stephen Hemminger 0 siblings, 1 reply; 104+ messages in thread From: Ola Liljedahl @ 2022-03-31 14:53 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen (Thunderbird suddenly refuses to edit in plain text mode, hope the mail gets sent as text anyway) On 3/31/22 15:38, Mattias Rönnblom wrote: > On 2022-03-31 11:04, Ola Liljedahl wrote: >> On 3/31/22 09:46, Mattias Rönnblom wrote: >>> On 2022-03-30 16:26, Mattias Rönnblom wrote: >>> <snip> >>> Should the rte_seqlock_read_retry() be called rte_seqlock_read_end(), or >>> some third alternative? I wanted to make clear it's not just a "release >>> the lock" function. You could use >>> the|||__attribute__((warn_unused_result)) annotation to make clear the >>> return value cannot be ignored, although I'm not sure DPDK ever use that >>> attribute. >> We have to decide how to use the seqlock API from the application >> perspective. >> Your current proposal: >> do { >> sn = rte_seqlock_read_begin(&seqlock) >> //read protected data >> } while (rte_seqlock_read_retry(&seqlock, sn)); >> >> or perhaps >> sn = rte_seqlock_read_lock(&seqlock); >> do { >> //read protected data >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); >> >> Tryunlock should signal to the user that the unlock operation might not >> succeed and something needs to be repeated. >> > I like that your proposal is consistent with rwlock API, although I tend > to think about a seqlock more like an arbitrary-size atomic load/store, > where begin() is the beginning of the read transaction. I can see the evolution of an application where is starts to use plain spin locks, moves to reader/writer locks for better performance and eventually moves to seqlocks. The usage is the same, only the characteristics (including performance) differ. > > What I don't like so much with "tryunlock" is that it's not obvious what > return type and values it should have. I seem not to be the only one > which suffers from a lack of intuition here, since the DPDK spinlock > trylock() function returns '1' in case lock is taken (using an int, but > treating it like a bool), while the rwlock equivalent returns '0' (also > int, but treating it as an error code). Then you have two different ways of doing it! Or invent a third since there seems to be no consistent pattern. > > "lock" also suggests you prevent something from occurring, which is not > the case on the reader side. That's why my implementations in Progress64 use the terms acquire and release. Locks are acquired and released (with acquire and release semantics!). Hazard pointers are acquired and released (with acquire and release semantics!). Slots in reorder buffers are acquired and released. Etc. https://github.com/ARM-software/progress64 > A calling application also need not call > the reader unlock (or retry) function for all seqlocks it has locked, > although I don't see a point why it wouldn't. (I don't see why a > read-side critical section should contain much logic at all, since you > can't act on the just-read data.) Lock without unlock/retry is meaningless and not something we need to consider IMO. - Ola ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-31 14:53 ` Ola Liljedahl @ 2022-04-02 0:52 ` Stephen Hemminger 2022-04-03 6:23 ` Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-02 0:52 UTC (permalink / raw) To: Ola Liljedahl Cc: Mattias Rönnblom, dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb On Thu, 31 Mar 2022 16:53:00 +0200 Ola Liljedahl <ola.liljedahl@arm.com> wrote: > From: Ola Liljedahl <ola.liljedahl@arm.com> > To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>, "dev@dpdk.org" > <dev@dpdk.org> Cc: Thomas Monjalon <thomas@monjalon.net>, David > Marchand <david.marchand@redhat.com>, Onar Olsen > <onar.olsen@ericsson.com>, "Honnappa.Nagarahalli@arm.com" > <Honnappa.Nagarahalli@arm.com>, "nd@arm.com" <nd@arm.com>, > "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>, > "mb@smartsharesystems.com" <mb@smartsharesystems.com>, > "stephen@networkplumber.org" <stephen@networkplumber.org> Subject: > Re: [PATCH v2] eal: add seqlock Date: Thu, 31 Mar 2022 16:53:00 +0200 > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 > Thunderbird/91.7.0 > > (Thunderbird suddenly refuses to edit in plain text mode, hope the > mail gets sent as text anyway) > > On 3/31/22 15:38, Mattias Rönnblom wrote: > > > On 2022-03-31 11:04, Ola Liljedahl wrote: > >> On 3/31/22 09:46, Mattias Rönnblom wrote: > >>> On 2022-03-30 16:26, Mattias Rönnblom wrote: > >>> > <snip> > >>> Should the rte_seqlock_read_retry() be called > >>> rte_seqlock_read_end(), or some third alternative? I wanted to > >>> make clear it's not just a "release the lock" function. You could > >>> use the|||__attribute__((warn_unused_result)) annotation to make > >>> clear the return value cannot be ignored, although I'm not sure > >>> DPDK ever use that attribute. > >> We have to decide how to use the seqlock API from the application > >> perspective. > >> Your current proposal: > >> do { > >> sn = rte_seqlock_read_begin(&seqlock) > >> //read protected data > >> } while (rte_seqlock_read_retry(&seqlock, sn)); > >> > >> or perhaps > >> sn = rte_seqlock_read_lock(&seqlock); > >> do { > >> //read protected data > >> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); > >> > >> Tryunlock should signal to the user that the unlock operation > >> might not succeed and something needs to be repeated. > >> > > I like that your proposal is consistent with rwlock API, although I > > tend to think about a seqlock more like an arbitrary-size atomic > > load/store, where begin() is the beginning of the read transaction. > > > > I can see the evolution of an application where is starts to use > plain spin locks, moves to reader/writer locks for better performance > and eventually moves to seqlocks. The usage is the same, only the > characteristics (including performance) differ. The semantics of seqlock in DPDK must be the same as what Linux kernel does or you are asking for trouble. It is not a reader-writer lock in traditional sense. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-04-02 0:52 ` Stephen Hemminger @ 2022-04-03 6:23 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-03 6:23 UTC (permalink / raw) To: Stephen Hemminger, Ola Liljedahl Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb On 2022-04-02 02:52, Stephen Hemminger wrote: > On Thu, 31 Mar 2022 16:53:00 +0200 > Ola Liljedahl <ola.liljedahl@arm.com> wrote: > >> From: Ola Liljedahl <ola.liljedahl@arm.com> >> To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>, "dev@dpdk.org" >> <dev@dpdk.org> Cc: Thomas Monjalon <thomas@monjalon.net>, David >> Marchand <david.marchand@redhat.com>, Onar Olsen >> <onar.olsen@ericsson.com>, "Honnappa.Nagarahalli@arm.com" >> <Honnappa.Nagarahalli@arm.com>, "nd@arm.com" <nd@arm.com>, >> "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>, >> "mb@smartsharesystems.com" <mb@smartsharesystems.com>, >> "stephen@networkplumber.org" <stephen@networkplumber.org> Subject: >> Re: [PATCH v2] eal: add seqlock Date: Thu, 31 Mar 2022 16:53:00 +0200 >> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 >> Thunderbird/91.7.0 >> >> (Thunderbird suddenly refuses to edit in plain text mode, hope the >> mail gets sent as text anyway) >> >> On 3/31/22 15:38, Mattias Rönnblom wrote: >> >>> On 2022-03-31 11:04, Ola Liljedahl wrote: >>>> On 3/31/22 09:46, Mattias Rönnblom wrote: >>>>> On 2022-03-30 16:26, Mattias Rönnblom wrote: >>>>> >> <snip> >>>>> Should the rte_seqlock_read_retry() be called >>>>> rte_seqlock_read_end(), or some third alternative? I wanted to >>>>> make clear it's not just a "release the lock" function. You could >>>>> use the|||__attribute__((warn_unused_result)) annotation to make >>>>> clear the return value cannot be ignored, although I'm not sure >>>>> DPDK ever use that attribute. >>>> We have to decide how to use the seqlock API from the application >>>> perspective. >>>> Your current proposal: >>>> do { >>>> sn = rte_seqlock_read_begin(&seqlock) >>>> //read protected data >>>> } while (rte_seqlock_read_retry(&seqlock, sn)); >>>> >>>> or perhaps >>>> sn = rte_seqlock_read_lock(&seqlock); >>>> do { >>>> //read protected data >>>> } while (!rte_seqlock_read_tryunlock(&seqlock, &sn)); >>>> >>>> Tryunlock should signal to the user that the unlock operation >>>> might not succeed and something needs to be repeated. >>>> >>> I like that your proposal is consistent with rwlock API, although I >>> tend to think about a seqlock more like an arbitrary-size atomic >>> load/store, where begin() is the beginning of the read transaction. >>> >> >> I can see the evolution of an application where is starts to use >> plain spin locks, moves to reader/writer locks for better performance >> and eventually moves to seqlocks. The usage is the same, only the >> characteristics (including performance) differ. > > > The semantics of seqlock in DPDK must be the same as what Linux kernel > does or you are asking for trouble. It is not a reader-writer lock in > traditional sense. Does "semantics" here including the naming of the functions? The overall semantics will be the same, except the kernel has a bunch of variants with different kind of write-side synchronization, if I recall correctly. 
I'll try to summarize the options as I see them: Option A (PATCH v3): rte_seqlock_read_lock() rte_seqlock_read_tryunlock() /* with built-in "read restart" */ rte_seqlock_write_lock() rte_seqlock_write_unlock() Option B (Linux kernel-style naming): rte_seqlock_read_begin() rte_seqlock_read_end() rte_seqlock_write_begin() rte_seqlock_write_end() Option C (a combination, acknowledging there's a lock on the writer side, but not on the read side): rte_seqlock_read_begin() rte_seqlock_read_retry() rte_seqlock_write_lock() rte_seqlock_write_unlock() ^ permalink raw reply [flat|nested] 104+ messages in thread
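[Editor's illustration: reader and writer code under the Option C names (the naming the later v4 patch ends up using) would look roughly like the sketch below; struct position is a made-up example type, not from any posted patch.]

#include <rte_seqlock.h>

struct position {
	rte_seqlock_t lock;
	double x;
	double y;
};

/* Reader: copy out the protected fields, restart if a write intervened. */
static void
position_read(struct position *pos, double *x, double *y)
{
	uint32_t sn;

	do {
		sn = rte_seqlock_read_begin(&pos->lock);
		*x = pos->x;
		*y = pos->y;
	} while (rte_seqlock_read_retry(&pos->lock, sn));
}

/* Writer: serialized against other writers by the seqlock's spinlock. */
static void
position_write(struct position *pos, double x, double y)
{
	rte_seqlock_write_lock(&pos->lock);
	pos->x = x;
	pos->y = y;
	rte_seqlock_write_unlock(&pos->lock);
}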
* Re: [PATCH v2] eal: add seqlock 2022-03-30 14:26 ` [PATCH v2] " Mattias Rönnblom 2022-03-31 7:46 ` Mattias Rönnblom @ 2022-04-02 0:50 ` Stephen Hemminger 2022-04-02 17:54 ` Ola Liljedahl 2022-04-05 20:16 ` Stephen Hemminger 2 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-02 0:50 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl On Wed, 30 Mar 2022 16:26:02 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program > order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); Couldn't an atomic store with __ATOMIC_RELEASE do the same thing? > +static inline void > +rte_seqlock_write_end(rte_seqlock_t *seqlock) > +{ > + uint32_t sn; > + > + sn = seqlock->sn + 1; > + > + /* synchronizes-with the load acquire in rte_seqlock_begin() > */ > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > + > + rte_spinlock_unlock(&seqlock->lock); The atomic store is not necessary here; the atomic operation in spinlock_unlock will assure that the sequence number update is ordered correctly. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-04-02 0:50 ` Stephen Hemminger @ 2022-04-02 17:54 ` Ola Liljedahl 2022-04-02 19:37 ` Honnappa Nagarahalli 0 siblings, 1 reply; 104+ messages in thread From: Ola Liljedahl @ 2022-04-02 17:54 UTC (permalink / raw) To: Stephen Hemminger, Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb On 4/2/22 02:50, Stephen Hemminger wrote: > On Wed, 30 Mar 2022 16:26:02 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >> + >> + /* __ATOMIC_RELEASE to prevent stores after (in program >> order) >> + * from happening before the sn store. >> + */ >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > Couldn't atomic store with __ATOMIC_RELEASE do same thing? No, store-release wouldn't prevent later stores from moving up. It only ensures that earlier loads and stores have completed before store-release completes. If later stores could move before a supposed store-release(seqlock->sn), readers could see inconsistent (torn) data with a valid sequence number. >> +static inline void >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) >> +{ >> + uint32_t sn; >> + >> + sn = seqlock->sn + 1; >> + >> + /* synchronizes-with the load acquire in rte_seqlock_begin() >> */ >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >> + >> + rte_spinlock_unlock(&seqlock->lock); > Atomic store is not necessary here, the atomic operation in > spinlock_unlock wil assure theat the seqeuence number update is > ordered correctly. Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't exist any synchronize-with relationship. Readers don't access the spin lock so any writer-side updates to the spin lock don't mean anything to readers. ^ permalink raw reply [flat|nested] 104+ messages in thread
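[Editor's illustration, to make the ordering argument above concrete: a sketch of the writer side as discussed; the demo_* names are placeholders and this is not a drop-in replacement for the patch code.]

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_spinlock.h>

typedef struct {
	uint32_t sn;
	rte_spinlock_t lock;
} demo_seqlock_t;

static inline void
demo_seqlock_write_lock(demo_seqlock_t *seqlock)
{
	uint32_t sn;

	rte_spinlock_lock(&seqlock->lock); /* writer-writer serialization only */

	sn = seqlock->sn + 1; /* sn becomes odd: a write is in progress */
	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

	/* A store-release on sn would only keep earlier accesses from
	 * moving after it; the release fence is what keeps the subsequent
	 * data stores from appearing to readers before the sn store.
	 */
	rte_atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
demo_seqlock_write_unlock(demo_seqlock_t *seqlock)
{
	/* The store-release on sn pairs with the load-acquire in the
	 * reader's begin(); the spinlock unlock below only orders writer
	 * against writer, which readers never observe.
	 */
	__atomic_store_n(&seqlock->sn, seqlock->sn + 1, __ATOMIC_RELEASE);

	rte_spinlock_unlock(&seqlock->lock);
}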
* RE: [PATCH v2] eal: add seqlock 2022-04-02 17:54 ` Ola Liljedahl @ 2022-04-02 19:37 ` Honnappa Nagarahalli 0 siblings, 0 replies; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-04-02 19:37 UTC (permalink / raw) To: Ola Liljedahl, Stephen Hemminger, Mattias Rönnblom Cc: dev, thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, nd <snip> > >> +static inline void > >> +rte_seqlock_write_end(rte_seqlock_t *seqlock) { > >> + uint32_t sn; > >> + > >> + sn = seqlock->sn + 1; > >> + > >> + /* synchronizes-with the load acquire in rte_seqlock_begin() > >> */ > >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > >> + > >> + rte_spinlock_unlock(&seqlock->lock); > > Atomic store is not necessary here, the atomic operation in > > spinlock_unlock wil assure theat the seqeuence number update is > > ordered correctly. > Load-acquire(seqlock->sn) in rte_seqlock_begin() must be paired with > store-release(seqlock->sn) in rte_seqlock_write_end() or there wouldn't exist > any synchronize-with relationship. Readers don't access the spin lock so any > writer-side updates to the spin lock don't mean anything to readers. Agree with this assessment. The store-release in spin-lock unlock does not synchronize with the readers. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v2] eal: add seqlock 2022-03-30 14:26 ` [PATCH v2] " Mattias Rönnblom 2022-03-31 7:46 ` Mattias Rönnblom 2022-04-02 0:50 ` Stephen Hemminger @ 2022-04-05 20:16 ` Stephen Hemminger 2022-04-08 13:50 ` Mattias Rönnblom 2 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-05 20:16 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl On Wed, 30 Mar 2022 16:26:02 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > +/** > + * A static seqlock initializer. > + */ > +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } Use named field initializers here please. > +/** > + * Initialize the seqlock. > + * > + * This function initializes the seqlock, and leaves the writer-side > + * spinlock unlocked. > + * > + * @param seqlock > + * A pointer to the seqlock. > + */ > +__rte_experimental > +void > +rte_seqlock_init(rte_seqlock_t *seqlock); You need to add the standard experimental prefix to the comment so that doxygen marks the API as experimental in the documentation. > +static inline bool > +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) > +{ > + uint32_t end_sn; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > + > + return unlikely(begin_sn & 1 || begin_sn != end_sn); Please add parentheses around the AND used to test if odd. It would be good to document why, if begin_sn is odd, a retry is signalled. ^ permalink raw reply [flat|nested] 104+ messages in thread
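[Editor's illustration of the two cosmetic changes requested above (named-field initializer and a parenthesized odd-number test); the demo_* names are placeholders, and the actual v4 changes follow later in the thread.]

#include <stdbool.h>
#include <stdint.h>
#include <rte_branch_prediction.h>
#include <rte_spinlock.h>

typedef struct {
	uint32_t sn;
	rte_spinlock_t lock;
} demo_seqlock_t;

/* Named-field initializer instead of a positional one. */
#define DEMO_SEQLOCK_INITIALIZER { .sn = 0, .lock = RTE_SPINLOCK_INITIALIZER }

/* Parenthesized odd-test: an odd sequence number means a write was in
 * progress when the read began, so the read has to be retried.
 */
static inline bool
demo_read_began_during_write(uint32_t begin_sn)
{
	return unlikely((begin_sn & 1) != 0);
}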
* Re: [PATCH v2] eal: add seqlock 2022-04-05 20:16 ` Stephen Hemminger @ 2022-04-08 13:50 ` Mattias Rönnblom 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 13:50 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Ola Liljedahl, Mattias Rönnblom On 2022-04-05 22:16, Stephen Hemminger wrote: > On Wed, 30 Mar 2022 16:26:02 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> +/** >> + * A static seqlock initializer. >> + */ >> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER } > > Used named field initializers here please. > OK. >> +/** >> + * Initialize the seqlock. >> + * >> + * This function initializes the seqlock, and leaves the writer-side >> + * spinlock unlocked. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + */ >> +__rte_experimental >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock); > > You need to add the standard experimental prefix > to the comment so that doxygen marks the API as experimental > in documentation. > OK. >> +static inline bool >> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) >> +{ >> + uint32_t end_sn; >> + >> + /* make sure the data loads happens before the sn load */ >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> + >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> + >> + return unlikely(begin_sn & 1 || begin_sn != end_sn); > > Please add parenthesis around the and to test if odd. > It would be good to document why if begin_sn is odd it returns false. > > Will do. Thanks. ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH v4] eal: add seqlock 2022-04-08 13:50 ` Mattias Rönnblom @ 2022-04-08 14:24 ` Mattias Rönnblom 2022-04-08 15:17 ` Stephen Hemminger ` (4 more replies) 0 siblings, 5 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 14:24 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, hofors, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform load and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock PATCH v4: * Reverted to Linux kernel style naming on the read side. * Bail out early from the retry function if an odd sequence number is encountered. * Added experimental warnings in the API documentation. * Static initializer now uses named field initialization. * Various tweaks to API documentation (including the example). PATCH v3: * Renamed both read and write-side critical section begin/end functions to better match rwlock naming, per Ola Liljedahl's suggestion. * Added 'extern "C"' guards for C++ compatibility. * Refer to the main lcore as the main lcore, and nothing else. PATCH v2: * Skip instead of fail unit test in case too few lcores are available. * Use main lcore for testing, reducing the minimum number of lcores required to run the unit tests to four. * Consistently refer to sn field as the "sequence number" in the documentation. * Fixed spelling mistakes in documentation. Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. 
The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but the sn has been incremented enough times to have wrapped during the read, and arrived back at the original sn. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. * Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. Acked-by: Morten Brørup <mb@smartsharesystems.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- app/test/meson.build | 2 + app/test/test_seqlock.c | 202 +++++++++++++++++++++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 ++ lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 319 ++++++++++++++++++++++++++++++++++ lib/eal/version.map | 3 + 7 files changed, 540 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..3f1ce53678 --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,202 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_run(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_lock(&data->lock); + + data->c = new_value; + + /* These compiler barriers (both on the test reader + * and the test writer side) are here to ensure that + * loads/stores *usually* happen in test program order + * (always on a TSO machine). They are arrange in such + * a way that the writer stores in a different order + * than the reader loads, to emulate an arbitrary + * order. A real application using a seqlock does not + * require any compiler barriers. 
+ */ + rte_compiler_barrier(); + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + rte_compiler_barrier(); + data->a = new_value; + + rte_seqlock_write_unlock(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return 0; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_run(void *arg) +{ + struct reader *r = arg; + int rc = 0; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) { + struct data *data = r->data; + bool interrupted; + uint32_t sn; + uint64_t a; + uint64_t b; + uint64_t c; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + /* See writer_run() for an explanation why + * these barriers are here. + */ + rte_compiler_barrier(); + + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + + c = data->c; + + rte_compiler_barrier(); + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = -1; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) /* main lcore + one worker */ +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int reader_lcore_ids[MAX_READERS]; + unsigned int worker_writer_lcore_id = 0; + int rc = 0; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) { + printf("Too few cores to run test. 
Skipping.\n"); + return 0; + } + + num_readers = num_lcores - NUM_WRITERS; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i == 0) { + rte_eal_remote_launch(writer_run, data, lcore_id); + worker_writer_lcore_id = lcore_id; + } else { + unsigned int reader_idx = i - 1; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_run, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + if (writer_run(data) != 0 || + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) + rc = -1; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = -1; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..a41343bfed 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..961816aa10 --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,319 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access lock-protected data. The RTE seqlock + * permits multiple writers as well. A spinlock is used to + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is allowed + * to continue. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclamation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). 
+ * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * uint32_t sn; + * + * do { + * sn = rte_seqlock_read_begin(&config->lock); + * + * // Loads may be atomic or non-atomic, as in this example. + * *param_x = config->param_x; + * strcpy(param_y, config->param_y); + * // An alternative to an immediate retry is to abort and + * // try again at some later time, assuming progress is + * // possible without the data. + * } while (rte_seqlock_read_retry(&config->lock)); + * } + * + * // Accessor function for writing config fields. + * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_lock(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_unlock(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. + */ +typedef struct { + uint32_t sn; /**< A sequence number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER \ + { \ + .sn = 0, \ + .lock = RTE_SPINLOCK_INITIALIZER \ + } + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_begin() returns a sequence number, which is later + * used in rte_seqlock_read_retry() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_begin() has been called, + * the calling thread reads the protected data, for later use. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative), since the caller may only read the + * data from within the read-side critical section (i.e., after + * rte_seqlock_read_begin() and before rte_seqlock_read_retry()), + * but must not act upon the retrieved data while in the critical + * section, since it does not yet know if it is consistent. + * + * The protected data may be read using atomic and/or non-atomic + * operations. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_retry() should be called, marking + * the end of the read-side critical section. + * + * If rte_seqlock_read_retry() returns true, the just-read data is + * inconsistent and should be discarded. 
The caller has the option to + * either restart the whole procedure right away (i.e., calling + * rte_seqlock_read_begin() again), or do the same at some later time. + * + * If rte_seqlock_read_retry() returns false, the data was read + * atomically and the copied data is consistent. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_retry(). + * + * @see rte_seqlock_read_retry() + */ +__rte_experimental +static inline uint32_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_write_unlock(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a read-side critical section. + * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number produced by the corresponding rte_seqlock_read_begin() call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case rte_seqlock_read_retry() returns true, the just-read data + * was modified as it was being read and may be inconsistent, and thus + * should be discarded. + * + * In case this function returns false, the data is consistent and the + * set of atomic and non-atomic load operations performed between + * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic, + * as a whole. + * + * @param seqlock + * A pointer to the seqlock. + * @param begin_sn + * The seqlock sequence number returned by rte_seqlock_read_begin(). + * @return + * true or false, if the just-read seqlock-protected data was + * inconsistent or consistent, respectively, at the time it was + * read. + * + * @see rte_seqlock_read_begin() + */ +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) +{ + uint32_t end_sn; + + /* An odd sequence number means the protected data was being + * modified already at the point of the rte_seqlock_read_begin() + * call. + */ + if (unlikely(begin_sn & 1)) + return true; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + /* A writer pegged the sequence number during the read operation. */ + if (unlikely(begin_sn != end_sn)) + return true; + + return false; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a write-side critical section. + * + * A call to this function acquires the write lock associated @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * (both read and write) the protected data, in an atomic or + * non-atomic manner. + * + * After the necessary updates have been performed, the application + * calls rte_seqlock_write_unlock(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * Unlike rte_seqlock_read_begin(), each call made to + * rte_seqlock_write_lock() must be matched with an unlock call. + * + * @param seqlock + * A pointer to the seqlock. 
+ * + * @see rte_seqlock_write_unlock() + */ +__rte_experimental +static inline void +rte_seqlock_write_lock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_lock() + */ +__rte_experimental +static inline void +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_read_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom @ 2022-04-08 15:17 ` Stephen Hemminger 2022-04-08 16:24 ` Mattias Rönnblom 2022-04-08 15:19 ` Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-08 15:17 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On Fri, 8 Apr 2022 16:24:42 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > +++ b/lib/eal/common/rte_seqlock.c > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +void > +rte_seqlock_init(rte_seqlock_t *seqlock) > +{ > + seqlock->sn = 0; > + rte_spinlock_init(&seqlock->lock); > +} Why not put init in rte_seqlock.h (like other locks) and not need a .c at all? ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 15:17 ` Stephen Hemminger @ 2022-04-08 16:24 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 16:24 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On 2022-04-08 17:17, Stephen Hemminger wrote: > On Fri, 8 Apr 2022 16:24:42 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> +++ b/lib/eal/common/rte_seqlock.c >> @@ -0,0 +1,12 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#include <rte_seqlock.h> >> + >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock) >> +{ >> + seqlock->sn = 0; >> + rte_spinlock_init(&seqlock->lock); >> +} > > Why not put init in rte_seqlock.h (like other locks) > and not need a .c at all? > Non-performance critical functions shouldn't be in header files. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom 2022-04-08 15:17 ` Stephen Hemminger @ 2022-04-08 15:19 ` Stephen Hemminger 2022-04-08 16:37 ` Mattias Rönnblom 2022-04-08 16:48 ` Mattias Rönnblom ` (2 subsequent siblings) 4 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-04-08 15:19 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On Fri, 8 Apr 2022 16:24:42 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > + /* A writer pegged the sequence number during the read operation. */ > + if (unlikely(begin_sn != end_sn)) > + return true; In some countries "pegged" might be considered inappropriate slang. Use incremented or changed instead. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 15:19 ` Stephen Hemminger @ 2022-04-08 16:37 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 16:37 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On 2022-04-08 17:19, Stephen Hemminger wrote: > On Fri, 8 Apr 2022 16:24:42 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> + /* A writer pegged the sequence number during the read operation. */ >> + if (unlikely(begin_sn != end_sn)) >> + return true; > > In some countries "pegged" might be considered inappropriate slang. > Use incremented or changed instead. OK. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom 2022-04-08 15:17 ` Stephen Hemminger 2022-04-08 15:19 ` Stephen Hemminger @ 2022-04-08 16:48 ` Mattias Rönnblom 2022-04-12 17:27 ` Ananyev, Konstantin 2022-04-28 10:28 ` David Marchand 4 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-04-08 16:48 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl On 2022-04-08 16:24, Mattias Rönnblom wrote: <snip> > > PATCH v4: > * Reverted to Linux kernel style naming on the read side. In this version I chose to adhere to kernel naming on the read side, but to keep write_lock()/unlock() on the write side. I think those names communicate better what the functions do, but Stephen's comment about keeping naming and semantics close to the Linux kernel APIs is very much relevant, also for the write functions. I don't really have an opinion on whether we keep these names, or change to rte_seqlock_write_begin()/end(). You might ask yourself which of the two naming options makes the most sense in light of the possibility that we extend the proposed seqlock API with an "unlocked" (non-writer-serializing) seqlock variant, or variants with other types of lock, in the future. What writer-side function names would be suitable for such a variant? (I don't know, but it seemed like something that might be useful to consider.) <snip> ^ permalink raw reply [flat|nested] 104+ messages in thread
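[Editor's illustration, purely as a thought experiment for the naming question raised above: an "unlocked" (single-writer or externally serialized) variant might look something like the sketch below; every name here is hypothetical and not part of any posted patch.]

#include <stdint.h>

/* No internal spinlock, so "lock"/"unlock" would be misleading names
 * for the writer-side functions; begin()/end() fit more naturally.
 */
typedef struct {
	uint32_t sn;
} demo_seqcount_t;

static inline void
demo_seqcount_write_begin(demo_seqcount_t *sc)
{
	/* sn becomes odd: a write is in progress. */
	__atomic_store_n(&sc->sn, sc->sn + 1, __ATOMIC_RELAXED);
	__atomic_thread_fence(__ATOMIC_RELEASE);
}

static inline void
demo_seqcount_write_end(demo_seqcount_t *sc)
{
	/* sn becomes even again; pairs with the reader's load-acquire. */
	__atomic_store_n(&sc->sn, sc->sn + 1, __ATOMIC_RELEASE);
}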
* RE: [PATCH v4] eal: add seqlock 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom ` (2 preceding siblings ...) 2022-04-08 16:48 ` Mattias Rönnblom @ 2022-04-12 17:27 ` Ananyev, Konstantin 2022-04-28 10:28 ` David Marchand 4 siblings, 0 replies; 104+ messages in thread From: Ananyev, Konstantin @ 2022-04-12 17:27 UTC (permalink / raw) To: mattias.ronnblom, dev Cc: Thomas Monjalon, David Marchand, Olsen, Onar, Honnappa.Nagarahalli, nd, mb, stephen, hofors, mattias.ronnblom, Ola Liljedahl > A sequence lock (seqlock) is synchronization primitive which allows > for data-race free, low-overhead, high-frequency reads, especially for > data structures shared across many cores and which are updated > relatively infrequently. > > A seqlock permits multiple parallel readers. The variant of seqlock > implemented in this patch supports multiple writers as well. A > spinlock is used for writer-writer serialization. > > To avoid resource reclamation and other issues, the data protected by > a seqlock is best off being self-contained (i.e., no pointers [except > to constant data]). > > One way to think about seqlocks is that they provide means to perform > atomic operations on data objects larger what the native atomic > machine instructions allow for. > > DPDK seqlocks are not preemption safe on the writer side. A thread > preemption affects performance, not correctness. > > A seqlock contains a sequence number, which can be thought of as the > generation of the data it protects. > > A reader will > 1. Load the sequence number (sn). > 2. Load, in arbitrary order, the seqlock-protected data. > 3. Load the sn again. > 4. Check if the first and second sn are equal, and even numbered. > If they are not, discard the loaded data, and restart from 1. > > The first three steps need to be ordered using suitable memory fences. > > A writer will > 1. Take the spinlock, to serialize writer access. > 2. Load the sn. > 3. Store the original sn + 1 as the new sn. > 4. Perform load and stores to the seqlock-protected data. > 5. Store the original sn + 2 as the new sn. > 6. Release the spinlock. > > Proper memory fencing is required to make sure the first sn store, the > data stores, and the second sn store appear to the reader in the > mentioned order. > > The sn loads and stores must be atomic, but the data loads and stores > need not be. > > The original seqlock design and implementation was done by Stephen > Hemminger. This is an independent implementation, using C11 atomics. > > For more information on seqlocks, see > https://en.wikipedia.org/wiki/Seqlock > > PATCH v4: > * Reverted to Linux kernel style naming on the read side. > * Bail out early from the retry function if an odd sequence > number is encountered. > * Added experimental warnings in the API documentation. > * Static initializer now uses named field initialization. > * Various tweaks to API documentation (including the example). > > PATCH v3: > * Renamed both read and write-side critical section begin/end functions > to better match rwlock naming, per Ola Liljedahl's suggestion. > * Added 'extern "C"' guards for C++ compatibility. > * Refer to the main lcore as the main lcore, and nothing else. > > PATCH v2: > * Skip instead of fail unit test in case too few lcores are available. > * Use main lcore for testing, reducing the minimum number of lcores > required to run the unit tests to four. > * Consistently refer to sn field as the "sequence number" in the > documentation. > * Fixed spelling mistakes in documentation. 
> > Updates since RFC: > * Added API documentation. > * Added link to Wikipedia article in the commit message. > * Changed seqlock sequence number field from uint64_t (which was > overkill) to uint32_t. The sn type needs to be sufficiently large > to assure no reader will read a sn, access the data, and then read > the same sn, but the sn has been incremented enough times to have > wrapped during the read, and arrived back at the original sn. > * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. > * Removed the rte_seqlock struct + separate rte_seqlock_t typedef > with an anonymous struct typedef:ed to rte_seqlock_t. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > --- Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > -- > 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-08 14:24 ` [PATCH v4] " Mattias Rönnblom ` (3 preceding siblings ...) 2022-04-12 17:27 ` Ananyev, Konstantin @ 2022-04-28 10:28 ` David Marchand 2022-05-01 13:46 ` Mattias Rönnblom 4 siblings, 1 reply; 104+ messages in thread From: David Marchand @ 2022-04-28 10:28 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, onar.olsen, Honnappa Nagarahalli, nd, Ananyev, Konstantin, Morten Brørup, Stephen Hemminger, hofors, Ola Liljedahl Hello Mattias, On Fri, Apr 8, 2022 at 4:25 PM Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > > A sequence lock (seqlock) is synchronization primitive which allows > for data-race free, low-overhead, high-frequency reads, especially for > data structures shared across many cores and which are updated > relatively infrequently. > > A seqlock permits multiple parallel readers. The variant of seqlock > implemented in this patch supports multiple writers as well. A > spinlock is used for writer-writer serialization. > > To avoid resource reclamation and other issues, the data protected by > a seqlock is best off being self-contained (i.e., no pointers [except > to constant data]). > > One way to think about seqlocks is that they provide means to perform > atomic operations on data objects larger what the native atomic > machine instructions allow for. > > DPDK seqlocks are not preemption safe on the writer side. A thread > preemption affects performance, not correctness. > > A seqlock contains a sequence number, which can be thought of as the > generation of the data it protects. > > A reader will > 1. Load the sequence number (sn). > 2. Load, in arbitrary order, the seqlock-protected data. > 3. Load the sn again. > 4. Check if the first and second sn are equal, and even numbered. > If they are not, discard the loaded data, and restart from 1. > > The first three steps need to be ordered using suitable memory fences. > > A writer will > 1. Take the spinlock, to serialize writer access. > 2. Load the sn. > 3. Store the original sn + 1 as the new sn. > 4. Perform load and stores to the seqlock-protected data. > 5. Store the original sn + 2 as the new sn. > 6. Release the spinlock. > > Proper memory fencing is required to make sure the first sn store, the > data stores, and the second sn store appear to the reader in the > mentioned order. > > The sn loads and stores must be atomic, but the data loads and stores > need not be. > > The original seqlock design and implementation was done by Stephen > Hemminger. This is an independent implementation, using C11 atomics. > > For more information on seqlocks, see > https://en.wikipedia.org/wiki/Seqlock Revisions changelog should be after commitlog, separated with ---. > > PATCH v4: > * Reverted to Linux kernel style naming on the read side. > * Bail out early from the retry function if an odd sequence > number is encountered. > * Added experimental warnings in the API documentation. > * Static initializer now uses named field initialization. > * Various tweaks to API documentation (including the example). > > PATCH v3: > * Renamed both read and write-side critical section begin/end functions > to better match rwlock naming, per Ola Liljedahl's suggestion. > * Added 'extern "C"' guards for C++ compatibility. > * Refer to the main lcore as the main lcore, and nothing else. > > PATCH v2: > * Skip instead of fail unit test in case too few lcores are available. 
> * Use main lcore for testing, reducing the minimum number of lcores > required to run the unit tests to four. > * Consistently refer to sn field as the "sequence number" in the > documentation. > * Fixed spelling mistakes in documentation. > > Updates since RFC: > * Added API documentation. > * Added link to Wikipedia article in the commit message. > * Changed seqlock sequence number field from uint64_t (which was > overkill) to uint32_t. The sn type needs to be sufficiently large > to assure no reader will read a sn, access the data, and then read > the same sn, but the sn has been incremented enough times to have > wrapped during the read, and arrived back at the original sn. > * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. > * Removed the rte_seqlock struct + separate rte_seqlock_t typedef > with an anonymous struct typedef:ed to rte_seqlock_t. > > Acked-by: Morten Brørup <mb@smartsharesystems.com> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> > Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> We are missing a MAINTAINERS update, either with a new section for this lock (like for MCS and ticket locks), or adding the new test code under the EAL API and common code section (like rest of the locks). This new lock is not referenced in doxygen (see doc/api/doxy-api-index.md). It's worth a release notes update for advertising this new lock type. [snip] > diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c > new file mode 100644 > index 0000000000..3f1ce53678 > --- /dev/null > +++ b/app/test/test_seqlock.c [snip] > +/* Only a compile-time test */ > +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; > + > +static int > +test_seqlock(void) > +{ > + struct reader readers[MAX_READERS]; > + unsigned int num_readers; > + unsigned int num_lcores; > + unsigned int i; > + unsigned int lcore_id; > + unsigned int reader_lcore_ids[MAX_READERS]; > + unsigned int worker_writer_lcore_id = 0; > + int rc = 0; A unit test is supposed to use TEST_* macros as return values. I concede other locks unit tests return 0 or -1 (which is equivalent, given TEST_SUCCESS / TEST_FAILED values). We can go with 0 / -1 (and a cleanup could be done later on app/test globally), but at least change to TEST_SKIPPED when lacking lcores (see below). > + > + num_lcores = rte_lcore_count(); > + > + if (num_lcores < MIN_LCORE_COUNT) { > + printf("Too few cores to run test. 
Skipping.\n"); > + return 0; return TEST_SKIPPED; > + } > + > + num_readers = num_lcores - NUM_WRITERS; > + > + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); > + > + i = 0; > + RTE_LCORE_FOREACH_WORKER(lcore_id) { > + if (i == 0) { > + rte_eal_remote_launch(writer_run, data, lcore_id); > + worker_writer_lcore_id = lcore_id; > + } else { > + unsigned int reader_idx = i - 1; > + struct reader *reader = &readers[reader_idx]; > + > + reader->data = data; > + reader->stop = 0; > + > + rte_eal_remote_launch(reader_run, reader, lcore_id); > + reader_lcore_ids[reader_idx] = lcore_id; > + } > + i++; > + } > + > + if (writer_run(data) != 0 || > + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) > + rc = -1; > + > + for (i = 0; i < num_readers; i++) { > + reader_stop(&readers[i]); > + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) > + rc = -1; > + } > + > + return rc; > +} > + > +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); > diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build > index 917758cc65..a41343bfed 100644 > --- a/lib/eal/common/meson.build > +++ b/lib/eal/common/meson.build > @@ -35,6 +35,7 @@ sources += files( > 'rte_malloc.c', > 'rte_random.c', > 'rte_reciprocal.c', > + 'rte_seqlock.c', Indent is not correct, please use spaces for meson files. > 'rte_service.c', > 'rte_version.c', > ) > diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h > new file mode 100644 > index 0000000000..961816aa10 > --- /dev/null > +++ b/lib/eal/include/rte_seqlock.h > @@ -0,0 +1,319 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#ifndef _RTE_SEQLOCK_H_ > +#define _RTE_SEQLOCK_H_ > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +/** > + * @file > + * RTE Seqlock Nit: mention of RTE adds nothing, I'd remove it. > + * > + * A sequence lock (seqlock) is a synchronization primitive allowing > + * multiple, parallel, readers to efficiently and safely (i.e., in a > + * data-race free manner) access lock-protected data. The RTE seqlock > + * permits multiple writers as well. A spinlock is used to > + * writer-writer synchronization. > + * > + * A reader never blocks a writer. Very high frequency writes may > + * prevent readers from making progress. > + * > + * A seqlock is not preemption-safe on the writer side. If a writer is > + * preempted, it may block readers until the writer thread is allowed > + * to continue. Heavy computations should be kept out of the > + * writer-side critical section, to avoid delaying readers. > + * > + * Seqlocks are useful for data which are read by many cores, at a > + * high frequency, and relatively infrequently written to. > + * > + * One way to think about seqlocks is that they provide means to > + * perform atomic operations on objects larger than what the native > + * machine instructions allow for. > + * > + * To avoid resource reclamation issues, the data protected by a > + * seqlock should typically be kept self-contained (e.g., no pointers > + * to mutable, dynamically allocated data). > + * > + * Example usage: > + * @code{.c} > + * #define MAX_Y_LEN (16) > + * // Application-defined example data structure, protected by a seqlock. > + * struct config { > + * rte_seqlock_t lock; > + * int param_x; > + * char param_y[MAX_Y_LEN]; > + * }; > + * > + * // Accessor function for reading config fields. 
> + * void > + * config_read(const struct config *config, int *param_x, char *param_y) > + * { > + * uint32_t sn; > + * > + * do { > + * sn = rte_seqlock_read_begin(&config->lock); > + * > + * // Loads may be atomic or non-atomic, as in this example. > + * *param_x = config->param_x; > + * strcpy(param_y, config->param_y); > + * // An alternative to an immediate retry is to abort and > + * // try again at some later time, assuming progress is > + * // possible without the data. > + * } while (rte_seqlock_read_retry(&config->lock)); > + * } > + * > + * // Accessor function for writing config fields. > + * void > + * config_update(struct config *config, int param_x, const char *param_y) > + * { > + * rte_seqlock_write_lock(&config->lock); > + * // Stores may be atomic or non-atomic, as in this example. > + * config->param_x = param_x; > + * strcpy(config->param_y, param_y); > + * rte_seqlock_write_unlock(&config->lock); > + * } > + * @endcode > + * > + * @see > + * https://en.wikipedia.org/wiki/Seqlock. > + */ The rest lgtm. -- David Marchand ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v4] eal: add seqlock 2022-04-28 10:28 ` David Marchand @ 2022-05-01 13:46 ` Mattias Rönnblom 2022-05-01 14:03 ` [PATCH v5] " Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-01 13:46 UTC (permalink / raw) To: David Marchand, Mattias Rönnblom Cc: dev, Thomas Monjalon, onar.olsen, Honnappa Nagarahalli, nd, Ananyev, Konstantin, Morten Brørup, Stephen Hemminger, Ola Liljedahl On 2022-04-28 12:28, David Marchand wrote: > Hello Mattias, > > On Fri, Apr 8, 2022 at 4:25 PM Mattias Rönnblom > <mattias.ronnblom@ericsson.com> wrote: >> >> A sequence lock (seqlock) is synchronization primitive which allows >> for data-race free, low-overhead, high-frequency reads, especially for >> data structures shared across many cores and which are updated >> relatively infrequently. >> >> A seqlock permits multiple parallel readers. The variant of seqlock >> implemented in this patch supports multiple writers as well. A >> spinlock is used for writer-writer serialization. >> >> To avoid resource reclamation and other issues, the data protected by >> a seqlock is best off being self-contained (i.e., no pointers [except >> to constant data]). >> >> One way to think about seqlocks is that they provide means to perform >> atomic operations on data objects larger what the native atomic >> machine instructions allow for. >> >> DPDK seqlocks are not preemption safe on the writer side. A thread >> preemption affects performance, not correctness. >> >> A seqlock contains a sequence number, which can be thought of as the >> generation of the data it protects. >> >> A reader will >> 1. Load the sequence number (sn). >> 2. Load, in arbitrary order, the seqlock-protected data. >> 3. Load the sn again. >> 4. Check if the first and second sn are equal, and even numbered. >> If they are not, discard the loaded data, and restart from 1. >> >> The first three steps need to be ordered using suitable memory fences. >> >> A writer will >> 1. Take the spinlock, to serialize writer access. >> 2. Load the sn. >> 3. Store the original sn + 1 as the new sn. >> 4. Perform load and stores to the seqlock-protected data. >> 5. Store the original sn + 2 as the new sn. >> 6. Release the spinlock. >> >> Proper memory fencing is required to make sure the first sn store, the >> data stores, and the second sn store appear to the reader in the >> mentioned order. >> >> The sn loads and stores must be atomic, but the data loads and stores >> need not be. >> >> The original seqlock design and implementation was done by Stephen >> Hemminger. This is an independent implementation, using C11 atomics. >> >> For more information on seqlocks, see >> https://en.wikipedia.org/wiki/Seqlock > > Revisions changelog should be after commitlog, separated with ---. > OK. >> >> PATCH v4: >> * Reverted to Linux kernel style naming on the read side. >> * Bail out early from the retry function if an odd sequence >> number is encountered. >> * Added experimental warnings in the API documentation. >> * Static initializer now uses named field initialization. >> * Various tweaks to API documentation (including the example). >> >> PATCH v3: >> * Renamed both read and write-side critical section begin/end functions >> to better match rwlock naming, per Ola Liljedahl's suggestion. >> * Added 'extern "C"' guards for C++ compatibility. >> * Refer to the main lcore as the main lcore, and nothing else. >> >> PATCH v2: >> * Skip instead of fail unit test in case too few lcores are available. 
>> * Use main lcore for testing, reducing the minimum number of lcores >> required to run the unit tests to four. >> * Consistently refer to sn field as the "sequence number" in the >> documentation. >> * Fixed spelling mistakes in documentation. >> >> Updates since RFC: >> * Added API documentation. >> * Added link to Wikipedia article in the commit message. >> * Changed seqlock sequence number field from uint64_t (which was >> overkill) to uint32_t. The sn type needs to be sufficiently large >> to assure no reader will read a sn, access the data, and then read >> the same sn, but the sn has been incremented enough times to have >> wrapped during the read, and arrived back at the original sn. >> * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. >> * Removed the rte_seqlock struct + separate rte_seqlock_t typedef >> with an anonymous struct typedef:ed to rte_seqlock_t. >> >> Acked-by: Morten Brørup <mb@smartsharesystems.com> >> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> > > We are missing a MAINTAINERS update, either with a new section for > this lock (like for MCS and ticket locks), or adding the new test code > under the EAL API and common code section (like rest of the locks). > OK. I added a new section, and myself as the maintainer. If you want to merge it with EAL and the common code that fine with me as well. Let me know in that case. The seqlock is a tiny bit of code, but there are some intricacies (like always when C11 memory models matter). > This new lock is not referenced in doxygen (see doc/api/doxy-api-index.md). > OK. > It's worth a release notes update for advertising this new lock type. > OK. > [snip] > > >> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c >> new file mode 100644 >> index 0000000000..3f1ce53678 >> --- /dev/null >> +++ b/app/test/test_seqlock.c > > [snip] > >> +/* Only a compile-time test */ >> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; >> + >> +static int >> +test_seqlock(void) >> +{ >> + struct reader readers[MAX_READERS]; >> + unsigned int num_readers; >> + unsigned int num_lcores; >> + unsigned int i; >> + unsigned int lcore_id; >> + unsigned int reader_lcore_ids[MAX_READERS]; >> + unsigned int worker_writer_lcore_id = 0; >> + int rc = 0; > > A unit test is supposed to use TEST_* macros as return values. > I concede other locks unit tests return 0 or -1 (which is equivalent, > given TEST_SUCCESS / TEST_FAILED values). > We can go with 0 / -1 (and a cleanup could be done later on app/test > globally), but at least change to TEST_SKIPPED when lacking lcores > (see below). > I'll fix it right away instead. > >> + >> + num_lcores = rte_lcore_count(); >> + >> + if (num_lcores < MIN_LCORE_COUNT) { >> + printf("Too few cores to run test. Skipping.\n"); >> + return 0; > > return TEST_SKIPPED; > > Excellent. I remember asking myself if there was such a return code when I wrote that code, but then I forgot to actually go look for it. 
>> + } >> + >> + num_readers = num_lcores - NUM_WRITERS; >> + >> + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); >> + >> + i = 0; >> + RTE_LCORE_FOREACH_WORKER(lcore_id) { >> + if (i == 0) { >> + rte_eal_remote_launch(writer_run, data, lcore_id); >> + worker_writer_lcore_id = lcore_id; >> + } else { >> + unsigned int reader_idx = i - 1; >> + struct reader *reader = &readers[reader_idx]; >> + >> + reader->data = data; >> + reader->stop = 0; >> + >> + rte_eal_remote_launch(reader_run, reader, lcore_id); >> + reader_lcore_ids[reader_idx] = lcore_id; >> + } >> + i++; >> + } >> + >> + if (writer_run(data) != 0 || >> + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) >> + rc = -1; >> + >> + for (i = 0; i < num_readers; i++) { >> + reader_stop(&readers[i]); >> + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) >> + rc = -1; >> + } >> + >> + return rc; >> +} >> + >> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); >> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build >> index 917758cc65..a41343bfed 100644 >> --- a/lib/eal/common/meson.build >> +++ b/lib/eal/common/meson.build >> @@ -35,6 +35,7 @@ sources += files( >> 'rte_malloc.c', >> 'rte_random.c', >> 'rte_reciprocal.c', >> + 'rte_seqlock.c', > > Indent is not correct, please use spaces for meson files. > OK. > >> 'rte_service.c', >> 'rte_version.c', >> ) >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h >> new file mode 100644 >> index 0000000000..961816aa10 >> --- /dev/null >> +++ b/lib/eal/include/rte_seqlock.h >> @@ -0,0 +1,319 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#ifndef _RTE_SEQLOCK_H_ >> +#define _RTE_SEQLOCK_H_ >> + >> +#ifdef __cplusplus >> +extern "C" { >> +#endif >> + >> +/** >> + * @file >> + * RTE Seqlock > > Nit: mention of RTE adds nothing, I'd remove it. > I agree, but there seemed to be a convention that called for this. > >> + * >> + * A sequence lock (seqlock) is a synchronization primitive allowing >> + * multiple, parallel, readers to efficiently and safely (i.e., in a >> + * data-race free manner) access lock-protected data. The RTE seqlock >> + * permits multiple writers as well. A spinlock is used to >> + * writer-writer synchronization. >> + * >> + * A reader never blocks a writer. Very high frequency writes may >> + * prevent readers from making progress. >> + * >> + * A seqlock is not preemption-safe on the writer side. If a writer is >> + * preempted, it may block readers until the writer thread is allowed >> + * to continue. Heavy computations should be kept out of the >> + * writer-side critical section, to avoid delaying readers. >> + * >> + * Seqlocks are useful for data which are read by many cores, at a >> + * high frequency, and relatively infrequently written to. >> + * >> + * One way to think about seqlocks is that they provide means to >> + * perform atomic operations on objects larger than what the native >> + * machine instructions allow for. >> + * >> + * To avoid resource reclamation issues, the data protected by a >> + * seqlock should typically be kept self-contained (e.g., no pointers >> + * to mutable, dynamically allocated data). >> + * >> + * Example usage: >> + * @code{.c} >> + * #define MAX_Y_LEN (16) >> + * // Application-defined example data structure, protected by a seqlock. >> + * struct config { >> + * rte_seqlock_t lock; >> + * int param_x; >> + * char param_y[MAX_Y_LEN]; >> + * }; >> + * >> + * // Accessor function for reading config fields. 
>> + * void >> + * config_read(const struct config *config, int *param_x, char *param_y) >> + * { >> + * uint32_t sn; >> + * >> + * do { >> + * sn = rte_seqlock_read_begin(&config->lock); >> + * >> + * // Loads may be atomic or non-atomic, as in this example. >> + * *param_x = config->param_x; >> + * strcpy(param_y, config->param_y); >> + * // An alternative to an immediate retry is to abort and >> + * // try again at some later time, assuming progress is >> + * // possible without the data. >> + * } while (rte_seqlock_read_retry(&config->lock)); >> + * } >> + * >> + * // Accessor function for writing config fields. >> + * void >> + * config_update(struct config *config, int param_x, const char *param_y) >> + * { >> + * rte_seqlock_write_lock(&config->lock); >> + * // Stores may be atomic or non-atomic, as in this example. >> + * config->param_x = param_x; >> + * strcpy(config->param_y, param_y); >> + * rte_seqlock_write_unlock(&config->lock); >> + * } >> + * @endcode >> + * >> + * @see >> + * https://en.wikipedia.org/wiki/Seqlock. >> + */ > > > The rest lgtm. > > Thanks a lot David for the review! ^ permalink raw reply [flat|nested] 104+ messages in thread
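To tie the numbered reader and writer steps from the commit message to code: the following is a schematic sketch only (not the patch's implementation, and with the writer-writer spinlock left out for brevity), using the same GCC __atomic builtins the patch is built on. The struct shared and its payload are purely illustrative.

#include <stdint.h>

struct shared {
        uint32_t sn;          /* sequence number: even = stable, odd = update in progress */
        uint64_t payload[4];  /* the seqlock-protected data */
};

static void
reader(struct shared *s, uint64_t copy[4])
{
        uint32_t first_sn, second_sn;
        unsigned int i;

        do {
                /* 1. Load the sequence number (acquire orders the data loads after it). */
                first_sn = __atomic_load_n(&s->sn, __ATOMIC_ACQUIRE);

                /* 2. Load, in arbitrary order, the seqlock-protected data. */
                for (i = 0; i < 4; i++)
                        copy[i] = s->payload[i];

                /* Order the data loads before the second sn load. */
                __atomic_thread_fence(__ATOMIC_ACQUIRE);

                /* 3. Load the sn again. */
                second_sn = __atomic_load_n(&s->sn, __ATOMIC_RELAXED);

                /* 4. Retry unless both sn values are equal and even-numbered. */
        } while (first_sn != second_sn || (first_sn & 1));
}

static void
writer(struct shared *s, const uint64_t new_payload[4])
{
        uint32_t sn;
        unsigned int i;

        /* 1. (Spinlock acquisition for writer-writer serialization omitted.) */

        /* 2. + 3. Load the sn and store the original sn + 1 as the new sn. */
        sn = s->sn + 1;
        __atomic_store_n(&s->sn, sn, __ATOMIC_RELAXED);

        /* Order the first sn store before the data stores. */
        __atomic_thread_fence(__ATOMIC_RELEASE);

        /* 4. Perform the stores to the seqlock-protected data. */
        for (i = 0; i < 4; i++)
                s->payload[i] = new_payload[i];

        /* 5. Store the original sn + 2 as the new sn; the release pairs with
         *    the reader's acquire load of the sn.
         */
        __atomic_store_n(&s->sn, sn + 1, __ATOMIC_RELEASE);

        /* 6. (Spinlock release omitted.) */
}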
* [PATCH v5] eal: add seqlock 2022-05-01 13:46 ` Mattias Rönnblom @ 2022-05-01 14:03 ` Mattias Rönnblom 2022-05-01 14:22 ` Mattias Rönnblom ` (2 more replies) 0 siblings, 3 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-01 14:03 UTC (permalink / raw) To: dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, hofors, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, especially for data structures shared across many cores and which are updated relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger than what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform load and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock --- PATCH v5: * Add sequence lock section to MAINTAINERS. * Add entry in the release notes. * Add seqlock reference in the API index. * Fix meson build file indentation. * Use "increment" to describe how a writer changes the sequence number. * Remove compiler barriers from seqlock test. * Use appropriate macros (e.g., TEST_SUCCESS) for test return values. PATCH v4: * Reverted to Linux kernel style naming on the read side. * Bail out early from the retry function if an odd sequence number is encountered. * Added experimental warnings in the API documentation. * Static initializer now uses named field initialization. * Various tweaks to API documentation (including the example). PATCH v3: * Renamed both read and write-side critical section begin/end functions to better match rwlock naming, per Ola Liljedahl's suggestion. * Added 'extern "C"' guards for C++ compatibility. * Refer to the main lcore as the main lcore, and nothing else. PATCH v2: * Skip instead of fail unit test in case too few lcores are available. * Use main lcore for testing, reducing the minimum number of lcores required to run the unit tests to four. 
* Consistently refer to sn field as the "sequence number" in the documentation. * Fixed spelling mistakes in documentation. Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but the sn has been incremented enough times to have wrapped during the read, and arrived back at the original sn. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. * Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. Acked-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- MAINTAINERS | 5 + app/test/meson.build | 2 + app/test/test_seqlock.c | 183 ++++++++++++++ doc/api/doxy-api-index.md | 1 + doc/guides/rel_notes/release_22_07.rst | 14 ++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 + lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 322 +++++++++++++++++++++++++ lib/eal/version.map | 3 + 10 files changed, 544 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/MAINTAINERS b/MAINTAINERS index 7c4f541dba..2804d8136c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -262,6 +262,11 @@ M: Joyce Kong <joyce.kong@arm.com> F: lib/eal/include/generic/rte_ticketlock.h F: app/test/test_ticketlock.c +Sequence Lock +M: Mattias Rönnblom <mattias.ronnblom@ericsson.com> +F: lib/eal/include/rte_seqlock.h +F: app/test/test_seqlock.c + Pseudo-random Number Generation M: Mattias Rönnblom <mattias.ronnblom@ericsson.com> F: lib/eal/include/rte_random.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..8d91a23ba7 --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,183 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_run(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < 
deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_lock(&data->lock); + + data->c = new_value; + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + data->a = new_value; + + rte_seqlock_write_unlock(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return TEST_SUCCESS; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_run(void *arg) +{ + struct reader *r = arg; + int rc = TEST_SUCCESS; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && + rc == TEST_SUCCESS) { + struct data *data = r->data; + bool interrupted; + uint32_t sn; + uint64_t a; + uint64_t b; + uint64_t c; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + c = data->c; + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = TEST_FAILED; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) /* main lcore + one worker */ +#define MIN_NUM_READERS (2) +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[MAX_READERS]; + unsigned int num_readers; + unsigned int num_lcores; + unsigned int i; + unsigned int lcore_id; + unsigned int reader_lcore_ids[MAX_READERS]; + unsigned int worker_writer_lcore_id = 0; + int rc = TEST_SUCCESS; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) { + printf("Too few cores to run test. 
Skipping.\n"); + return TEST_SKIPPED; + } + + num_readers = num_lcores - NUM_WRITERS; + + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i == 0) { + rte_eal_remote_launch(writer_run, data, lcore_id); + worker_writer_lcore_id = lcore_id; + } else { + unsigned int reader_idx = i - 1; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_run, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + if (writer_run(data) != 0 || + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) + rc = TEST_FAILED; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = TEST_FAILED; + } + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index 4245b9635c..f23e33ae30 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -77,6 +77,7 @@ The public API headers are grouped by topics: [rwlock] (@ref rte_rwlock.h), [spinlock] (@ref rte_spinlock.h), [ticketlock] (@ref rte_ticketlock.h), + [seqlock] (@ref rte_seqlock.h), [RCU] (@ref rte_rcu_qsbr.h) - **CPU arch**: diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 88d6e96cc1..d2f7bafe7b 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -55,6 +55,20 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Added Sequence Lock.** + + Added a new synchronization primitive: the sequence lock + (seqlock). A seqlock allows for low overhead, parallel reads. The + DPDK seqlock uses a spinlock to serialize multiple writing threads. + + In particular, seqlocks are useful for protecting data structures + which are read very frequently, by threads running on many different + cores, and modified relatively infrequently. + + One way to think about seqlocks is that they provide means to + perform atomic operations on data objects larger than what the + native atomic machine instructions allow for. + * **Updated Intel iavf driver.** * Added Tx QoS queue rate limitation support. 
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..3c896711e5 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..13f8ae2e4e --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,322 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access lock-protected data. The RTE seqlock + * permits multiple writers as well. A spinlock is used to + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is allowed + * to continue. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclamation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). + * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * uint32_t sn; + * + * do { + * sn = rte_seqlock_read_begin(&config->lock); + * + * // Loads may be atomic or non-atomic, as in this example. + * *param_x = config->param_x; + * strcpy(param_y, config->param_y); + * // An alternative to an immediate retry is to abort and + * // try again at some later time, assuming progress is + * // possible without the data. + * } while (rte_seqlock_read_retry(&config->lock)); + * } + * + * // Accessor function for writing config fields. 
+ * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_lock(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_unlock(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. + */ +typedef struct { + uint32_t sn; /**< A sequence number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER \ + { \ + .sn = 0, \ + .lock = RTE_SPINLOCK_INITIALIZER \ + } + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_begin() returns a sequence number, which is later + * used in rte_seqlock_read_retry() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_begin() has been called, + * the calling thread reads the protected data, for later use. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative), since the caller may only read the + * data from within the read-side critical section (i.e., after + * rte_seqlock_read_begin() and before rte_seqlock_read_retry()), + * but must not act upon the retrieved data while in the critical + * section, since it does not yet know if it is consistent. + * + * The protected data may be read using atomic and/or non-atomic + * operations. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_retry() should be called, marking + * the end of the read-side critical section. + * + * If rte_seqlock_read_retry() returns true, the just-read data is + * inconsistent and should be discarded. The caller has the option to + * either restart the whole procedure right away (i.e., calling + * rte_seqlock_read_begin() again), or do the same at some later time. + * + * If rte_seqlock_read_retry() returns false, the data was read + * atomically and the copied data is consistent. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_retry(). + * + * @see rte_seqlock_read_retry() + */ + +__rte_experimental +static inline uint32_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_write_unlock(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a read-side critical section. 
+ * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number produced by the corresponding rte_seqlock_read_begin() call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case rte_seqlock_read_retry() returns true, the just-read data + * was modified as it was being read and may be inconsistent, and thus + * should be discarded. + * + * In case this function returns false, the data is consistent and the + * set of atomic and non-atomic load operations performed between + * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic, + * as a whole. + * + * @param seqlock + * A pointer to the seqlock. + * @param begin_sn + * The seqlock sequence number returned by rte_seqlock_read_begin(). + * @return + * true or false, if the just-read seqlock-protected data was + * inconsistent or consistent, respectively, at the time it was + * read. + * + * @see rte_seqlock_read_begin() + */ +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) +{ + uint32_t end_sn; + + /* An odd sequence number means the protected data was being + * modified already at the point of the rte_seqlock_read_begin() + * call. + */ + if (unlikely(begin_sn & 1)) + return true; + + /* make sure the data loads happens before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + /* A writer incremented the sequence number during this read + * critical section. + */ + if (unlikely(begin_sn != end_sn)) + return true; + + return false; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a write-side critical section. + * + * A call to this function acquires the write lock associated @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * (both read and write) the protected data, in an atomic or + * non-atomic manner. + * + * After the necessary updates have been performed, the application + * calls rte_seqlock_write_unlock(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * Unlike rte_seqlock_read_begin(), each call made to + * rte_seqlock_write_lock() must be matched with an unlock call. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_unlock() + */ +__rte_experimental +static inline void +rte_seqlock_write_lock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. 
+ * + * @see rte_seqlock_write_lock() + */ +__rte_experimental +static inline void +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_read_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
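As a usage illustration separate from the patch and its documentation example: a structure that is updated infrequently from a control thread and read at high frequency from data-plane lcores could be protected as sketched below. The struct and field names are made up for the example; the calls are the ones declared in the patch, with the sequence number from rte_seqlock_read_begin() passed to rte_seqlock_read_retry().

#include <stdint.h>

#include <rte_seqlock.h>

/* Hypothetical application data: a timestamp calibration record, written
 * rarely, read on every packet burst.
 */
struct tsc_calibration {
        rte_seqlock_t lock;
        uint64_t offset;
        uint64_t mult;
};

static struct tsc_calibration calib = { .lock = RTE_SEQLOCK_INITIALIZER };

/* Writer side: typically a single control-plane thread, called infrequently. */
static void
calib_update(uint64_t offset, uint64_t mult)
{
        rte_seqlock_write_lock(&calib.lock);
        calib.offset = offset;
        calib.mult = mult;
        rte_seqlock_write_unlock(&calib.lock);
}

/* Reader side: any number of lcores, retrying until a consistent snapshot
 * of both fields has been taken.
 */
static void
calib_read(uint64_t *offset, uint64_t *mult)
{
        uint32_t sn;

        do {
                sn = rte_seqlock_read_begin(&calib.lock);
                *offset = calib.offset;
                *mult = calib.mult;
        } while (rte_seqlock_read_retry(&calib.lock, sn));
}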
* Re: [PATCH v5] eal: add seqlock 2022-05-01 14:03 ` [PATCH v5] " Mattias Rönnblom @ 2022-05-01 14:22 ` Mattias Rönnblom 2022-05-02 6:47 ` David Marchand 2022-05-01 20:17 ` Stephen Hemminger 2022-05-06 1:26 ` fengchengwen 2 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-01 14:22 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl On 2022-05-01 16:03, Mattias Rönnblom wrote: > A sequence lock (seqlock) is synchronization primitive which allows "/../ is a /../" <snip> David, maybe you can fix this typo? Unless there is a need for a new version. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v5] eal: add seqlock 2022-05-01 14:22 ` Mattias Rönnblom @ 2022-05-02 6:47 ` David Marchand 0 siblings, 0 replies; 104+ messages in thread From: David Marchand @ 2022-05-02 6:47 UTC (permalink / raw) To: Mattias Rönnblom Cc: Mattias Rönnblom, dev, Thomas Monjalon, onar.olsen, Honnappa Nagarahalli, nd, Ananyev, Konstantin, Morten Brørup, Stephen Hemminger, Ola Liljedahl On Sun, May 1, 2022 at 4:22 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote: > > On 2022-05-01 16:03, Mattias Rönnblom wrote: > > A sequence lock (seqlock) is synchronization primitive which allows > > "/../ is a /../" > > <snip> > > David, maybe you can fix this typo? Unless there is a need for a new > version. Noted. No need for a new version just for this. Thanks. -- David Marchand ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v5] eal: add seqlock 2022-05-01 14:03 ` [PATCH v5] " Mattias Rönnblom 2022-05-01 14:22 ` Mattias Rönnblom @ 2022-05-01 20:17 ` Stephen Hemminger 2022-05-02 4:51 ` Mattias Rönnblom 2022-05-06 1:26 ` fengchengwen 2 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-05-01 20:17 UTC (permalink / raw) To: Mattias Rönnblom Cc: dev, Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On Sun, 1 May 2022 16:03:27 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > +struct data { > + rte_seqlock_t lock; > + > + uint64_t a; > + uint64_t b __rte_cache_aligned; > + uint64_t c __rte_cache_aligned; > +} __rte_cache_aligned; This will end up taking 192 bytes per lock. Which is a lot especially if embedded in another structure. ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v5] eal: add seqlock 2022-05-01 20:17 ` Stephen Hemminger @ 2022-05-02 4:51 ` Mattias Rönnblom 0 siblings, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-02 4:51 UTC (permalink / raw) To: Stephen Hemminger Cc: dev, Thomas Monjalon, David Marchand, Onar Olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Ola Liljedahl On 2022-05-01 22:17, Stephen Hemminger wrote: > On Sun, 1 May 2022 16:03:27 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > >> +struct data { >> + rte_seqlock_t lock; >> + >> + uint64_t a; >> + uint64_t b __rte_cache_aligned; >> + uint64_t c __rte_cache_aligned; >> +} __rte_cache_aligned; > > This will end up taking 192 bytes per lock. > Which is a lot especially if embedded in another structure. "b" and "c" are cache-line aligned to increase the chance of exposing any bugs in the seqlock implementation. With these annotations, accessing all of struct data's fields results in multiple distinct interactions with the memory hierarchy, instead of one atomic "request for ownership" type operation for a particular cache line, from the core. At least that's what the difference would be in my simple mental model of the typical CPU. Do you mention this because you think it serves as a bad example, or what is the reason? The lock itself is much smaller than that, and not cache-line aligned. "struct data" is only used by the unit tests. I should have mentioned the reason for the __rte_cache_aligned as a comment. ^ permalink raw reply [flat|nested] 104+ messages in thread
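To make the numbers in this exchange concrete: the 192 bytes refer to the unit test's struct data, whose fields are deliberately spread over three cache lines, while rte_seqlock_t itself is just a sequence number plus a spinlock (8 bytes on typical builds). A small stand-alone sketch, assuming a 64-byte RTE_CACHE_LINE_SIZE and that the listed headers provide __rte_cache_aligned:

#include <stdint.h>
#include <stdio.h>

#include <rte_common.h>
#include <rte_seqlock.h>

/* Same layout as the unit test's struct data. */
struct data {
        rte_seqlock_t lock;
        uint64_t a;
        uint64_t b __rte_cache_aligned;
        uint64_t c __rte_cache_aligned;
} __rte_cache_aligned;

int
main(void)
{
        /* Expected output with 64-byte cache lines: 8 and 192. The large
         * footprint comes from the test structure's alignment annotations,
         * not from the lock itself.
         */
        printf("sizeof(rte_seqlock_t) = %zu\n", sizeof(rte_seqlock_t));
        printf("sizeof(struct data) = %zu\n", sizeof(struct data));

        return 0;
}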
* Re: [PATCH v5] eal: add seqlock 2022-05-01 14:03 ` [PATCH v5] " Mattias Rönnblom 2022-05-01 14:22 ` Mattias Rönnblom 2022-05-01 20:17 ` Stephen Hemminger @ 2022-05-06 1:26 ` fengchengwen 2022-05-06 1:33 ` Honnappa Nagarahalli 2022-05-08 11:56 ` Mattias Rönnblom 2 siblings, 2 replies; 104+ messages in thread From: fengchengwen @ 2022-05-06 1:26 UTC (permalink / raw) To: Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, hofors, Ola Liljedahl On 2022/5/1 22:03, Mattias Rönnblom wrote: > A sequence lock (seqlock) is synchronization primitive which allows > for data-race free, low-overhead, high-frequency reads, especially for > data structures shared across many cores and which are updated > relatively infrequently. > ... > +} > + > +static void > +reader_stop(struct reader *reader) > +{ > + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); > +} > + > +#define NUM_WRITERS (2) /* main lcore + one worker */ > +#define MIN_NUM_READERS (2) > +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) Why minus 1 ? Suggest define MAX_READERS RTE_MAX_LCORE to avoid underflow with small size VM. > +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) > + > +/* Only a compile-time test */ > +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; > + > +static int > +test_seqlock(void) > +{ > + struct reader readers[MAX_READERS]; > + unsigned int num_readers; > + unsigned int num_lcores; > + unsigned int i; > + unsigned int lcore_id; > + unsigned int reader_lcore_ids[MAX_READERS]; > + unsigned int worker_writer_lcore_id = 0; > + int rc = TEST_SUCCESS; > + > + num_lcores = rte_lcore_count(); > + > + if (num_lcores < MIN_LCORE_COUNT) { > + printf("Too few cores to run test. Skipping.\n"); > + return TEST_SKIPPED; > + } > + > + num_readers = num_lcores - NUM_WRITERS; > + > + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); Please check whether the value of data is NULL. > + > + i = 0; > + RTE_LCORE_FOREACH_WORKER(lcore_id) { > + if (i == 0) { > + rte_eal_remote_launch(writer_run, data, lcore_id); > + worker_writer_lcore_id = lcore_id; > + } else { > + unsigned int reader_idx = i - 1; > + struct reader *reader = &readers[reader_idx]; > + > + reader->data = data; > + reader->stop = 0; > + > + rte_eal_remote_launch(reader_run, reader, lcore_id); > + reader_lcore_ids[reader_idx] = lcore_id; > + } > + i++; > + } > + > + if (writer_run(data) != 0 || > + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) > + rc = TEST_FAILED; > + > + for (i = 0; i < num_readers; i++) { > + reader_stop(&readers[i]); > + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) > + rc = TEST_FAILED; > + } > + Please free data memory. > + return rc; > +} > + > +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); > diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md > index 4245b9635c..f23e33ae30 100644 > --- a/doc/api/doxy-api-index.md > +++ b/doc/api/doxy-api-index.md > @@ -77,6 +77,7 @@ The public API headers are grouped by topics: > [rwlock] (@ref rte_rwlock.h), > [spinlock] (@ref rte_spinlock.h), > [ticketlock] (@ref rte_ticketlock.h), > + [seqlock] (@ref rte_seqlock.h), > [RCU] (@ref rte_rcu_qsbr.h) > ... > + */ > +__rte_experimental > +static inline bool > +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) > +{ > + uint32_t end_sn; > + > + /* An odd sequence number means the protected data was being > + * modified already at the point of the rte_seqlock_read_begin() > + * call. 
> + */ > + if (unlikely(begin_sn & 1)) > + return true; > + > + /* make sure the data loads happens before the sn load */ > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and rte_smp_rmb() both output 'dma ishld' Suggest use rte_smp_rmb(), please see below comment. > + > + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); > + > + /* A writer incremented the sequence number during this read > + * critical section. > + */ > + if (unlikely(begin_sn != end_sn)) > + return true; > + > + return false; > +} > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice. > + * > + * Begin a write-side critical section. > + * > + * A call to this function acquires the write lock associated @p > + * seqlock, and marks the beginning of a write-side critical section. > + * > + * After having called this function, the caller may go on to modify > + * (both read and write) the protected data, in an atomic or > + * non-atomic manner. > + * > + * After the necessary updates have been performed, the application > + * calls rte_seqlock_write_unlock(). > + * > + * This function is not preemption-safe in the sense that preemption > + * of the calling thread may block reader progress until the writer > + * thread is rescheduled. > + * > + * Unlike rte_seqlock_read_begin(), each call made to > + * rte_seqlock_write_lock() must be matched with an unlock call. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_unlock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_lock(rte_seqlock_t *seqlock) > +{ > + uint32_t sn; > + > + /* to synchronize with other writers */ > + rte_spinlock_lock(&seqlock->lock); > + > + sn = seqlock->sn + 1; > + > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); > + > + /* __ATOMIC_RELEASE to prevent stores after (in program order) > + * from happening before the sn store. > + */ > + rte_atomic_thread_fence(__ATOMIC_RELEASE); In ARMv8, rte_atomic_thread_fence(__ATOMIC_RELEASE) will output 'dmb ish', and rte_smp_wmb() will output 'dma ishst'. Suggest use rte_smp_wmb(). I think here only need to use store mb here. > +} > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change without prior notice. > + * > + * End a write-side critical section. > + * > + * A call to this function marks the end of the write-side critical > + * section, for @p seqlock. After this call has been made, the protected > + * data may no longer be modified. > + * > + * @param seqlock > + * A pointer to the seqlock. > + * > + * @see rte_seqlock_write_lock() > + */ > +__rte_experimental > +static inline void > +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) > +{ > + uint32_t sn; > + > + sn = seqlock->sn + 1; > + > + /* synchronizes-with the load acquire in rte_seqlock_read_begin() */ > + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); > + > + rte_spinlock_unlock(&seqlock->lock); > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_SEQLOCK_H_ */ > diff --git a/lib/eal/version.map b/lib/eal/version.map > index b53eeb30d7..4a9d0ed899 100644 > --- a/lib/eal/version.map > +++ b/lib/eal/version.map > @@ -420,6 +420,9 @@ EXPERIMENTAL { > rte_intr_instance_free; > rte_intr_type_get; > rte_intr_type_set; > + > + # added in 22.07 > + rte_seqlock_init; > }; > > INTERNAL { > Reviewed-by: Chengwen Feng <fengchengwen@huawei.com> ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v5] eal: add seqlock 2022-05-06 1:26 ` fengchengwen @ 2022-05-06 1:33 ` Honnappa Nagarahalli 2022-05-06 4:17 ` fengchengwen 2022-05-08 11:56 ` Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-05-06 1:33 UTC (permalink / raw) To: fengchengwen, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, hofors, Ola Liljedahl, nd <snip> > > +__rte_experimental > > +static inline bool > > +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t > > +begin_sn) { > > + uint32_t end_sn; > > + > > + /* An odd sequence number means the protected data was being > > + * modified already at the point of the rte_seqlock_read_begin() > > + * call. > > + */ > > + if (unlikely(begin_sn & 1)) > > + return true; > > + > > + /* make sure the data loads happens before the sn load */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and > rte_smp_rmb() both output 'dma ishld' > Suggest use rte_smp_rmb(), please see below comment. rte_smp_xxx APIs are deprecated. Please check [1] [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/ <snip> ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v5] eal: add seqlock 2022-05-06 1:33 ` Honnappa Nagarahalli @ 2022-05-06 4:17 ` fengchengwen 2022-05-06 5:19 ` Honnappa Nagarahalli 0 siblings, 1 reply; 104+ messages in thread From: fengchengwen @ 2022-05-06 4:17 UTC (permalink / raw) To: Honnappa Nagarahalli, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, hofors, Ola Liljedahl On 2022/5/6 9:33, Honnappa Nagarahalli wrote: > <snip> > >>> +__rte_experimental >>> +static inline bool >>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t >>> +begin_sn) { >>> + uint32_t end_sn; >>> + >>> + /* An odd sequence number means the protected data was being >>> + * modified already at the point of the rte_seqlock_read_begin() >>> + * call. >>> + */ >>> + if (unlikely(begin_sn & 1)) >>> + return true; >>> + >>> + /* make sure the data loads happens before the sn load */ >>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >> >> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and >> rte_smp_rmb() both output 'dma ishld' >> Suggest use rte_smp_rmb(), please see below comment. > rte_smp_xxx APIs are deprecated. Please check [1] > > [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/ Got it, thanks. And I have a question about ARM: why can't I find the parameter (rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'? I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it. > > <snip> > ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v5] eal: add seqlock 2022-05-06 4:17 ` fengchengwen @ 2022-05-06 5:19 ` Honnappa Nagarahalli 2022-05-06 7:03 ` fengchengwen 0 siblings, 1 reply; 104+ messages in thread From: Honnappa Nagarahalli @ 2022-05-06 5:19 UTC (permalink / raw) To: fengchengwen, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, hofors, Ola Liljedahl, nd <snip> > >>> +__rte_experimental > >>> +static inline bool > >>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t > >>> +begin_sn) { > >>> + uint32_t end_sn; > >>> + > >>> + /* An odd sequence number means the protected data was being > >>> + * modified already at the point of the rte_seqlock_read_begin() > >>> + * call. > >>> + */ > >>> + if (unlikely(begin_sn & 1)) > >>> + return true; > >>> + > >>> + /* make sure the data loads happens before the sn load */ > >>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > >> > >> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and > >> rte_smp_rmb() both output 'dma ishld' > >> Suggest use rte_smp_rmb(), please see below comment. > > rte_smp_xxx APIs are deprecated. Please check [1] > > > > [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory- > model/ > > Got it, thanks > > And I have a question about ARM: why can't I find the > parameter (rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'? > I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it. 'dmb ishst' prevents store-store reordering. However, '__atomic_thread_fence' (with various memory ordering) requires a stronger barrier [1]. [1] https://preshing.com/20130922/acquire-and-release-fences/ > > > > > <snip> > > ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v5] eal: add seqlock 2022-05-06 5:19 ` Honnappa Nagarahalli @ 2022-05-06 7:03 ` fengchengwen 0 siblings, 0 replies; 104+ messages in thread From: fengchengwen @ 2022-05-06 7:03 UTC (permalink / raw) To: Honnappa Nagarahalli, Mattias Rönnblom, dev Cc: thomas, David Marchand, onar.olsen, nd, konstantin.ananyev, mb, stephen, hofors, Ola Liljedahl On 2022/5/6 13:19, Honnappa Nagarahalli wrote: > <snip> > >>>>> +__rte_experimental >>>>> +static inline bool >>>>> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t >>>>> +begin_sn) { >>>>> + uint32_t end_sn; >>>>> + >>>>> + /* An odd sequence number means the protected data was being >>>>> + * modified already at the point of the rte_seqlock_read_begin() >>>>> + * call. >>>>> + */ >>>>> + if (unlikely(begin_sn & 1)) >>>>> + return true; >>>>> + >>>>> + /* make sure the data loads happens before the sn load */ >>>>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >>>> >>>> In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and >>>> rte_smp_rmb() both output 'dma ishld' >>>> Suggest use rte_smp_rmb(), please see below comment. >>> rte_smp_xxx APIs are deprecated. Please check [1] >>> >>> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory- >> model/ >> >> Got it, thanks >> >> And I have a question about ARM: why can't I find the >> parameter (rte_atomic_thread_fence(?)) corresponding to 'dmb ishst'? >> I tried __ATOMIC_RELEASE/ACQ_REL/SEQ_CST and can't find it. > 'dmb ishst' prevents store-store reordering. However, '__atomic_thread_fence' (with various memory ordering) requires a stronger barrier [1]. For this seqlock scenario, I think it's OK to use 'dmb ishst' in rte_seqlock_write_lock() instead of using rte_atomic_thread_fence(__ATOMIC_RELEASE), but 'dmb ishst' has no corresponding rte_atomic_thread_fence() wrapper, so in this case we can only use the stronger barrier. Since the community has decided to use the C11 memory model API, scenarios like the preceding one (which end up using a stronger barrier) have presumably already been considered. I have no further comments. > > [1] https://preshing.com/20130922/acquire-and-release-fences/ >> >>> >>> <snip> >>> > ^ permalink raw reply [flat|nested] 104+ messages in thread
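A compressed view of what this sub-thread compares, for the writer-side fence: the sketch below simply restates the patch's rte_seqlock_write_lock() sequence, annotated with the ARMv8 instruction mnemonics quoted above; it adds no new functionality, and whether the weaker store-store barrier would have been sufficient at this particular call site is exactly the judgment call discussed here.

#include <rte_atomic.h>
#include <rte_seqlock.h>

/* Annotated restatement of the patch's writer-side entry sequence. */
static void
write_lock_as_in_patch(rte_seqlock_t *seqlock)
{
        uint32_t sn;

        /* Writer-writer serialization, as in the patch. */
        rte_spinlock_lock(&seqlock->lock);

        sn = seqlock->sn + 1;

        __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);

        /* Per the thread, 'dmb ish' on ARMv8: orders the preceding loads and
         * stores against the data stores that follow in the critical section.
         */
        rte_atomic_thread_fence(__ATOMIC_RELEASE);

        /* The alternative raised in the review, rte_smp_wmb(), is 'dmb ishst'
         * on ARMv8: it orders stores against stores only, has no
         * rte_atomic_thread_fence() equivalent, and the rte_smp_*() barriers
         * are deprecated in favour of the C11-style API.
         */
}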
* Re: [PATCH v5] eal: add seqlock 2022-05-06 1:26 ` fengchengwen 2022-05-06 1:33 ` Honnappa Nagarahalli @ 2022-05-08 11:56 ` Mattias Rönnblom 2022-05-08 12:12 ` [PATCH v6] " Mattias Rönnblom 1 sibling, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-08 11:56 UTC (permalink / raw) To: fengchengwen, Mattias Rönnblom, dev Cc: Thomas Monjalon, David Marchand, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, Ola Liljedahl On 2022-05-06 03:26, fengchengwen wrote: > On 2022/5/1 22:03, Mattias Rönnblom wrote: >> A sequence lock (seqlock) is synchronization primitive which allows >> for data-race free, low-overhead, high-frequency reads, especially for >> data structures shared across many cores and which are updated >> relatively infrequently. >> > > ... > >> +} >> + >> +static void >> +reader_stop(struct reader *reader) >> +{ >> + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); >> +} >> + >> +#define NUM_WRITERS (2) /* main lcore + one worker */ >> +#define MIN_NUM_READERS (2) >> +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1) > > Why minus 1 ? > Suggest define MAX_READERS RTE_MAX_LCORE to avoid underflow with small size VM. > OK. >> +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) >> + >> +/* Only a compile-time test */ >> +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; >> + >> +static int >> +test_seqlock(void) >> +{ >> + struct reader readers[MAX_READERS]; >> + unsigned int num_readers; >> + unsigned int num_lcores; >> + unsigned int i; >> + unsigned int lcore_id; >> + unsigned int reader_lcore_ids[MAX_READERS]; >> + unsigned int worker_writer_lcore_id = 0; >> + int rc = TEST_SUCCESS; >> + >> + num_lcores = rte_lcore_count(); >> + >> + if (num_lcores < MIN_LCORE_COUNT) { >> + printf("Too few cores to run test. Skipping.\n"); >> + return TEST_SKIPPED; >> + } >> + >> + num_readers = num_lcores - NUM_WRITERS; >> + >> + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); > > Please check whether the value of data is NULL. > OK. >> + >> + i = 0; >> + RTE_LCORE_FOREACH_WORKER(lcore_id) { >> + if (i == 0) { >> + rte_eal_remote_launch(writer_run, data, lcore_id); >> + worker_writer_lcore_id = lcore_id; >> + } else { >> + unsigned int reader_idx = i - 1; >> + struct reader *reader = &readers[reader_idx]; >> + >> + reader->data = data; >> + reader->stop = 0; >> + >> + rte_eal_remote_launch(reader_run, reader, lcore_id); >> + reader_lcore_ids[reader_idx] = lcore_id; >> + } >> + i++; >> + } >> + >> + if (writer_run(data) != 0 || >> + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) >> + rc = TEST_FAILED; >> + >> + for (i = 0; i < num_readers; i++) { >> + reader_stop(&readers[i]); >> + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) >> + rc = TEST_FAILED; >> + } >> + > > Please free data memory. > OK. >> + return rc; >> +} >> + >> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); >> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md >> index 4245b9635c..f23e33ae30 100644 >> --- a/doc/api/doxy-api-index.md >> +++ b/doc/api/doxy-api-index.md >> @@ -77,6 +77,7 @@ The public API headers are grouped by topics: >> [rwlock] (@ref rte_rwlock.h), >> [spinlock] (@ref rte_spinlock.h), >> [ticketlock] (@ref rte_ticketlock.h), >> + [seqlock] (@ref rte_seqlock.h), >> [RCU] (@ref rte_rcu_qsbr.h) >> > > ... 
> >> + */ >> +__rte_experimental >> +static inline bool >> +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) >> +{ >> + uint32_t end_sn; >> + >> + /* An odd sequence number means the protected data was being >> + * modified already at the point of the rte_seqlock_read_begin() >> + * call. >> + */ >> + if (unlikely(begin_sn & 1)) >> + return true; >> + >> + /* make sure the data loads happens before the sn load */ >> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > In ARMv8, the rte_atomic_thread_fence(__ATOMIC_ACQUIRE) and rte_smp_rmb() both output 'dma ishld' > Suggest use rte_smp_rmb(), please see below comment. > >> + >> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> + >> + /* A writer incremented the sequence number during this read >> + * critical section. >> + */ >> + if (unlikely(begin_sn != end_sn)) >> + return true; >> + >> + return false; >> +} >> + >> +/** >> + * @warning >> + * @b EXPERIMENTAL: this API may change without prior notice. >> + * >> + * Begin a write-side critical section. >> + * >> + * A call to this function acquires the write lock associated @p >> + * seqlock, and marks the beginning of a write-side critical section. >> + * >> + * After having called this function, the caller may go on to modify >> + * (both read and write) the protected data, in an atomic or >> + * non-atomic manner. >> + * >> + * After the necessary updates have been performed, the application >> + * calls rte_seqlock_write_unlock(). >> + * >> + * This function is not preemption-safe in the sense that preemption >> + * of the calling thread may block reader progress until the writer >> + * thread is rescheduled. >> + * >> + * Unlike rte_seqlock_read_begin(), each call made to >> + * rte_seqlock_write_lock() must be matched with an unlock call. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. >> + * >> + * @see rte_seqlock_write_unlock() >> + */ >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_lock(rte_seqlock_t *seqlock) >> +{ >> + uint32_t sn; >> + >> + /* to synchronize with other writers */ >> + rte_spinlock_lock(&seqlock->lock); >> + >> + sn = seqlock->sn + 1; >> + >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); >> + >> + /* __ATOMIC_RELEASE to prevent stores after (in program order) >> + * from happening before the sn store. >> + */ >> + rte_atomic_thread_fence(__ATOMIC_RELEASE); > > In ARMv8, rte_atomic_thread_fence(__ATOMIC_RELEASE) will output 'dmb ish', and > rte_smp_wmb() will output 'dma ishst'. > Suggest use rte_smp_wmb(). I think here only need to use store mb here. > (This has already been discussed further down in the mail thread, and I have nothing to add.) >> +} >> + >> +/** >> + * @warning >> + * @b EXPERIMENTAL: this API may change without prior notice. >> + * >> + * End a write-side critical section. >> + * >> + * A call to this function marks the end of the write-side critical >> + * section, for @p seqlock. After this call has been made, the protected >> + * data may no longer be modified. >> + * >> + * @param seqlock >> + * A pointer to the seqlock. 
>> + * >> + * @see rte_seqlock_write_lock() >> + */ >> +__rte_experimental >> +static inline void >> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) >> +{ >> + uint32_t sn; >> + >> + sn = seqlock->sn + 1; >> + >> + /* synchronizes-with the load acquire in rte_seqlock_read_begin() */ >> + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); >> + >> + rte_spinlock_unlock(&seqlock->lock); >> +} >> + >> +#ifdef __cplusplus >> +} >> +#endif >> + >> +#endif /* _RTE_SEQLOCK_H_ */ >> diff --git a/lib/eal/version.map b/lib/eal/version.map >> index b53eeb30d7..4a9d0ed899 100644 >> --- a/lib/eal/version.map >> +++ b/lib/eal/version.map >> @@ -420,6 +420,9 @@ EXPERIMENTAL { >> rte_intr_instance_free; >> rte_intr_type_get; >> rte_intr_type_set; >> + >> + # added in 22.07 >> + rte_seqlock_init; >> }; >> >> INTERNAL { >> > > Reviewed-by: Chengwen Feng <fengchengwen@huawei.com> > > Thanks a lot for the review! ^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH v6] eal: add seqlock 2022-05-08 11:56 ` Mattias Rönnblom @ 2022-05-08 12:12 ` Mattias Rönnblom 2022-05-08 16:10 ` Stephen Hemminger 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-08 12:12 UTC (permalink / raw) To: Thomas Monjalon, David Marchand Cc: dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, stephen, hofors, Chengwen Feng, Mattias Rönnblom, Ola Liljedahl A sequence lock (seqlock) is a synchronization primitive which allows for data-race free, low-overhead, high-frequency reads, suitable for data structures shared across many cores and which are updated relatively infrequently. A seqlock permits multiple parallel readers. The variant of seqlock implemented in this patch supports multiple writers as well. A spinlock is used for writer-writer serialization. To avoid resource reclamation and other issues, the data protected by a seqlock is best off being self-contained (i.e., no pointers [except to constant data]). One way to think about seqlocks is that they provide means to perform atomic operations on data objects larger than what the native atomic machine instructions allow for. DPDK seqlocks are not preemption safe on the writer side. A thread preemption affects performance, not correctness. A seqlock contains a sequence number, which can be thought of as the generation of the data it protects. A reader will 1. Load the sequence number (sn). 2. Load, in arbitrary order, the seqlock-protected data. 3. Load the sn again. 4. Check if the first and second sn are equal, and even numbered. If they are not, discard the loaded data, and restart from 1. The first three steps need to be ordered using suitable memory fences. A writer will 1. Take the spinlock, to serialize writer access. 2. Load the sn. 3. Store the original sn + 1 as the new sn. 4. Perform load and stores to the seqlock-protected data. 5. Store the original sn + 2 as the new sn. 6. Release the spinlock. Proper memory fencing is required to make sure the first sn store, the data stores, and the second sn store appear to the reader in the mentioned order. The sn loads and stores must be atomic, but the data loads and stores need not be. The original seqlock design and implementation was done by Stephen Hemminger. This is an independent implementation, using C11 atomics. For more information on seqlocks, see https://en.wikipedia.org/wiki/Seqlock --- PATCH v6: * Check for failed memory allocations in unit test. * Fix underflow issue in test case for small RTE_LCORE_MAX values. * Fix test case memory leak. PATCH v5: * Add sequence lock section to MAINTAINERS. * Add entry in the release notes. * Add seqlock reference in the API index. * Fix meson build file indentation. * Use "increment" to describe how a writer changes the sequence number. * Remove compiler barriers from seqlock test. * Use appropriate macros (e.g., TEST_SUCCESS) for test return values. PATCH v4: * Reverted to Linux kernel style naming on the read side. * Bail out early from the retry function if an odd sequence number is encountered. * Added experimental warnings in the API documentation. * Static initializer now uses named field initialization. * Various tweaks to API documentation (including the example). PATCH v3: * Renamed both read and write-side critical section begin/end functions to better match rwlock naming, per Ola Liljedahl's suggestion. * Added 'extern "C"' guards for C++ compatibility. * Refer to the main lcore as the main lcore, and nothing else. 
PATCH v2: * Skip instead of fail unit test in case too few lcores are available. * Use main lcore for testing, reducing the minimum number of lcores required to run the unit tests to four. * Consistently refer to sn field as the "sequence number" in the documentation. * Fixed spelling mistakes in documentation. Updates since RFC: * Added API documentation. * Added link to Wikipedia article in the commit message. * Changed seqlock sequence number field from uint64_t (which was overkill) to uint32_t. The sn type needs to be sufficiently large to assure no reader will read a sn, access the data, and then read the same sn, but the sn has been incremented enough times to have wrapped during the read, and arrived back at the original sn. * Added RTE_SEQLOCK_INITIALIZER macro for static initialization. * Removed the rte_seqlock struct + separate rte_seqlock_t typedef with an anonymous struct typedef:ed to rte_seqlock_t. Acked-by: Morten Brørup <mb@smartsharesystems.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com> --- MAINTAINERS | 5 + app/test/meson.build | 2 + app/test/test_seqlock.c | 190 +++++++++++++++ doc/api/doxy-api-index.md | 1 + doc/guides/rel_notes/release_22_07.rst | 14 ++ lib/eal/common/meson.build | 1 + lib/eal/common/rte_seqlock.c | 12 + lib/eal/include/meson.build | 1 + lib/eal/include/rte_seqlock.h | 322 +++++++++++++++++++++++++ lib/eal/version.map | 3 + 10 files changed, 551 insertions(+) create mode 100644 app/test/test_seqlock.c create mode 100644 lib/eal/common/rte_seqlock.c create mode 100644 lib/eal/include/rte_seqlock.h diff --git a/MAINTAINERS b/MAINTAINERS index 7c4f541dba..2804d8136c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -262,6 +262,11 @@ M: Joyce Kong <joyce.kong@arm.com> F: lib/eal/include/generic/rte_ticketlock.h F: app/test/test_ticketlock.c +Sequence Lock +M: Mattias Rönnblom <mattias.ronnblom@ericsson.com> +F: lib/eal/include/rte_seqlock.h +F: app/test/test_seqlock.c + Pseudo-random Number Generation M: Mattias Rönnblom <mattias.ronnblom@ericsson.com> F: lib/eal/include/rte_random.h diff --git a/app/test/meson.build b/app/test/meson.build index 5fc1dd1b7b..5e418e8766 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -125,6 +125,7 @@ test_sources = files( 'test_rwlock.c', 'test_sched.c', 'test_security.c', + 'test_seqlock.c', 'test_service_cores.c', 'test_spinlock.c', 'test_stack.c', @@ -214,6 +215,7 @@ fast_tests = [ ['rwlock_rde_wro_autotest', true], ['sched_autotest', true], ['security_autotest', false], + ['seqlock_autotest', true], ['spinlock_autotest', true], ['stack_autotest', false], ['stack_lf_autotest', false], diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c new file mode 100644 index 0000000000..cb1c1baa82 --- /dev/null +++ b/app/test/test_seqlock.c @@ -0,0 +1,190 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +#include <rte_cycles.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include <inttypes.h> + +#include "test.h" + +struct data { + rte_seqlock_t lock; + + uint64_t a; + uint64_t b __rte_cache_aligned; + uint64_t c __rte_cache_aligned; +} __rte_cache_aligned; + +struct reader { + struct data *data; + uint8_t stop; +}; + +#define WRITER_RUNTIME (2.0) /* s */ + +#define WRITER_MAX_DELAY (100) /* us */ + +#define INTERRUPTED_WRITER_FREQUENCY (1000) +#define 
WRITER_INTERRUPT_TIME (1) /* us */ + +static int +writer_run(void *arg) +{ + struct data *data = arg; + uint64_t deadline; + + deadline = rte_get_timer_cycles() + + WRITER_RUNTIME * rte_get_timer_hz(); + + while (rte_get_timer_cycles() < deadline) { + bool interrupted; + uint64_t new_value; + unsigned int delay; + + new_value = rte_rand(); + + interrupted = rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0; + + rte_seqlock_write_lock(&data->lock); + + data->c = new_value; + data->b = new_value; + + if (interrupted) + rte_delay_us_block(WRITER_INTERRUPT_TIME); + + data->a = new_value; + + rte_seqlock_write_unlock(&data->lock); + + delay = rte_rand_max(WRITER_MAX_DELAY); + + rte_delay_us_block(delay); + } + + return TEST_SUCCESS; +} + +#define INTERRUPTED_READER_FREQUENCY (1000) +#define READER_INTERRUPT_TIME (1000) /* us */ + +static int +reader_run(void *arg) +{ + struct reader *r = arg; + int rc = TEST_SUCCESS; + + while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && + rc == TEST_SUCCESS) { + struct data *data = r->data; + bool interrupted; + uint32_t sn; + uint64_t a; + uint64_t b; + uint64_t c; + + interrupted = rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0; + + do { + sn = rte_seqlock_read_begin(&data->lock); + + a = data->a; + if (interrupted) + rte_delay_us_block(READER_INTERRUPT_TIME); + c = data->c; + b = data->b; + + } while (rte_seqlock_read_retry(&data->lock, sn)); + + if (a != b || b != c) { + printf("Reader observed inconsistent data values " + "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n", + a, b, c); + rc = TEST_FAILED; + } + } + + return rc; +} + +static void +reader_stop(struct reader *reader) +{ + __atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED); +} + +#define NUM_WRITERS (2) /* main lcore + one worker */ +#define MIN_NUM_READERS (2) +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS) + +/* Only a compile-time test */ +static rte_seqlock_t __rte_unused static_init_lock = RTE_SEQLOCK_INITIALIZER; + +static int +test_seqlock(void) +{ + struct reader readers[RTE_MAX_LCORE]; + unsigned int num_lcores; + unsigned int num_readers; + struct data *data; + unsigned int i; + unsigned int lcore_id; + unsigned int reader_lcore_ids[RTE_MAX_LCORE]; + unsigned int worker_writer_lcore_id = 0; + int rc = TEST_SUCCESS; + + num_lcores = rte_lcore_count(); + + if (num_lcores < MIN_LCORE_COUNT) { + printf("Too few cores to run test. 
Skipping.\n"); + return TEST_SKIPPED; + } + + num_readers = num_lcores - NUM_WRITERS; + + data = rte_zmalloc(NULL, sizeof(struct data), 0); + + if (data == NULL) { + printf("Failed to allocate memory for seqlock data\n"); + return TEST_FAILED; + } + + i = 0; + RTE_LCORE_FOREACH_WORKER(lcore_id) { + if (i == 0) { + rte_eal_remote_launch(writer_run, data, lcore_id); + worker_writer_lcore_id = lcore_id; + } else { + unsigned int reader_idx = i - 1; + struct reader *reader = &readers[reader_idx]; + + reader->data = data; + reader->stop = 0; + + rte_eal_remote_launch(reader_run, reader, lcore_id); + reader_lcore_ids[reader_idx] = lcore_id; + } + i++; + } + + if (writer_run(data) != 0 || + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) + rc = TEST_FAILED; + + for (i = 0; i < num_readers; i++) { + reader_stop(&readers[i]); + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) + rc = TEST_FAILED; + } + + rte_free(data); + + return rc; +} + +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index 4245b9635c..f23e33ae30 100644 --- a/doc/api/doxy-api-index.md +++ b/doc/api/doxy-api-index.md @@ -77,6 +77,7 @@ The public API headers are grouped by topics: [rwlock] (@ref rte_rwlock.h), [spinlock] (@ref rte_spinlock.h), [ticketlock] (@ref rte_ticketlock.h), + [seqlock] (@ref rte_seqlock.h), [RCU] (@ref rte_rcu_qsbr.h) - **CPU arch**: diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 88d6e96cc1..d2f7bafe7b 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -55,6 +55,20 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Added Sequence Lock.** + + Added a new synchronization primitive: the sequence lock + (seqlock). A seqlock allows for low overhead, parallel reads. The + DPDK seqlock uses a spinlock to serialize multiple writing threads. + + In particular, seqlocks are useful for protecting data structures + which are read very frequently, by threads running on many different + cores, and modified relatively infrequently. + + One way to think about seqlocks is that they provide means to + perform atomic operations on data objects larger than what the + native atomic machine instructions allow for. + * **Updated Intel iavf driver.** * Added Tx QoS queue rate limitation support. 
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index 917758cc65..3c896711e5 100644 --- a/lib/eal/common/meson.build +++ b/lib/eal/common/meson.build @@ -35,6 +35,7 @@ sources += files( 'rte_malloc.c', 'rte_random.c', 'rte_reciprocal.c', + 'rte_seqlock.c', 'rte_service.c', 'rte_version.c', ) diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new file mode 100644 index 0000000000..d4fe648799 --- /dev/null +++ b/lib/eal/common/rte_seqlock.c @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#include <rte_seqlock.h> + +void +rte_seqlock_init(rte_seqlock_t *seqlock) +{ + seqlock->sn = 0; + rte_spinlock_init(&seqlock->lock); +} diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index 9700494816..48df5f1a21 100644 --- a/lib/eal/include/meson.build +++ b/lib/eal/include/meson.build @@ -36,6 +36,7 @@ headers += files( 'rte_per_lcore.h', 'rte_random.h', 'rte_reciprocal.h', + 'rte_seqlock.h', 'rte_service.h', 'rte_service_component.h', 'rte_string_fns.h', diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file mode 100644 index 0000000000..13f8ae2e4e --- /dev/null +++ b/lib/eal/include/rte_seqlock.h @@ -0,0 +1,322 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2022 Ericsson AB + */ + +#ifndef _RTE_SEQLOCK_H_ +#define _RTE_SEQLOCK_H_ + +#ifdef __cplusplus extern "C" { +#endif + +/** + * @file + * RTE Seqlock + * + * A sequence lock (seqlock) is a synchronization primitive allowing + * multiple, parallel, readers to efficiently and safely (i.e., in a + * data-race free manner) access lock-protected data. The RTE seqlock + * permits multiple writers as well. A spinlock is used for + * writer-writer synchronization. + * + * A reader never blocks a writer. Very high frequency writes may + * prevent readers from making progress. + * + * A seqlock is not preemption-safe on the writer side. If a writer is + * preempted, it may block readers until the writer thread is allowed + * to continue. Heavy computations should be kept out of the + * writer-side critical section, to avoid delaying readers. + * + * Seqlocks are useful for data which are read by many cores, at a + * high frequency, and relatively infrequently written to. + * + * One way to think about seqlocks is that they provide means to + * perform atomic operations on objects larger than what the native + * machine instructions allow for. + * + * To avoid resource reclamation issues, the data protected by a + * seqlock should typically be kept self-contained (e.g., no pointers + * to mutable, dynamically allocated data). + * + * Example usage: + * @code{.c} + * #define MAX_Y_LEN (16) + * // Application-defined example data structure, protected by a seqlock. + * struct config { + * rte_seqlock_t lock; + * int param_x; + * char param_y[MAX_Y_LEN]; + * }; + * + * // Accessor function for reading config fields. + * void + * config_read(const struct config *config, int *param_x, char *param_y) + * { + * uint32_t sn; + * + * do { + * sn = rte_seqlock_read_begin(&config->lock); + * + * // Loads may be atomic or non-atomic, as in this example. + * *param_x = config->param_x; + * strcpy(param_y, config->param_y); + * // An alternative to an immediate retry is to abort and + * // try again at some later time, assuming progress is + * // possible without the data. + * } while (rte_seqlock_read_retry(&config->lock, sn)); + * } + * + * // Accessor function for writing config fields. 
+ * void + * config_update(struct config *config, int param_x, const char *param_y) + * { + * rte_seqlock_write_lock(&config->lock); + * // Stores may be atomic or non-atomic, as in this example. + * config->param_x = param_x; + * strcpy(config->param_y, param_y); + * rte_seqlock_write_unlock(&config->lock); + * } + * @endcode + * + * @see + * https://en.wikipedia.org/wiki/Seqlock. + */ + +#include <stdbool.h> +#include <stdint.h> + +#include <rte_atomic.h> +#include <rte_branch_prediction.h> +#include <rte_spinlock.h> + +/** + * The RTE seqlock type. + */ +typedef struct { + uint32_t sn; /**< A sequence number for the protected data. */ + rte_spinlock_t lock; /**< Spinlock used to serialize writers. */ +} rte_seqlock_t; + +/** + * A static seqlock initializer. + */ +#define RTE_SEQLOCK_INITIALIZER \ + { \ + .sn = 0, \ + .lock = RTE_SPINLOCK_INITIALIZER \ + } + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Initialize the seqlock. + * + * This function initializes the seqlock, and leaves the writer-side + * spinlock unlocked. + * + * @param seqlock + * A pointer to the seqlock. + */ +__rte_experimental +void +rte_seqlock_init(rte_seqlock_t *seqlock); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a read-side critical section. + * + * A call to this function marks the beginning of a read-side critical + * section, for @p seqlock. + * + * rte_seqlock_read_begin() returns a sequence number, which is later + * used in rte_seqlock_read_retry() to check if the protected data + * underwent any modifications during the read transaction. + * + * After (in program order) rte_seqlock_read_begin() has been called, + * the calling thread reads the protected data, for later use. The + * protected data read *must* be copied (either in pristine form, or + * in the form of some derivative), since the caller may only read the + * data from within the read-side critical section (i.e., after + * rte_seqlock_read_begin() and before rte_seqlock_read_retry()), + * but must not act upon the retrieved data while in the critical + * section, since it does not yet know if it is consistent. + * + * The protected data may be read using atomic and/or non-atomic + * operations. + * + * After (in program order) all required data loads have been + * performed, rte_seqlock_read_retry() should be called, marking + * the end of the read-side critical section. + * + * If rte_seqlock_read_retry() returns true, the just-read data is + * inconsistent and should be discarded. The caller has the option to + * either restart the whole procedure right away (i.e., calling + * rte_seqlock_read_begin() again), or do the same at some later time. + * + * If rte_seqlock_read_retry() returns false, the data was read + * atomically and the copied data is consistent. + * + * @param seqlock + * A pointer to the seqlock. + * @return + * The seqlock sequence number for this critical section, to + * later be passed to rte_seqlock_read_retry(). + * + * @see rte_seqlock_read_retry() + */ + +__rte_experimental +static inline uint32_t +rte_seqlock_read_begin(const rte_seqlock_t *seqlock) +{ + /* __ATOMIC_ACQUIRE to prevent loads after (in program order) + * from happening before the sn load. Synchronizes-with the + * store release in rte_seqlock_write_unlock(). + */ + return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a read-side critical section. 
+ * + * A call to this function marks the end of a read-side critical + * section, for @p seqlock. The application must supply the sequence + * number produced by the corresponding rte_seqlock_read_begin() call. + * + * After this function has been called, the caller should not access + * the protected data. + * + * In case rte_seqlock_read_retry() returns true, the just-read data + * was modified as it was being read and may be inconsistent, and thus + * should be discarded. + * + * In case this function returns false, the data is consistent and the + * set of atomic and non-atomic load operations performed between + * rte_seqlock_read_begin() and rte_seqlock_read_retry() were atomic, + * as a whole. + * + * @param seqlock + * A pointer to the seqlock. + * @param begin_sn + * The seqlock sequence number returned by rte_seqlock_read_begin(). + * @return + * true or false, if the just-read seqlock-protected data was + * inconsistent or consistent, respectively, at the time it was + * read. + * + * @see rte_seqlock_read_begin() + */ +__rte_experimental +static inline bool +rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn) +{ + uint32_t end_sn; + + /* An odd sequence number means the protected data was being + * modified already at the point of the rte_seqlock_read_begin() + * call. + */ + if (unlikely(begin_sn & 1)) + return true; + + /* make sure the data loads happen before the sn load */ + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); + + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); + + /* A writer incremented the sequence number during this read + * critical section. + */ + if (unlikely(begin_sn != end_sn)) + return true; + + return false; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * Begin a write-side critical section. + * + * A call to this function acquires the write lock associated with @p + * seqlock, and marks the beginning of a write-side critical section. + * + * After having called this function, the caller may go on to modify + * (both read and write) the protected data, in an atomic or + * non-atomic manner. + * + * After the necessary updates have been performed, the application + * calls rte_seqlock_write_unlock(). + * + * This function is not preemption-safe in the sense that preemption + * of the calling thread may block reader progress until the writer + * thread is rescheduled. + * + * Unlike rte_seqlock_read_begin(), each call made to + * rte_seqlock_write_lock() must be matched with an unlock call. + * + * @param seqlock + * A pointer to the seqlock. + * + * @see rte_seqlock_write_unlock() + */ +__rte_experimental +static inline void +rte_seqlock_write_lock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + /* to synchronize with other writers */ + rte_spinlock_lock(&seqlock->lock); + + sn = seqlock->sn + 1; + + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED); + + /* __ATOMIC_RELEASE to prevent stores after (in program order) + * from happening before the sn store. + */ + rte_atomic_thread_fence(__ATOMIC_RELEASE); +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice. + * + * End a write-side critical section. + * + * A call to this function marks the end of the write-side critical + * section, for @p seqlock. After this call has been made, the protected + * data may no longer be modified. + * + * @param seqlock + * A pointer to the seqlock. 
+ * + * @see rte_seqlock_write_lock() + */ +__rte_experimental +static inline void +rte_seqlock_write_unlock(rte_seqlock_t *seqlock) +{ + uint32_t sn; + + sn = seqlock->sn + 1; + + /* synchronizes-with the load acquire in rte_seqlock_read_begin() */ + __atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE); + + rte_spinlock_unlock(&seqlock->lock); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_SEQLOCK_H_ */ diff --git a/lib/eal/version.map b/lib/eal/version.map index b53eeb30d7..4a9d0ed899 100644 --- a/lib/eal/version.map +++ b/lib/eal/version.map @@ -420,6 +420,9 @@ EXPERIMENTAL { rte_intr_instance_free; rte_intr_type_get; rte_intr_type_set; + + # added in 22.07 + rte_seqlock_init; }; INTERNAL { -- 2.25.1 ^ permalink raw reply [flat|nested] 104+ messages in thread
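To make the numbered reader and writer steps in the commit message above concrete, the following is a minimal, standalone sketch using plain C11 atomics. It mirrors the fence placement of the patch but is not part of it; the sketch_* names and the payload layout are illustrative only, and writer-writer serialization (the patch's spinlock) is omitted.

#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

struct sketch_seqlock {
	_Atomic uint32_t sn; /* even: no write in progress, odd: write in progress */
};

struct payload {
	uint64_t a;
	uint64_t b;
};

/* Reader side, steps 1-4 from the commit message. The payload accesses
 * themselves need not be atomic; only the sn accesses must be. */
static void
sketch_read(struct sketch_seqlock *sl, const struct payload *shared,
            struct payload *copy)
{
	uint32_t begin_sn, end_sn;

	do {
		/* 1. Load the sequence number; acquire ordering keeps the
		 *    data loads below from being moved above this load. */
		begin_sn = atomic_load_explicit(&sl->sn, memory_order_acquire);

		/* 2. Load the protected data, in arbitrary order. */
		memcpy(copy, shared, sizeof(*copy));

		/* 3. Load the sequence number again; the acquire fence keeps
		 *    the data loads above from being moved below it. */
		atomic_thread_fence(memory_order_acquire);
		end_sn = atomic_load_explicit(&sl->sn, memory_order_relaxed);

		/* 4. Retry if a write was in progress (odd sn) or a write
		 *    completed in between (sn changed). */
	} while ((begin_sn & 1) || begin_sn != end_sn);
}

/* Writer side, steps 2-5; taking and releasing the writer lock
 * (steps 1 and 6) is omitted here. */
static void
sketch_write(struct sketch_seqlock *sl, struct payload *shared,
             const struct payload *new_val)
{
	/* 2. Load the current sequence number. */
	uint32_t sn = atomic_load_explicit(&sl->sn, memory_order_relaxed);

	/* 3. Store sn + 1 (odd): a write is now in progress. The release
	 *    fence keeps the data stores below from moving above it. */
	atomic_store_explicit(&sl->sn, sn + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	/* 4. Update the protected data. */
	memcpy(shared, new_val, sizeof(*shared));

	/* 5. Store sn + 2 (even again); the release store makes the data
	 *    stores visible to readers no later than the new sn value. */
	atomic_store_explicit(&sl->sn, sn + 2, memory_order_release);
}

In the patch itself the same roles are played by rte_seqlock_read_begin()/rte_seqlock_read_retry() on the reader side and rte_seqlock_write_lock()/rte_seqlock_write_unlock() on the writer side.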
* Re: [PATCH v6] eal: add seqlock 2022-05-08 12:12 ` [PATCH v6] " Mattias Rönnblom @ 2022-05-08 16:10 ` Stephen Hemminger 2022-05-08 19:40 ` Mattias Rönnblom 0 siblings, 1 reply; 104+ messages in thread From: Stephen Hemminger @ 2022-05-08 16:10 UTC (permalink / raw) To: Mattias Rönnblom Cc: Thomas Monjalon, David Marchand, dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, hofors, Chengwen Feng, Ola Liljedahl On Sun, 8 May 2022 14:12:42 +0200 Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: > A sequence lock (seqlock) is a synchronization primitive which allows > for data-race free, low-overhead, high-frequency reads, suitable for > data structures shared across many cores and which are updated > relatively infrequently. > > A seqlock permits multiple parallel readers. The variant of seqlock > implemented in this patch supports multiple writers as well. A > spinlock is used for writer-writer serialization. > > To avoid resource reclamation and other issues, the data protected by > a seqlock is best off being self-contained (i.e., no pointers [except > to constant data]). > > One way to think about seqlocks is that they provide means to perform > atomic operations on data objects larger than what the native atomic > machine instructions allow for. > > DPDK seqlocks are not preemption safe on the writer side. A thread > preemption affects performance, not correctness. > > A seqlock contains a sequence number, which can be thought of as the > generation of the data it protects. > > A reader will > 1. Load the sequence number (sn). > 2. Load, in arbitrary order, the seqlock-protected data. > 3. Load the sn again. > 4. Check if the first and second sn are equal, and even numbered. > If they are not, discard the loaded data, and restart from 1. > > The first three steps need to be ordered using suitable memory fences. > > A writer will > 1. Take the spinlock, to serialize writer access. > 2. Load the sn. > 3. Store the original sn + 1 as the new sn. > 4. Perform load and stores to the seqlock-protected data. > 5. Store the original sn + 2 as the new sn. > 6. Release the spinlock. > > Proper memory fencing is required to make sure the first sn store, the > data stores, and the second sn store appear to the reader in the > mentioned order. > > The sn loads and stores must be atomic, but the data loads and stores > need not be. > > The original seqlock design and implementation was done by Stephen > Hemminger. This is an independent implementation, using C11 atomics. > > For more information on seqlocks, see > https://en.wikipedia.org/wiki/Seqlock I think would be good to have the sequence count (read side only) like the kernel and sequence lock (sequence count + spinlock) as separate things. That way the application could use sequence count + ticket lock if it needed to scale to more writers. > diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c > new file mode 100644 > index 0000000000..d4fe648799 > --- /dev/null > +++ b/lib/eal/common/rte_seqlock.c > @@ -0,0 +1,12 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include <rte_seqlock.h> > + > +void > +rte_seqlock_init(rte_seqlock_t *seqlock) > +{ > + seqlock->sn = 0; > + rte_spinlock_init(&seqlock->lock); > +} So small, worth just making inline? ^ permalink raw reply [flat|nested] 104+ messages in thread
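The "sequence count + ticket lock" combination mentioned above could look roughly like the sketch below. The rte_seqcount_*() names are hypothetical stand-ins for a read-side-only counter API that does not exist at this point; rte_ticketlock_t and its lock/unlock functions are the existing DPDK ticket lock.

#include <rte_ticketlock.h>

/* Hypothetical: a bare sequence counter with read_begin/read_retry and
 * write_begin/write_end operations, but no writer lock of its own. */
struct app_cfg_sync {
	rte_seqcount_t count;   /* hypothetical read-side counter */
	rte_ticketlock_t wlock; /* existing DPDK ticket lock, FIFO-fair among writers */
};

static inline void
app_cfg_write_lock(struct app_cfg_sync *s)
{
	/* Fair writer-writer serialization, for scaling to more writers. */
	rte_ticketlock_lock(&s->wlock);
	rte_seqcount_write_begin(&s->count); /* hypothetical: sn becomes odd */
}

static inline void
app_cfg_write_unlock(struct app_cfg_sync *s)
{
	rte_seqcount_write_end(&s->count);   /* hypothetical: sn becomes even */
	rte_ticketlock_unlock(&s->wlock);
}

/* Readers would be unchanged: the hypothetical rte_seqcount_read_begin()
 * and rte_seqcount_read_retry() never touch the ticket lock. */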
* Re: [PATCH v6] eal: add seqlock 2022-05-08 16:10 ` Stephen Hemminger @ 2022-05-08 19:40 ` Mattias Rönnblom 2022-05-09 3:48 ` Stephen Hemminger 0 siblings, 1 reply; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-08 19:40 UTC (permalink / raw) To: Stephen Hemminger, Mattias Rönnblom Cc: Thomas Monjalon, David Marchand, dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Chengwen Feng, Ola Liljedahl On 2022-05-08 18:10, Stephen Hemminger wrote: > On Sun, 8 May 2022 14:12:42 +0200 > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote: >> A sequence lock (seqlock) is a synchronization primitive which allows >> for data-race free, low-overhead, high-frequency reads, suitable for >> data structures shared across many cores and which are updated >> relatively infrequently. >> >> A seqlock permits multiple parallel readers. The variant of seqlock >> implemented in this patch supports multiple writers as well. A >> spinlock is used for writer-writer serialization. >> >> To avoid resource reclamation and other issues, the data protected by >> a seqlock is best off being self-contained (i.e., no pointers [except >> to constant data]). >> >> One way to think about seqlocks is that they provide means to perform >> atomic operations on data objects larger than what the native atomic >> machine instructions allow for. >> >> DPDK seqlocks are not preemption safe on the writer side. A thread >> preemption affects performance, not correctness. >> >> A seqlock contains a sequence number, which can be thought of as the >> generation of the data it protects. >> >> A reader will >> 1. Load the sequence number (sn). >> 2. Load, in arbitrary order, the seqlock-protected data. >> 3. Load the sn again. >> 4. Check if the first and second sn are equal, and even numbered. >> If they are not, discard the loaded data, and restart from 1. >> >> The first three steps need to be ordered using suitable memory fences. >> >> A writer will >> 1. Take the spinlock, to serialize writer access. >> 2. Load the sn. >> 3. Store the original sn + 1 as the new sn. >> 4. Perform load and stores to the seqlock-protected data. >> 5. Store the original sn + 2 as the new sn. >> 6. Release the spinlock. >> >> Proper memory fencing is required to make sure the first sn store, the >> data stores, and the second sn store appear to the reader in the >> mentioned order. >> >> The sn loads and stores must be atomic, but the data loads and stores >> need not be. >> >> The original seqlock design and implementation was done by Stephen >> Hemminger. This is an independent implementation, using C11 atomics. >> >> For more information on seqlocks, see >> https://en.wikipedia.org/wiki/Seqlock > > I think would be good to have the sequence count (read side only) like > the kernel and sequence lock (sequence count + spinlock) as separate things. > > That way the application could use sequence count + ticket lock if it > needed to scale to more writers. > Sounds reasonable. Would that be something like: typedef struct { uint32_t sn; } rte_seqlock_t; rte_seqlock_read_begin() rte_seqlock_read_retry() rte_seqlock_write_begin() rte_seqlock_write_end() typedef struct { rte_seqlock_t seqlock; rte_spinlock_t wlock; } rte_<something>_t; rte_<something>_read_begin() rte_<something>_read_retry() rte_<something>_write_lock() rte_<something>_write_unlock() or are you suggesting removing the spinlock altogether, and leave writer-side synchronization to the application (at least in this DPDK release)? 
>> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c >> new file mode 100644 >> index 0000000000..d4fe648799 >> --- /dev/null >> +++ b/lib/eal/common/rte_seqlock.c >> @@ -0,0 +1,12 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#include <rte_seqlock.h> >> + >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock) >> +{ >> + seqlock->sn = 0; >> + rte_spinlock_init(&seqlock->lock); >> +} > > So small, worth just making inline? I don't think so, but it is small. Especially if rte_spinlock_init() now goes away. :) ^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH v6] eal: add seqlock 2022-05-08 19:40 ` Mattias Rönnblom @ 2022-05-09 3:48 ` Stephen Hemminger 2022-05-09 6:26 ` Morten Brørup 2022-05-13 6:27 ` Mattias Rönnblom 0 siblings, 2 replies; 104+ messages in thread From: Stephen Hemminger @ 2022-05-09 3:48 UTC (permalink / raw) To: Mattias Rönnblom Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Chengwen Feng, Ola Liljedahl On Sun, 8 May 2022 21:40:58 +0200 Mattias Rönnblom <hofors@lysator.liu.se> wrote: > > I think would be good to have the sequence count (read side only) like > > the kernel and sequence lock (sequence count + spinlock) as separate things. > > > > That way the application could use sequence count + ticket lock if it > > needed to scale to more writers. > > > > Sounds reasonable. Would that be something like: > > typedef struct { > uint32_t sn; > } rte_seqlock_t; > > rte_seqlock_read_begin() > rte_seqlock_read_retry() > rte_seqlock_write_begin() > rte_seqlock_write_end() > > typedef struct { > rte_seqlock_t seqlock; > rte_spinlock_t wlock; > } rte_<something>_t; > > rte_<something>_read_begin() > rte_<something>_read_retry() > rte_<something>_write_lock() > rte_<something>_write_unlock() > > or are you suggesting removing the spinlock altogether, and leave > writer-side synchronization to the application (at least in this DPDK > release)? No, like Linux kernel. Use seqcount for the reader counter only object and seqlock for the seqcount + spinlock version. ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: [PATCH v6] eal: add seqlock 2022-05-09 3:48 ` Stephen Hemminger @ 2022-05-09 6:26 ` Morten Brørup 2022-05-13 6:27 ` Mattias Rönnblom 1 sibling, 0 replies; 104+ messages in thread From: Morten Brørup @ 2022-05-09 6:26 UTC (permalink / raw) To: Stephen Hemminger, Mattias Rönnblom Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, Chengwen Feng, Ola Liljedahl > From: Stephen Hemminger [mailto:stephen@networkplumber.org] > Sent: Monday, 9 May 2022 05.48 > > On Sun, 8 May 2022 21:40:58 +0200 > Mattias Rönnblom <hofors@lysator.liu.se> wrote: > > > > I think would be good to have the sequence count (read side only) > like > > > the kernel and sequence lock (sequence count + spinlock) as > separate things. > > > > > > That way the application could use sequence count + ticket lock if > it > > > needed to scale to more writers. > > > If we want a seqlock based on a ticket lock, I would prefer that DPDK offers it, rather than requiring the application to implement it. Regardless, adding the seqcount type as a separate thing could still make sense. > > > > Sounds reasonable. Would that be something like: > > > > typedef struct { > > uint32_t sn; > > } rte_seqlock_t; > > > > rte_seqlock_read_begin() > > rte_seqlock_read_retry() > > rte_seqlock_write_begin() > > rte_seqlock_write_end() > > > > typedef struct { > > rte_seqlock_t seqlock; > > rte_spinlock_t wlock; > > } rte_<something>_t; > > > > rte_<something>_read_begin() > > rte_<something>_read_retry() > > rte_<something>_write_lock() > > rte_<something>_write_unlock() > > > > or are you suggesting removing the spinlock altogether, and leave > > writer-side synchronization to the application (at least in this DPDK > > release)? > > > No, like Linux kernel. Use seqcount for the reader counter only object > and seqlock for the seqcount + spinlock version. In other words: Keep the existing names, i.e. rte_seqlock_t/rte_seqlock_functions(), for what you have already implemented, and use the names rte_seqcount_t/rte_seqcount_functions() for the variant without the lock. Linux source code here: https://elixir.bootlin.com/linux/v5.10.113/source/include/linux/seqlock.h I suppose that the rte_seqcount_t primitive should go into a separate file; it is not really a lock. ^ permalink raw reply [flat|nested] 104+ messages in thread
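Following the Linux seqcount/seqlock model referenced above, the already-implemented rte_seqlock_t API could be kept and re-expressed as thin wrappers around a counter-only type. A rough sketch of that layering is shown below; all rte_seqcount_*() functions are hypothetical at this point, and the sketch is not meant to compile against the v6 header, since it redefines rte_seqlock_t.

/* Hypothetical counter-only type, analogous to the kernel's seqcount_t. */
typedef struct {
	uint32_t sn;
} rte_seqcount_t;

/* The lock variant composes the counter with the writer spinlock,
 * analogous to the kernel's seqlock_t. */
typedef struct {
	rte_seqcount_t count;
	rte_spinlock_t lock;
} rte_seqlock_t;

/* Read side: thin wrappers delegating to the counter. */
static inline uint32_t
rte_seqlock_read_begin(const rte_seqlock_t *seqlock)
{
	return rte_seqcount_read_begin(&seqlock->count); /* hypothetical */
}

static inline bool
rte_seqlock_read_retry(const rte_seqlock_t *seqlock, uint32_t begin_sn)
{
	return rte_seqcount_read_retry(&seqlock->count, begin_sn); /* hypothetical */
}

/* Write side: the spinlock lives only in the lock variant, so an
 * application wanting a different writer lock can use rte_seqcount_t
 * directly. */
static inline void
rte_seqlock_write_lock(rte_seqlock_t *seqlock)
{
	rte_spinlock_lock(&seqlock->lock);
	rte_seqcount_write_begin(&seqlock->count); /* hypothetical */
}

static inline void
rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
{
	rte_seqcount_write_end(&seqlock->count);   /* hypothetical */
	rte_spinlock_unlock(&seqlock->lock);
}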
* Re: [PATCH v6] eal: add seqlock 2022-05-09 3:48 ` Stephen Hemminger 2022-05-09 6:26 ` Morten Brørup @ 2022-05-13 6:27 ` Mattias Rönnblom 1 sibling, 0 replies; 104+ messages in thread From: Mattias Rönnblom @ 2022-05-13 6:27 UTC (permalink / raw) To: Stephen Hemminger Cc: Mattias Rönnblom, Thomas Monjalon, David Marchand, dev, onar.olsen, Honnappa.Nagarahalli, nd, konstantin.ananyev, mb, Chengwen Feng, Ola Liljedahl On 2022-05-09 05:48, Stephen Hemminger wrote: > On Sun, 8 May 2022 21:40:58 +0200 > Mattias Rönnblom <hofors@lysator.liu.se> wrote: > >>> I think would be good to have the sequence count (read side only) like >>> the kernel and sequence lock (sequence count + spinlock) as separate things. >>> >>> That way the application could use sequence count + ticket lock if it >>> needed to scale to more writers. >>> >> >> Sounds reasonable. Would that be something like: >> >> typedef struct { >> uint32_t sn; >> } rte_seqlock_t; >> >> rte_seqlock_read_begin() >> rte_seqlock_read_retry() >> rte_seqlock_write_begin() >> rte_seqlock_write_end() >> >> typedef struct { >> rte_seqlock_t seqlock; >> rte_spinlock_t wlock; >> } rte_<something>_t; >> >> rte_<something>_read_begin() >> rte_<something>_read_retry() >> rte_<something>_write_lock() >> rte_<something>_write_unlock() >> >> or are you suggesting removing the spinlock altogether, and leave >> writer-side synchronization to the application (at least in this DPDK >> release)? > > > No, like Linux kernel. Use seqcount for the reader counter only object > and seqlock for the seqcount + spinlock version. Should rte_seqcount_t be in a separate file? Normally, I would use the "header file per 'class'" pattern (unless things are very tightly coupled), but I suspect DPDK style is the "header file per group of related 'classes'". ^ permalink raw reply [flat|nested] 104+ messages in thread
* RE: DPDK seqlock 2022-03-22 16:10 DPDK seqlock Mattias Rönnblom 2022-03-22 16:46 ` Ananyev, Konstantin @ 2022-03-23 12:04 ` Morten Brørup 1 sibling, 0 replies; 104+ messages in thread From: Morten Brørup @ 2022-03-23 12:04 UTC (permalink / raw) To: Mattias Rönnblom, dev > From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com] > Sent: Tuesday, 22 March 2022 17.10 > > Hi. > > Would it make sense to have a seqlock implementation in DPDK? Certainly! > > I think so, since it's a very useful synchronization primitive in data > plane applications. Yes, and having it in DPDK saves application developers from writing their own (with the risks coming with that). > > Regards, > Mattias ^ permalink raw reply [flat|nested] 104+ messages in thread