From: Mattias Rönnblom
To: Honnappa Nagarahalli, dev@dpdk.org
Cc: thomas@monjalon.net, David Marchand, onar.olsen@ericsson.com, nd,
 konstantin.ananyev@intel.com, mb@smartsharesystems.com,
 stephen@networkplumber.org, Ola Liljedahl
Subject: Re: [PATCH v3] eal: add seqlock
Date: Sun, 3 Apr 2022 08:10:28 +0200
Message-ID: <7fa75549-c0db-a206-f642-69235bb7a3fb@lysator.liu.se>
References: <20220401150749.136921-1-mattias.ronnblom@ericsson.com>

On 2022-04-02 02:21, Honnappa Nagarahalli wrote:
> Hi Mattias,
> Few comments inline.
>
>> -----Original Message-----
>> From: Mattias Rönnblom
>> Sent: Friday, April 1, 2022 10:08 AM
>> To: dev@dpdk.org
>> Cc: thomas@monjalon.net; David Marchand; onar.olsen@ericsson.com;
>> Honnappa Nagarahalli; nd; konstantin.ananyev@intel.com;
>> mb@smartsharesystems.com; stephen@networkplumber.org;
>> Mattias Rönnblom; Ola Liljedahl
>> Subject: [PATCH v3] eal: add seqlock
>>
>> A sequence lock (seqlock) is a synchronization primitive which allows
>> for data-race free, low-overhead, high-frequency reads, especially
>> for data structures shared across many cores and which are updated
>> relatively infrequently.
>>
>> A seqlock permits multiple parallel readers. The variant of seqlock
>> implemented in this patch supports multiple writers as well. A
>> spinlock is used for writer-writer serialization.
>>
>> To avoid resource reclamation and other issues, the data protected by
>> a seqlock is best off being self-contained (i.e., no pointers [except
>> to constant data]).
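To illustrate what self-contained means in practice (the struct and
field names below are made up for the example):

	/* All protected fields live inline in the struct. A pointer
	 * could itself be read consistently, but the object it points
	 * to is not covered by the seqlock, and may be modified or
	 * reclaimed at any time.
	 */
	struct route_entry {
		rte_seqlock_t lock;
		uint32_t next_hop_ip; /* plain value: fine */
		char ifname[16];      /* inline array: fine */
		/* struct neighbour *nb;  <- pointer to mutable,
		 * dynamically allocated data: best avoided */
	};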
>>
>> One way to think about seqlocks is that they provide means to perform
>> atomic operations on data objects larger than what the native atomic
>> machine instructions allow for.
>>
>> DPDK seqlocks are not preemption safe on the writer side. A thread
>> preemption affects performance, not correctness.
>>
>> A seqlock contains a sequence number, which can be thought of as the
>> generation of the data it protects.
>>
>> A reader will
>>   1. Load the sequence number (sn).
>>   2. Load, in arbitrary order, the seqlock-protected data.
>>   3. Load the sn again.
>>   4. Check if the first and second sn are equal, and even numbered.
>>      If they are not, discard the loaded data, and restart from 1.
>>
>> The first three steps need to be ordered using suitable memory fences.
>>
>> A writer will
>>   1. Take the spinlock, to serialize writer access.
>>   2. Load the sn.
>>   3. Store the original sn + 1 as the new sn.
>>   4. Perform load and stores to the seqlock-protected data.
>>   5. Store the original sn + 2 as the new sn.
>>   6. Release the spinlock.
>>
>> Proper memory fencing is required to make sure the first sn store,
>> the data stores, and the second sn store appear to the reader in the
>> mentioned order.
>>
>> The sn loads and stores must be atomic, but the data loads and stores
>> need not be.
>>
>> The original seqlock design and implementation was done by Stephen
>> Hemminger. This is an independent implementation, using C11 atomics.
>>
>> For more information on seqlocks, see
>> https://en.wikipedia.org/wiki/Seqlock
>>
>> PATCH v3:
>>  * Renamed both read and write-side critical section begin/end
>>    functions to better match rwlock naming, per Ola Liljedahl's
>>    suggestion.
>>  * Added 'extern "C"' guards for C++ compatibility.
>>  * Refer to the main lcore as the main, and not the master.
>>
>> PATCH v2:
>>  * Skip instead of fail unit test in case too few lcores are available.
>>  * Use main lcore for testing, reducing the minimum number of lcores
>>    required to run the unit tests to four.
>>  * Consistently refer to sn field as the "sequence number" in the
>>    documentation.
>>  * Fixed spelling mistakes in documentation.
>>
>> Updates since RFC:
>>  * Added API documentation.
>>  * Added link to Wikipedia article in the commit message.
>>  * Changed seqlock sequence number field from uint64_t (which was
>>    overkill) to uint32_t. The sn type needs to be sufficiently large
>>    to assure no reader will read a sn, access the data, and then read
>>    the same sn, when the sn has in fact been updated so many times
>>    during the read that it has wrapped.
>>  * Added RTE_SEQLOCK_INITIALIZER macro for static initialization.
>>  * Replaced the rte_seqlock struct + separate rte_seqlock_t typedef
>>    with an anonymous struct typedef:ed to rte_seqlock_t.
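The reader steps described in the commit message above map to code
along the following lines. This is only an illustrative sketch, using
the same GCC atomic built-ins as the patch; "sl" and "shared_data" are
hypothetical, and the patch's actual implementation is found in
rte_seqlock.h below.

	uint32_t sn1, sn2;
	struct data copy;

	do {
		/* 1. load the sequence number */
		sn1 = __atomic_load_n(&sl->sn, __ATOMIC_ACQUIRE);

		/* 2. load the protected data, in arbitrary order */
		copy = *shared_data;

		/* order the data loads before the second sn load */
		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);

		/* 3. load the sequence number again */
		sn2 = __atomic_load_n(&sl->sn, __ATOMIC_RELAXED);

		/* 4. an odd sn means a write was in progress; unequal
		 * sns mean a write occurred during the read
		 */
	} while (sn1 != sn2 || (sn1 & 1));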
>>
>> Acked-by: Morten Brørup
>> Reviewed-by: Ola Liljedahl
>> Signed-off-by: Mattias Rönnblom
>> ---
>>  app/test/meson.build          |   2 +
>>  app/test/test_seqlock.c      | 202 +++++++++++++++++++++++
>>  lib/eal/common/meson.build    |   1 +
>>  lib/eal/common/rte_seqlock.c  |  12 ++
>>  lib/eal/include/meson.build   |   1 +
>>  lib/eal/include/rte_seqlock.h | 302 ++++++++++++++++++++++++++++++++++
>>  lib/eal/version.map           |   3 +
>>  7 files changed, 523 insertions(+)
>>  create mode 100644 app/test/test_seqlock.c
>>  create mode 100644 lib/eal/common/rte_seqlock.c
>>  create mode 100644 lib/eal/include/rte_seqlock.h
>>
>> diff --git a/app/test/meson.build b/app/test/meson.build
>> index 5fc1dd1b7b..5e418e8766 100644
>> --- a/app/test/meson.build
>> +++ b/app/test/meson.build
>> @@ -125,6 +125,7 @@ test_sources = files(
>>          'test_rwlock.c',
>>          'test_sched.c',
>>          'test_security.c',
>> +        'test_seqlock.c',
>>          'test_service_cores.c',
>>          'test_spinlock.c',
>>          'test_stack.c',
>> @@ -214,6 +215,7 @@ fast_tests = [
>>          ['rwlock_rde_wro_autotest', true],
>>          ['sched_autotest', true],
>>          ['security_autotest', false],
>> +        ['seqlock_autotest', true],
>>          ['spinlock_autotest', true],
>>          ['stack_autotest', false],
>>          ['stack_lf_autotest', false],
>> diff --git a/app/test/test_seqlock.c b/app/test/test_seqlock.c
>> new file mode 100644
>> index 0000000000..54fadf8025
>> --- /dev/null
>> +++ b/app/test/test_seqlock.c
>> @@ -0,0 +1,202 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2022 Ericsson AB
>> + */
>> +
>> +#include <rte_seqlock.h>
>> +
>> +#include <rte_cycles.h>
>> +#include <rte_malloc.h>
>> +#include <rte_random.h>
>> +
>> +#include <inttypes.h>
>> +
>> +#include "test.h"
>> +
>> +struct data {
>> +	rte_seqlock_t lock;
>> +
>> +	uint64_t a;
>> +	uint64_t b __rte_cache_aligned;
>> +	uint64_t c __rte_cache_aligned;
>> +} __rte_cache_aligned;
>> +
>> +struct reader {
>> +	struct data *data;
>> +	uint8_t stop;
>> +};
>> +
>> +#define WRITER_RUNTIME (2.0) /* s */
>> +
>> +#define WRITER_MAX_DELAY (100) /* us */
>> +
>> +#define INTERRUPTED_WRITER_FREQUENCY (1000)
>> +#define WRITER_INTERRUPT_TIME (1) /* us */
>> +
>> +static int
>> +writer_run(void *arg)
>> +{
>> +	struct data *data = arg;
>> +	uint64_t deadline;
>> +
>> +	deadline = rte_get_timer_cycles() +
>> +		WRITER_RUNTIME * rte_get_timer_hz();
>> +
>> +	while (rte_get_timer_cycles() < deadline) {
>> +		bool interrupted;
>> +		uint64_t new_value;
>> +		unsigned int delay;
>> +
>> +		new_value = rte_rand();
>> +
>> +		interrupted =
>> +			rte_rand_max(INTERRUPTED_WRITER_FREQUENCY) == 0;
>> +
>> +		rte_seqlock_write_lock(&data->lock);
>> +
>> +		data->c = new_value;
>> +
>> +		/* These compiler barriers (both on the test reader
>> +		 * and the test writer side) are here to ensure that
>> +		 * loads/stores *usually* happen in test program order
>> +		 * (always on a TSO machine). They are arranged in such
>> +		 * a way that the writer stores in a different order
>> +		 * than the reader loads, to emulate an arbitrary
>> +		 * order. A real application using a seqlock does not
>> +		 * require any compiler barriers.
>> +		 */
>> +		rte_compiler_barrier();
> The compiler barriers are not sufficient on all architectures (if the
> intention is to maintain the program order).
>
The intention is what is described in the comment (i.e., to make it 
likely, but not guaranteed, that the stores will be globally visible in 
the program order). The reason I didn't put in a release memory barrier 
was that it seems a little intrusive. Maybe I should remove these 
compiler barriers. They are also intrusive, in that they may prevent 
some compiler optimizations that could expose a seqlock bug. Or, I 
could have two variants of the tests. I don't know.
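For reference, a sketch of what the release fence variant would look
like on the test writer side (replacing just the first compiler
barrier):

		data->c = new_value;

		/* a real release fence, instead of the compiler
		 * barrier, would guarantee (rather than merely make
		 * likely) that the store to c is globally visible
		 * before the store to b */
		rte_atomic_thread_fence(__ATOMIC_RELEASE);

		data->b = new_value;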
>> +		data->b = new_value;
>> +
>> +		if (interrupted)
>> +			rte_delay_us_block(WRITER_INTERRUPT_TIME);
>> +
>> +		rte_compiler_barrier();
>> +		data->a = new_value;
>> +
>> +		rte_seqlock_write_unlock(&data->lock);
>> +
>> +		delay = rte_rand_max(WRITER_MAX_DELAY);
>> +
>> +		rte_delay_us_block(delay);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +#define INTERRUPTED_READER_FREQUENCY (1000)
>> +#define READER_INTERRUPT_TIME (1000) /* us */
>> +
>> +static int
>> +reader_run(void *arg)
>> +{
>> +	struct reader *r = arg;
>> +	int rc = 0;
>> +
>> +	while (__atomic_load_n(&r->stop, __ATOMIC_RELAXED) == 0 && rc == 0) {
>> +		struct data *data = r->data;
>> +		bool interrupted;
>> +		uint64_t a;
>> +		uint64_t b;
>> +		uint64_t c;
>> +		uint32_t sn;
>> +
>> +		interrupted =
>> +			rte_rand_max(INTERRUPTED_READER_FREQUENCY) == 0;
>> +
>> +		sn = rte_seqlock_read_lock(&data->lock);
>> +
>> +		do {
>> +			a = data->a;
>> +			/* See writer_run() for an explanation why
>> +			 * these barriers are here.
>> +			 */
>> +			rte_compiler_barrier();
>> +
>> +			if (interrupted)
>> +				rte_delay_us_block(READER_INTERRUPT_TIME);
>> +
>> +			c = data->c;
>> +
>> +			rte_compiler_barrier();
>> +			b = data->b;
>> +
>> +		} while (!rte_seqlock_read_tryunlock(&data->lock, &sn));
>> +
>> +		if (a != b || b != c) {
>> +			printf("Reader observed inconsistent data values "
>> +			       "%" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
>> +			       a, b, c);
>> +			rc = -1;
>> +		}
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>> +static void
>> +reader_stop(struct reader *reader)
>> +{
>> +	__atomic_store_n(&reader->stop, 1, __ATOMIC_RELAXED);
>> +}
>> +
>> +#define NUM_WRITERS (2) /* main lcore + one worker */
>> +#define MIN_NUM_READERS (2)
>> +#define MAX_READERS (RTE_MAX_LCORE - NUM_WRITERS - 1)
>> +#define MIN_LCORE_COUNT (NUM_WRITERS + MIN_NUM_READERS)
>> +
>> +/* Only a compile-time test */
>> +static rte_seqlock_t __rte_unused static_init_lock =
>> +	RTE_SEQLOCK_INITIALIZER;
>> +
>> +static int
>> +test_seqlock(void)
>> +{
>> +	struct reader readers[MAX_READERS];
>> +	unsigned int num_readers;
>> +	unsigned int num_lcores;
>> +	unsigned int i;
>> +	unsigned int lcore_id;
>> +	unsigned int reader_lcore_ids[MAX_READERS];
>> +	unsigned int worker_writer_lcore_id = 0;
>> +	int rc = 0;
>> +
>> +	num_lcores = rte_lcore_count();
>> +
>> +	if (num_lcores < MIN_LCORE_COUNT) {
>> +		printf("Too few cores to run test. Skipping.\n");
Skipping.\n"); >> + return 0; >> + } >> + >> + num_readers = num_lcores - NUM_WRITERS; >> + >> + struct data *data = rte_zmalloc(NULL, sizeof(struct data), 0); >> + >> + i = 0; >> + RTE_LCORE_FOREACH_WORKER(lcore_id) { >> + if (i == 0) { >> + rte_eal_remote_launch(writer_run, data, lcore_id); >> + worker_writer_lcore_id = lcore_id; >> + } else { >> + unsigned int reader_idx = i - 1; >> + struct reader *reader = &readers[reader_idx]; >> + >> + reader->data = data; >> + reader->stop = 0; >> + >> + rte_eal_remote_launch(reader_run, reader, lcore_id); >> + reader_lcore_ids[reader_idx] = lcore_id; >> + } >> + i++; >> + } >> + >> + if (writer_run(data) != 0 || >> + rte_eal_wait_lcore(worker_writer_lcore_id) != 0) >> + rc = -1; >> + >> + for (i = 0; i < num_readers; i++) { >> + reader_stop(&readers[i]); >> + if (rte_eal_wait_lcore(reader_lcore_ids[i]) != 0) >> + rc = -1; >> + } >> + >> + return rc; >> +} >> + >> +REGISTER_TEST_COMMAND(seqlock_autotest, test_seqlock); >> diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build index >> 917758cc65..a41343bfed 100644 >> --- a/lib/eal/common/meson.build >> +++ b/lib/eal/common/meson.build >> @@ -35,6 +35,7 @@ sources += files( >> 'rte_malloc.c', >> 'rte_random.c', >> 'rte_reciprocal.c', >> + 'rte_seqlock.c', >> 'rte_service.c', >> 'rte_version.c', >> ) >> diff --git a/lib/eal/common/rte_seqlock.c b/lib/eal/common/rte_seqlock.c new >> file mode 100644 index 0000000000..d4fe648799 >> --- /dev/null >> +++ b/lib/eal/common/rte_seqlock.c >> @@ -0,0 +1,12 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#include >> + >> +void >> +rte_seqlock_init(rte_seqlock_t *seqlock) { >> + seqlock->sn = 0; >> + rte_spinlock_init(&seqlock->lock); >> +} >> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build index >> 9700494816..48df5f1a21 100644 >> --- a/lib/eal/include/meson.build >> +++ b/lib/eal/include/meson.build >> @@ -36,6 +36,7 @@ headers += files( >> 'rte_per_lcore.h', >> 'rte_random.h', >> 'rte_reciprocal.h', >> + 'rte_seqlock.h', >> 'rte_service.h', >> 'rte_service_component.h', >> 'rte_string_fns.h', >> diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h new file > Other lock implementations are in lib/eal/include/generic. > >> mode 100644 index 0000000000..44eacd66e8 >> --- /dev/null >> +++ b/lib/eal/include/rte_seqlock.h >> @@ -0,0 +1,302 @@ >> +/* SPDX-License-Identifier: BSD-3-Clause >> + * Copyright(c) 2022 Ericsson AB >> + */ >> + >> +#ifndef _RTE_SEQLOCK_H_ >> +#define _RTE_SEQLOCK_H_ >> + >> +#ifdef __cplusplus >> +extern "C" { >> +#endif >> + >> +/** >> + * @file >> + * RTE Seqlock >> + * >> + * A sequence lock (seqlock) is a synchronization primitive allowing >> + * multiple, parallel, readers to efficiently and safely (i.e., in a >> + * data-race free manner) access the lock-protected data. The RTE >> + * seqlock permits multiple writers as well. A spinlock is used to >> + * writer-writer synchronization. >> + * >> + * A reader never blocks a writer. Very high frequency writes may >> + * prevent readers from making progress. >> + * >> + * A seqlock is not preemption-safe on the writer side. If a writer is >> + * preempted, it may block readers until the writer thread is again >> + * allowed to execute. Heavy computations should be kept out of the >> + * writer-side critical section, to avoid delaying readers. 
>> + *
>> + * Seqlocks are useful for data which are read by many cores, at a
>> + * high frequency, and relatively infrequently written to.
>> + *
>> + * One way to think about seqlocks is that they provide means to
>> + * perform atomic operations on objects larger than what the native
>> + * machine instructions allow for.
>> + *
>> + * To avoid resource reclamation issues, the data protected by a
>> + * seqlock should typically be kept self-contained (e.g., no pointers
>> + * to mutable, dynamically allocated data).
>> + *
>> + * Example usage:
>> + * @code{.c}
>> + * #define MAX_Y_LEN (16)
>> + * // Application-defined example data structure, protected by a seqlock.
>> + * struct config {
>> + *         rte_seqlock_t lock;
>> + *         int param_x;
>> + *         char param_y[MAX_Y_LEN];
>> + * };
>> + *
>> + * // Accessor function for reading config fields.
>> + * void
>> + * config_read(const struct config *config, int *param_x, char *param_y)
>> + * {
>> + *         // Temporary variables, just to improve readability.
> I think the above comment is not necessary. It is beneficial to copy
> the protected data to keep the read side critical section small.
>
>> + *         int tentative_x;
>> + *         char tentative_y[MAX_Y_LEN];
>> + *         uint32_t sn;
>> + *
>> + *         sn = rte_seqlock_read_lock(&config->lock);
>> + *         do {
>> + *                 // Loads may be atomic or non-atomic, as in this example.
>> + *                 tentative_x = config->param_x;
>> + *                 strcpy(tentative_y, config->param_y);
>> + *         } while (!rte_seqlock_read_tryunlock(&config->lock, &sn));
>> + *         // An application could skip retrying, and try again later, if
>> + *         // progress is possible without the data.
>> + *
>> + *         *param_x = tentative_x;
>> + *         strcpy(param_y, tentative_y);
>> + * }
>> + *
>> + * // Accessor function for writing config fields.
>> + * void
>> + * config_update(struct config *config, int param_x, const char *param_y)
>> + * {
>> + *         rte_seqlock_write_lock(&config->lock);
>> + *         // Stores may be atomic or non-atomic, as in this example.
>> + *         config->param_x = param_x;
>> + *         strcpy(config->param_y, param_y);
>> + *         rte_seqlock_write_unlock(&config->lock);
>> + * }
>> + * @endcode
>> + *
>> + * @see
>> + * https://en.wikipedia.org/wiki/Seqlock.
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <stdint.h>
>> +
>> +#include <rte_atomic.h>
>> +#include <rte_branch_prediction.h>
>> +#include <rte_spinlock.h>
>> +
>> +/**
>> + * The RTE seqlock type.
>> + */
>> +typedef struct {
>> +	uint32_t sn; /**< A sequence number for the protected data. */
>> +	rte_spinlock_t lock; /**< Spinlock used to serialize writers. */
> Suggest using ticket lock for the writer side. It should have low
> overhead when there is a single writer, but provides better
> functionality when there are multiple writers.
>
>> +} rte_seqlock_t;
>> +
>> +/**
>> + * A static seqlock initializer.
>> + */
>> +#define RTE_SEQLOCK_INITIALIZER { 0, RTE_SPINLOCK_INITIALIZER }
>> +
>> +/**
>> + * Initialize the seqlock.
>> + *
>> + * This function initializes the seqlock, and leaves the writer-side
>> + * spinlock unlocked.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + */
>> +__rte_experimental
>> +void
>> +rte_seqlock_init(rte_seqlock_t *seqlock);
>> +
>> +/**
>> + * Begin a read-side critical section.
>> + *
>> + * A call to this function marks the beginning of a read-side critical
>> + * section, for @p seqlock.
>> + *
>> + * rte_seqlock_read_lock() returns a sequence number, which is later
>> + * used in rte_seqlock_read_tryunlock() to check if the protected data
>> + * underwent any modifications during the read transaction.
>> + *
>> + * After (in program order) rte_seqlock_read_lock() has been called,
>> + * the calling thread reads the protected data, for later use. The
>> + * protected data read *must* be copied (either in pristine form, or
>> + * in the form of some derivative), since the caller may only read the
>> + * data from within the read-side critical section (i.e., after
>> + * rte_seqlock_read_lock() and before rte_seqlock_read_tryunlock()),
>> + * but must not act upon the retrieved data while in the critical
>> + * section, since it does not yet know if it is consistent.
>> + *
>> + * The protected data may be read using atomic and/or non-atomic
>> + * operations.
>> + *
>> + * After (in program order) all required data loads have been
>> + * performed, rte_seqlock_read_tryunlock() should be called, marking
>> + * the end of the read-side critical section.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns true, the data was read
>> + * atomically and the copied data is consistent.
>> + *
>> + * If rte_seqlock_read_tryunlock() returns false, the just-read data
>> + * is inconsistent and should be discarded. The caller has the option
>> + * to either re-read the data and call rte_seqlock_read_tryunlock()
>> + * again, or to restart the whole procedure (i.e., from
>> + * rte_seqlock_read_lock()) at some later time.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @return
>> + *   The seqlock sequence number for this critical section, to
>> + *   later be passed to rte_seqlock_read_tryunlock().
>> + *
>> + * @see rte_seqlock_read_tryunlock()
>> + */
>> +__rte_experimental
>> +static inline uint32_t
>> +rte_seqlock_read_lock(const rte_seqlock_t *seqlock)
>> +{
>> +	/* __ATOMIC_ACQUIRE to prevent loads after (in program order)
>> +	 * from happening before the sn load. Synchronizes-with the
>> +	 * store release in rte_seqlock_write_unlock().
>> +	 */
>> +	return __atomic_load_n(&seqlock->sn, __ATOMIC_ACQUIRE);
>> +}
>> +
>> +/**
>> + * End a read-side critical section.
>> + *
>> + * A call to this function marks the end of a read-side critical
> Should we capture that it also begins a new critical-section for the
> subsequent calls to rte_seqlock_read_tryunlock()?
>
>> + * section, for @p seqlock. The application must supply the sequence
>> + * number produced by the corresponding rte_seqlock_read_lock() (or,
>> + * in case of a retry, the rte_seqlock_read_tryunlock()) call.
>> + *
>> + * After this function has been called, the caller should not access
>> + * the protected data.
> I understand what you mean here. But, I think this needs clarity.
> In the documentation for rte_seqlock_read_lock() you have mentioned,
> if rte_seqlock_read_tryunlock() returns false, one could re-read the
> data. May be this should be changed to:
> "After this function returns true, the caller should not access the
> protected data."?
> Or may be combine it with the following para.
>
>> + *
>> + * In case this function returns true, the just-read data was
>> + * consistent and the set of atomic and non-atomic load operations
>> + * performed between rte_seqlock_read_lock() and
>> + * rte_seqlock_read_tryunlock() were atomic, as a whole.
>> + *
>> + * In case rte_seqlock_read_tryunlock() returns false, the data was
>> + * modified as it was being read and may be inconsistent, and thus
>> + * should be discarded. The @p begin_sn is updated with the
>> + * now-current sequence number.
> May be:
> "The @p begin_sn is updated with the sequence number for the next
> critical section."
>
Sounds good.
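To illustrate the restart-later option mentioned in the documentation
above, a reader preferring bounded latency over guaranteed completion
could look something like the sketch below. MAX_READ_TRIES is made up,
and the config struct is the one from the example usage in the file
header.

	uint32_t sn = rte_seqlock_read_lock(&config->lock);
	int tries = 0;

	do {
		if (tries++ == MAX_READ_TRIES)
			return -EAGAIN; /* give up; retry later */
		tentative_x = config->param_x;
	} while (!rte_seqlock_read_tryunlock(&config->lock, &sn));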
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + * @param begin_sn
>> + *   The seqlock sequence number returned by
>> + *   rte_seqlock_read_lock() (potentially updated in subsequent
>> + *   rte_seqlock_read_tryunlock() calls) for this critical section.
>> + * @return
>> + *   true or false, if the just-read seqlock-protected data was consistent
>> + *   or inconsistent, respectively, at the time it was read.
> true - just-read protected data was consistent
> false - just-read protected data was inconsistent
>
>> + *
>> + * @see rte_seqlock_read_lock()
>> + */
>> +__rte_experimental
>> +static inline bool
>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t *begin_sn)
>> +{
>> +	uint32_t end_sn;
>> +
>> +	/* make sure the data loads happen before the sn load */
>> +	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>> +
>> +	end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> +
>> +	if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>> +		*begin_sn = end_sn;
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * Begin a write-side critical section.
>> + *
>> + * A call to this function acquires the write lock associated with @p
>> + * seqlock, and marks the beginning of a write-side critical section.
>> + *
>> + * After having called this function, the caller may go on to modify
>> + * (both read and write) the protected data, in an atomic or
>> + * non-atomic manner.
>> + *
>> + * After the necessary updates have been performed, the application
>> + * calls rte_seqlock_write_unlock().
>> + *
>> + * This function is not preemption-safe in the sense that preemption
>> + * of the calling thread may block reader progress until the writer
>> + * thread is rescheduled.
>> + *
>> + * Unlike rte_seqlock_read_lock(), each call made to
>> + * rte_seqlock_write_lock() must be matched with an unlock call.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_unlock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_lock(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	/* to synchronize with other writers */
>> +	rte_spinlock_lock(&seqlock->lock);
>> +
>> +	sn = seqlock->sn + 1;
> The load of seqlock->sn could use __atomic_load_n to be consistent.
>
But why? I know it doesn't have any cost (these loads are going to be 
atomic anyway), but why use a construct with stronger guarantees than 
you have to?

>> +
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELAXED);
>> +
>> +	/* __ATOMIC_RELEASE to prevent stores after (in program order)
>> +	 * from happening before the sn store.
>> +	 */
>> +	rte_atomic_thread_fence(__ATOMIC_RELEASE);
>> +}
>> +
>> +/**
>> + * End a write-side critical section.
>> + *
>> + * A call to this function marks the end of the write-side critical
>> + * section, for @p seqlock. After this call has been made, the
>> + * protected data may no longer be modified.
>> + *
>> + * @param seqlock
>> + *   A pointer to the seqlock.
>> + *
>> + * @see rte_seqlock_write_lock()
>> + */
>> +__rte_experimental
>> +static inline void
>> +rte_seqlock_write_unlock(rte_seqlock_t *seqlock)
>> +{
>> +	uint32_t sn;
>> +
>> +	sn = seqlock->sn + 1;
> Same here, the load of seqlock->sn could use __atomic_load_n.
>
>> +
>> +	/* synchronizes-with the load acquire in rte_seqlock_read_lock() */
>> +	__atomic_store_n(&seqlock->sn, sn, __ATOMIC_RELEASE);
>> +
>> +	rte_spinlock_unlock(&seqlock->lock);
>> +}
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* _RTE_SEQLOCK_H_ */
>> diff --git a/lib/eal/version.map b/lib/eal/version.map
>> index b53eeb30d7..4a9d0ed899 100644
>> --- a/lib/eal/version.map
>> +++ b/lib/eal/version.map
>> @@ -420,6 +420,9 @@ EXPERIMENTAL {
>>  	rte_intr_instance_free;
>>  	rte_intr_type_get;
>>  	rte_intr_type_set;
>> +
>> +	# added in 22.07
>> +	rte_seqlock_init;
>>  };
>>
>>  INTERNAL {
>> --
>> 2.25.1
>