DPDK patches and discussions
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"paulmck@linux.ibm.com" <paulmck@linux.ibm.com>
Cc: "stephen@networkplumber.org" <stephen@networkplumber.org>,
	"Kovacevic, Marko" <marko.kovacevic@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
	Malvika Gupta <Malvika.Gupta@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/3] rcu: add RCU library supporting QSBR mechanism
Date: Mon, 15 Apr 2019 19:46:28 +0000
Message-ID: <VE1PR08MB5149E8A6CC456B98B6CBE320982B0@VE1PR08MB5149.eurprd08.prod.outlook.com>
Message-ID: <20190415194628.XgA_8n35Fx5pIka2ZpkkV7s9RJsNK463h7DpW9NCYZg@z>
In-Reply-To: <2601191342CEEE43887BDE71AB9772580148A9800A@irsmsx105.ger.corp.intel.com>

> 
> > > > >
> > > > > On Wed, Apr 10, 2019 at 06:20:04AM -0500, Honnappa Nagarahalli
> > > wrote:
> > > > > > Add RCU library supporting quiescent state based memory
> > > > > > reclamation
> > > > > method.
> > > > > > This library helps identify the quiescent state of the reader
> > > > > > threads so that the writers can free the memory associated
> > > > > > with the lock less data structures.
> > > > >
> > > > > I don't see any sign of read-side markers (rcu_read_lock() and
> > > > > rcu_read_unlock() in the Linux kernel, userspace RCU, etc.).
> > > > >
> > > > > Yes, strictly speaking, these are not needed for QSBR to
> > > > > operate, but they
> > > > These APIs would be empty for QSBR.
> > > >
> > > > > make it way easier to maintain and debug code using RCU.  For
> > > > > example, given the read-side markers, you can check for errors
> > > > > like having a call to
> > > > > rte_rcu_qsbr_quiescent() in the middle of a reader quite easily.
> > > > > Without those read-side markers, life can be quite hard and you
> > > > > will really hate yourself for failing to have provided them.
> > > >
> > > > Want to make sure I understood this, do you mean the application
> > > would mark before and after accessing the shared data structure on
> > > the reader side?
> > > >
> > > > rte_rcu_qsbr_lock()
> > > > <begin access shared data structure> ...
> > > > ...
> > > > <end access shared data structure>
> > > > rte_rcu_qsbr_unlock()
> > >
> > > Yes, that is the idea.
> > >
> > > > If someone is debugging this code, they have to make sure that
> > > > there is
> > > an unlock for every lock and there is no call to
> > > rte_rcu_qsbr_quiescent in between.
> > > > It sounds good to me. Obviously, they will not add any additional
> > > > cycles
> > > as well.
> > > > Please let me know if my understanding is correct.
> > >
> > > Yes.  And in some sort of debug mode, you could capture the counter
> > > at
> > > rte_rcu_qsbr_lock() time and check it at rte_rcu_qsbr_unlock() time.
> > > If the counter has advanced too far (more than one, if I am not too
> > > confused) there is a bug.  Also in debug mode, you could have
> > > rte_rcu_qsbr_lock() increment a per-thread counter and
> rte_rcu_qsbr_unlock() decrement it.
> > > If the counter is non-zero at a quiescent state, there is a bug.
> > > And so on.
> > >
> > Added this in V5
> >
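To make the second debug check above concrete, here is a minimal sketch, assuming a per-thread nesting counter. The dbg_* names are hypothetical and are not the API added in V5; the actual rte_rcu_qsbr_lock()/rte_rcu_qsbr_unlock() may look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debug-only state: number of read-side critical sections
 * the current thread is inside. Not part of the patch.
 */
static _Thread_local unsigned int dbg_lock_depth;

/* Mark the start of a read-side critical section. */
static inline void dbg_qsbr_lock(void)
{
	dbg_lock_depth++;
}

/* Mark the end of a read-side critical section.
 * Returns false if there was no matching lock.
 */
static inline bool dbg_qsbr_unlock(void)
{
	if (dbg_lock_depth == 0)
		return false;
	dbg_lock_depth--;
	return true;
}

/* A quiescent state may only be reported outside any critical section. */
static inline bool dbg_qsbr_quiescent_allowed(void)
{
	return dbg_lock_depth == 0;
}
```

In a debug build, the quiescent state call could assert dbg_qsbr_quiescent_allowed() before updating the counter; in a normal build all three helpers would compile to nothing.
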
> > <snip>
> >
> > > > > > +
> > > > > > > +/* Get the memory size of QSBR variable */
> > > > > > > +size_t __rte_experimental
> > > > > > > +rte_rcu_qsbr_get_memsize(uint32_t max_threads)
> > > > > > > +{
> > > > > > +	size_t sz;
> > > > > > +
> > > > > > +	if (max_threads == 0) {
> > > > > > +		rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > +			"%s(): Invalid max_threads %u\n",
> > > > > > +			__func__, max_threads);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	sz = sizeof(struct rte_rcu_qsbr);
> > > > > > +
> > > > > > +	/* Add the size of quiescent state counter array */
> > > > > > +	sz += sizeof(struct rte_rcu_qsbr_cnt) * max_threads;
> > > > > > +
> > > > > > +	/* Add the size of the registered thread ID bitmap array */
> > > > > > +	sz += RTE_QSBR_THRID_ARRAY_SIZE(max_threads);
> > > > > > +
> > > > > > +	return RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > > > >
> > > > > Given that you align here, should you also align in the earlier
> > > > > steps in the computation of sz?
> > > >
> > > > Agree. I will remove the align here and keep the earlier one as
> > > > the intent
> > > is to align the thread ID array.
> > >
> > > Sounds good!
> > Added this in V5
> >
> > >
> > > > > > +}
> > > > > > +
> > > > > > > +/* Initialize a quiescent state variable */
> > > > > > > +int __rte_experimental
> > > > > > > +rte_rcu_qsbr_init(struct rte_rcu_qsbr *v, uint32_t max_threads)
> > > > > > > +{
> > > > > > +	size_t sz;
> > > > > > +
> > > > > > +	if (v == NULL) {
> > > > > > +		rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > +		rte_errno = EINVAL;
> > > > > > +
> > > > > > +		return 1;
> > > > > > +	}
> > > > > > +
> > > > > > +	sz = rte_rcu_qsbr_get_memsize(max_threads);
> > > > > > +	if (sz == 1)
> > > > > > +		return 1;
> > > > > > +
> > > > > > +	/* Set all the threads to offline */
> > > > > > +	memset(v, 0, sz);
> > > > >
> > > > > We calculate sz here, but it looks like the caller must also
> > > > > calculate it in order to correctly allocate the memory
> > > > > referenced by the "v" argument to this function, with bad things
> > > > > happening if the two calculations get different results.  Should
> > > > > "v" instead be allocated within this function to avoid this sort of
> problem?
> > > >
> > > > Earlier version allocated the memory with-in this library.
> > > > However, it was
> > > decided to go with the current implementation as it provides
> > > flexibility for the application to manage the memory as it sees fit.
> > > For ex: it could allocate this as part of another structure in a
> > > > single allocation. This also falls in line with the similar approach
> > > > taken in other libraries.
> > >
> > > So the allocator APIs vary too much to allow a pointer to the
> > > desired allocator function to be passed in?  Or do you also want to
> > > allow static allocation?  If the latter, would a DEFINE_RTE_RCU_QSBR()
> be of use?
> > >
> > This is done to allow the memory for the QS variable to be allocated as
> > part of another, bigger data structure. This helps avoid fragmenting the
> > memory. For ex:
> >
> > struct xyz {
> >     rte_ring *ring;
> >     rte_rcu_qsbr *v;
> >     abc *t;
> > };
> > struct xyz c;
> >
> > Memory for the above structure can be allocated in one chunk by
> calculating the size required.
> >
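A minimal sketch of the single-chunk idea, under stated assumptions: a 64-byte cache line, a 128-byte QS header, and one cache line per thread counter. qs_memsize() is a stand-in that mirrors the v4 size computation and xyz_alloc() is a hypothetical helper; neither is part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64u	/* assumed cache line size */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Stand-in for rte_rcu_qsbr_get_memsize(), with assumed sizes. */
static size_t qs_memsize(uint32_t max_threads)
{
	size_t sz = 128;	/* assumed size of the QS header */

	assert(max_threads != 0);
	/* one cache line per thread counter */
	sz += (size_t)CACHE_LINE * max_threads;
	/* registered thread ID bitmap, one bit per thread, cache aligned */
	sz += ALIGN_UP(ALIGN_UP((size_t)max_threads, 64) >> 3, CACHE_LINE);
	return sz;
}

/* Hypothetical parent object; 'qs' points into the same allocation. */
struct xyz {
	void *ring;
	void *qs;
	void *t;
};

/* Allocate struct xyz and the QS variable as one cache-aligned chunk. */
static struct xyz *xyz_alloc(uint32_t max_threads)
{
	size_t qs_off = ALIGN_UP(sizeof(struct xyz), CACHE_LINE);
	size_t total = qs_off + qs_memsize(max_threads);
	struct xyz *c = aligned_alloc(CACHE_LINE, total);

	if (c == NULL)
		return NULL;
	memset(c, 0, total);
	/* this pointer is what would be handed to rte_rcu_qsbr_init() */
	c->qs = (char *)c + qs_off;
	return c;
}
```

The application frees the whole object with a single free(c); the QS variable never needs a separate allocation.
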
> > In some use cases static allocation might be enough as 'max_threads'
> > might be a compile time constant. I am not sure on how to support both
> dynamic and static 'max_threads'.
> 
> Same thought here-  would be good to have a static initializer
> (DEFINE_RTE_RCU_QSBR), but that means new compile time limit
> ('max_threads') - thing that we try to avoid.
> 
> >
> > > > > > +	v->max_threads = max_threads;
> > > > > > +	v->num_elems = RTE_ALIGN_MUL_CEIL(max_threads,
> > > > > > +			RTE_QSBR_THRID_ARRAY_ELM_SIZE) /
> > > > > > +			RTE_QSBR_THRID_ARRAY_ELM_SIZE;
> > > > > > +	v->token = RTE_QSBR_CNT_INIT;
> > > > > > +
> > > > > > +	return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* Register a reader thread to report its quiescent state
> > > > > > + * on a QS variable.
> > > > > > + */
> > > > > > > +int __rte_experimental
> > > > > > > +rte_rcu_qsbr_thread_register(struct rte_rcu_qsbr *v, unsigned int thread_id)
> > > > > > > +{
> > > > > > > +	unsigned int i, id, success;
> > > > > > > +	uint64_t old_bmap, new_bmap;
> > > > > > > +
> > > > > > > +	if (v == NULL || thread_id >= v->max_threads) {
> > > > > > > +		rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > > +			"%s(): Invalid input parameter\n", __func__);
> > > > > > > +		rte_errno = EINVAL;
> > > > > > > +
> > > > > > > +		return 1;
> > > > > > > +	}
> > > > > > > +
> > > > > > > +	id = thread_id & RTE_QSBR_THRID_MASK;
> > > > > > > +	i = thread_id >> RTE_QSBR_THRID_INDEX_SHIFT;
> > > > > > > +
> > > > > > > +	/* Make sure that the counter for registered threads does not
> > > > > > > +	 * go out of sync. Hence, additional checks are required.
> > > > > > > +	 */
> > > > > > > +	/* Check if the thread is already registered */
> > > > > > > +	old_bmap = __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > > > > +					__ATOMIC_RELAXED);
> > > > > > > +	if (old_bmap & 1UL << id)
> > > > > > > +		return 0;
> > > > > > > +
> > > > > > > +	do {
> > > > > > > +		new_bmap = old_bmap | (1UL << id);
> > > > > > > +		success = __atomic_compare_exchange(
> > > > > > > +					RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > > > > +					&old_bmap, &new_bmap, 0,
> > > > > > > +					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > > > > +
> > > > > > > +		if (success)
> > > > > > > +			__atomic_fetch_add(&v->num_threads,
> > > > > > > +						1, __ATOMIC_RELAXED);
> > > > > > > +		else if (old_bmap & (1UL << id))
> > > > > > > +			/* Someone else registered this thread.
> > > > > > > +			 * Counter should not be incremented.
> > > > > > > +			 */
> > > > > > > +			return 0;
> > > > > > > +	} while (success == 0);
> > > > >
> > > > > This would be simpler if threads were required to register
> themselves.
> > > > > Maybe you have use cases requiring registration of other
> > > > > threads, but this capability is adding significant complexity,
> > > > > so it might be worth some thought.
> > > > >
> > > > It was simple earlier (__atomic_fetch_or). The complexity is added
> > > > as
> > > 'num_threads' should not go out of sync.
> > >
> > > Hmmm...
> > >
> > > So threads are allowed to register other threads?  Or is there some
> > > other reason that concurrent registration is required?
> > >
> > Yes, control plane threads can register the fast path threads. Though,
> > I am not sure how useful it is. I did not want to add the restriction. I
> expect that reader threads will register themselves. The reader threads
> require concurrent registration as they all will be running in parallel.
> > If the requirement of keeping track of the number of threads registered
> currently goes away, then this function will be simple.
> >
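For illustration only, here is a sketch of how a thread ID maps to a bitmap element and bit, combined with the fetch-or form mentioned above. It keeps the count in sync by looking at the value returned by the fetch-or. This is not what the patch does, it uses C11 atomics rather than the GCC builtins, the names are hypothetical, and it may miss requirements that motivated the compare-exchange loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define THRID_INDEX_SHIFT 6	/* 64 thread IDs per bitmap element */
#define THRID_MASK 0x3f

/* Hypothetical registry: bitmap of registered IDs plus a running count. */
struct thr_registry {
	_Atomic uint64_t bmap[2];	/* room for 128 thread IDs */
	atomic_uint num_threads;
};

/* Set the bit for thread_id. Returns true only for the caller that
 * actually flipped the bit, so the count is incremented exactly once.
 */
static bool thr_register(struct thr_registry *r, unsigned int thread_id)
{
	/* index of the bitmap element holding this thread's bit */
	unsigned int i = thread_id >> THRID_INDEX_SHIFT;
	/* the thread's bit within that element */
	uint64_t bit = UINT64_C(1) << (thread_id & THRID_MASK);
	uint64_t old;

	assert(i < 2);
	old = atomic_fetch_or_explicit(&r->bmap[i], bit, memory_order_release);
	if (old & bit)
		return false;	/* already registered, count unchanged */

	atomic_fetch_add_explicit(&r->num_threads, 1, memory_order_relaxed);
	return true;
}
```

For example, thread ID 70 lands in element 1 (70 >> 6) at bit 6 (70 & 0x3f). As in the patch, there is a short window in which the bit is set but the count has not yet been incremented.
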
> > <snip>
> >
> > > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > > new file mode 100644
> > > > > > > index 000000000..ff696aeab
> > > > > > > --- /dev/null
> > > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > > @@ -0,0 +1,554 @@
> > > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > > + * Copyright (c) 2018 Arm Limited
> > > > > > > + */
> > > > > > > +
> > > > > > > +#ifndef _RTE_RCU_QSBR_H_
> > > > > > > +#define _RTE_RCU_QSBR_H_
> > > > > > > +
> > > > > > > +/**
> > > > > > > + * @file
> > > > > > > + * RTE Quiescent State Based Reclamation (QSBR)
> > > > > > > + *
> > > > > > > + * Quiescent State (QS) is any point in the thread execution
> > > > > > > + * where the thread does not hold a reference to a data structure
> > > > > > > + * in shared memory. While using lock-less data structures, the writer
> > > > > > > + * can safely free memory once all the reader threads have entered
> > > > > > > + * quiescent state.
> > > > > > > + *
> > > > > > > + * This library provides the ability for the readers to report quiescent
> > > > > > > + * state and for the writers to identify when all the readers have
> > > > > > > + * entered quiescent state.
> > > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include <stdio.h>
> > > > > > +#include <stdint.h>
> > > > > > +#include <errno.h>
> > > > > > +#include <rte_common.h>
> > > > > > +#include <rte_memory.h>
> > > > > > +#include <rte_lcore.h>
> > > > > > +#include <rte_debug.h>
> > > > > > +#include <rte_atomic.h>
> > > > > > +
> > > > > > +extern int rcu_log_type;
> > > > > > +
> > > > > > > +#if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG
> > > > > > > +#define RCU_DP_LOG(level, fmt, args...) \
> > > > > > > +	rte_log(RTE_LOG_ ## level, rcu_log_type, \
> > > > > > > +		"%s(): " fmt "\n", __func__, ## args)
> > > > > > > +#else
> > > > > > > +#define RCU_DP_LOG(level, fmt, args...)
> > > > > > > +#endif
> > > > > > > +
> > > > > > > +/* Registered thread IDs are stored as a bitmap of 64b element array.
> > > > > > > + * Given thread id needs to be converted to index into the array and
> > > > > > > + * the id within the array element.
> > > > > > > + */
> > > > > > > +#define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > > > > > +#define RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > > > > > > +	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > > > > > > +		RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > > > > > > +#define RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > > > > > +	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > > > > > +#define RTE_QSBR_THRID_INDEX_SHIFT 6
> > > > > > > +#define RTE_QSBR_THRID_MASK 0x3f
> > > > > > > +#define RTE_QSBR_THRID_INVALID 0xffffffff
> > > > > > > +
> > > > > > > +/* Worker thread counter */
> > > > > > > +struct rte_rcu_qsbr_cnt {
> > > > > > > +	uint64_t cnt;
> > > > > > > +	/**< Quiescent state counter. Value 0 indicates the thread is offline */
> > > > > > > +} __rte_cache_aligned;
> > > > > > > +
> > > > > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0
> > > > > > > +#define RTE_QSBR_CNT_INIT 1
> > > > > > > +
> > > > > > > +/* RTE Quiescent State variable structure.
> > > > > > > + * This structure has two elements that vary in size based on the
> > > > > > > + * 'max_threads' parameter.
> > > > > > > + * 1) Quiescent state counter array
> > > > > > > + * 2) Register thread ID array
> > > > > > > + */
> > > > > > > +struct rte_rcu_qsbr {
> > > > > > > +	uint64_t token __rte_cache_aligned;
> > > > > > > +	/**< Counter to allow for multiple concurrent quiescent state queries */
> > > > > > > +
> > > > > > > +	uint32_t num_elems __rte_cache_aligned;
> > > > > > > +	/**< Number of elements in the thread ID array */
> > > > > > > +	uint32_t num_threads;
> > > > > > > +	/**< Number of threads currently using this QS variable */
> > > > > > > +	uint32_t max_threads;
> > > > > > > +	/**< Maximum number of threads using this QS variable */
> > > > > > > +
> > > > > > > +	struct rte_rcu_qsbr_cnt qsbr_cnt[0] __rte_cache_aligned;
> > > > > > > +	/**< Quiescent state counter array of 'max_threads' elements */
> > > > > > +
> > > > > > +	/**< Registered thread IDs are stored in a bitmap array,
> > > > > > +	 *   after the quiescent state counter array.
> > > > > > +	 */
> > > > > > +} __rte_cache_aligned;
> > > > > > +
> >
> > <snip>
> >
> > > > > > +
> > > > > > > +/* Check the quiescent state counter for registered threads only, assuming
> > > > > > > + * that not all threads have registered.
> > > > > > > + */
> > > > > > > +static __rte_always_inline int
> > > > > > > +__rcu_qsbr_check_selective(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
> > > > > > > +{
> > > > > > > +	uint32_t i, j, id;
> > > > > > > +	uint64_t bmap;
> > > > > > > +	uint64_t c;
> > > > > > > +	uint64_t *reg_thread_id;
> > > > > > > +
> > > > > > > +	for (i = 0, reg_thread_id = RTE_QSBR_THRID_ARRAY_ELM(v, 0);
> > > > > > > +		i < v->num_elems;
> > > > > > > +		i++, reg_thread_id++) {
> > > > > > > +		/* Load the current registered thread bit map before
> > > > > > > +		 * loading the reader thread quiescent state counters.
> > > > > > > +		 */
> > > > > > > +		bmap = __atomic_load_n(reg_thread_id, __ATOMIC_ACQUIRE);
> > > > > > > +		id = i << RTE_QSBR_THRID_INDEX_SHIFT;
> > > > > > > +
> > > > > > > +		while (bmap) {
> > > > > > > +			j = __builtin_ctzl(bmap);
> > > > > > > +			RCU_DP_LOG(DEBUG,
> > > > > > > +				"%s: check: token = %lu, wait = %d, Bit Map = 0x%lx, Thread ID = %d",
> > > > > > > +				__func__, t, wait, bmap, id + j);
> > > > > > > +			c = __atomic_load_n(
> > > > > > > +					&v->qsbr_cnt[id + j].cnt,
> > > > > > > +					__ATOMIC_ACQUIRE);
> > > > > > > +			RCU_DP_LOG(DEBUG,
> > > > > > > +				"%s: status: token = %lu, wait = %d, Thread QS cnt = %lu, Thread ID = %d",
> > > > > > > +				__func__, t, wait, c, id+j);
> > > > > > > +			/* Counter is not checked for wrap-around condition
> > > > > > > +			 * as it is a 64b counter.
> > > > > > > +			 */
> > > > > > > +			if (unlikely(c != RTE_QSBR_CNT_THR_OFFLINE && c < t)) {
> > > > >
> > > > > This assumes that a 64-bit counter won't overflow, which is
> > > > > close enough to true given current CPU clock frequencies.  ;-)
> > > > >
> > > > > > > +				/* This thread is not in quiescent state */
> > > > > > > +				if (!wait)
> > > > > > > +					return 0;
> > > > > > > +
> > > > > > > +				rte_pause();
> > > > > > > +				/* This thread might have unregistered.
> > > > > > > +				 * Re-read the bitmap.
> > > > > > > +				 */
> > > > > > > +				bmap = __atomic_load_n(reg_thread_id,
> > > > > > > +						__ATOMIC_ACQUIRE);
> > > > > > +
> > > > > > +				continue;
> > > > > > +			}
> > > > > > +
> > > > > > +			bmap &= ~(1UL << j);
> > > > > > +		}
> > > > > > +	}
> > > > > > +
> > > > > > +	return 1;
> > > > > > +}
> > > > > > +
> > > > > > > +/* Check the quiescent state counter for all threads, assuming that
> > > > > > > + * all the threads have registered.
> > > > > > > + */
> > > > > > > +static __rte_always_inline int
> > > > > > > +__rcu_qsbr_check_all(struct rte_rcu_qsbr *v, uint64_t t, bool wait)
> > > > >
> > > > > Does checking the bitmap really take long enough to make this
> > > > > worthwhile as a separate function?  I would think that the
> > > > > bitmap-checking time would be lost in the noise of cache misses
> > > > > from
> > > the ->cnt loads.
> > > >
> > > > It avoids accessing one cache line. I think this is where the
> > > > savings are
> > > (may be in theory). This is the most probable use case.
> > > > On the other hand, __rcu_qsbr_check_selective() will result in
> > > > savings
> > > (depending on how many threads are currently registered) by avoiding
> > > accessing unwanted counters.
> > >
> > > Do you really expect to be calling this function on any kind of fastpath?
> >
> > Yes. For some of the libraries (rte_hash), the writer is on the fast path.
> >
> > >
> > > > > Sure, if you invoke __rcu_qsbr_check_selective() in a tight loop
> > > > > in the absence of readers, you might see __rcu_qsbr_check_all()
> > > > > being a bit faster.  But is that really what DPDK does?
> > > > I see improvements in the synthetic test case (similar to the one
> > > > you
> > > have described, around 27%). However, in the more practical test
> > > cases I do not see any difference.
> > >
> > > If the performance improvement only occurs in a synthetic test case,
> > > does it really make sense to optimize for it?
> > I had to fix few issues in the performance test cases and added more to
> do the comparison. These changes are in v5.
> > There are 4 performance tests involving this API.
> > 1) 1 Writer, 'N' readers
> >      Writer: qsbr_start, qsbr_check(wait = true)
> >      Readers: qsbr_quiescent
> > 2) 'N' writers
> >      Writers: qsbr_start, qsbr_check(wait == false)
> > 3) 1 Writer, 'N' readers (this test uses the lock-free rte_hash data
> structure)
> >      Writer: hash_del, qsbr_start, qsbr_check(wait = true), validate that
> the reader was able to complete its work successfully
> >      Readers: thread_online, hash_lookup, access the pointer - do some
> > work on it, qsbr_quiescent, thread_offline
> > 4) Same as test 3) but qsbr_check (wait == false)
> >
> > There are 2 sets of these tests.
> > a) QS variable is created with number of threads same as number of
> > readers - this will exercise __rcu_qsbr_check_all
> > b) QS variable is created with 128 threads, number of registered
> > threads is same as in a) - this will exercise
> > __rcu_qsbr_check_selective
> >
> > Following are the results on x86 (E5-2660 v4 @ 2.00GHz) comparing from
> > a) to b) (on x86 in my setup, the results are not very stable between
> > runs)
> > 1) 25%
> > 2) -3%
> > 3) -0.4%
> > 4) 1.38%
> >
> > Following are the results on an Arm system comparing from a) to b)
> > (results are not pretty stable between runs)
                           ^^^
Correction, on the Arm system, the results *are* stable (copy-paste error)

> > 1) -3.45%
> > 2) 0%
> > 3) -0.03%
> > 4) -0.04%
> >
> > Konstantin, is it possible to run the tests on your setup and look at the
> results?
> 
> I did run V5 on my box (SKX 2.1 GHz) with 17 lcores (1 physical core per
> thread).
> Didn't notice any significant fluctuations between runs, output below.
> 
> >rcu_qsbr_perf_autotest
> Number of cores provided = 17
> Perf test with all reader threads registered
> --------------------------------------------
> 
> Perf Test: 16 Readers/1 Writer('wait' in qsbr_check == true)
> Total RCU updates = 65707232899
> Cycles per 1000 updates: 18482
> Total RCU checks = 20000000
> Cycles per 1000 checks: 3794991
> 
> Perf Test: 17 Readers
> Total RCU updates = 1700000000
> Cycles per 1000 updates: 2128
> 
> Perf test: 17 Writers ('wait' in qsbr_check == false)
> Total RCU checks = 340000000
> Cycles per 1000 checks: 10030
> 
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 1984696
> Cycles per 1 check(start, check): 2619002
> 
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 2028030
> Cycles per 1 check(start, check): 2876667
> 
> Perf test with some of reader threads registered
> ------------------------------------------------
> 
> Perf Test: 16 Readers/1 Writer('wait' in qsbr_check == true)
> Total RCU updates = 68850073055
> Cycles per 1000 updates: 25490
> Total RCU checks = 20000000
> Cycles per 1000 checks: 5484403
> 
> Perf Test: 17 Readers
> Total RCU updates = 1700000000
> Cycles per 1000 updates: 2127
> 
> Perf test: 17 Writers ('wait' in qsbr_check == false)
> Total RCU checks = 340000000
> Cycles per 1000 checks: 10034
> 
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 3604489
> Cycles per 1 check(start, check): 7077372
> 
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 3936831
> Cycles per 1 check(start, check): 7262738
> 
> 
> Test OK
Thanks for running the test. From the numbers, the comparison from a) to b), based on the cycles per check, is as follows:
1) -44%
2) 0.03%
3) -170%
4) -152%

The trend is the same on x86 and Arm. However, x86 shows a drastic improvement with the __rcu_qsbr_check_all function.
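
For reference, a small sketch of the arithmetic behind these percentages. The formula (a - b) / a * 100, applied to the "cycles per check" figures, is my assumption about how such a comparison is computed, and rel_change_pct() is a hypothetical helper, not part of the test code.

```c
#include <assert.h>
#include <stdint.h>

/* Relative change, in percent, going from test set a) to test set b).
 * A negative value means b) needed more cycles than a).
 * Assumed formula: (a - b) / a * 100.
 */
static double rel_change_pct(uint64_t cycles_a, uint64_t cycles_b)
{
	assert(cycles_a != 0);
	return ((double)cycles_a - (double)cycles_b) / (double)cycles_a * 100.0;
}
```

With the numbers above, rel_change_pct(3794991, 5484403) is roughly -44.5 for test 1, rel_change_pct(2619002, 7077372) is roughly -170.2 for test 3, and rel_change_pct(2876667, 7262738) is roughly -152.5 for test 4. For test 2, 10030 against 10034 gives about -0.04 with this formula, which is within noise of the figure quoted.
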

> 
> Konstantin


2019-04-25 14:27         ` Honnappa Nagarahalli
2019-04-25 14:38       ` David Marchand
2019-04-25 14:38         ` David Marchand
2019-04-23  4:31 ` [dpdk-dev] [PATCH v7 " Honnappa Nagarahalli
2019-04-23  4:31   ` Honnappa Nagarahalli
2019-04-23  4:31   ` [dpdk-dev] [PATCH v7 1/3] rcu: " Honnappa Nagarahalli
2019-04-23  4:31     ` Honnappa Nagarahalli
2019-04-23  8:10     ` Paul E. McKenney
2019-04-23  8:10       ` Paul E. McKenney
2019-04-23 21:23       ` Honnappa Nagarahalli
2019-04-23 21:23         ` Honnappa Nagarahalli
2019-04-24 20:02         ` Jerin Jacob Kollanukkaran
2019-04-24 20:02           ` Jerin Jacob Kollanukkaran
2019-04-25  5:15           ` Honnappa Nagarahalli
2019-04-25  5:15             ` Honnappa Nagarahalli
2019-04-24 10:03     ` Ruifeng Wang (Arm Technology China)
2019-04-24 10:03       ` Ruifeng Wang (Arm Technology China)
2019-04-23  4:31   ` [dpdk-dev] [PATCH v7 2/3] test/rcu_qsbr: add API and functional tests Honnappa Nagarahalli
2019-04-23  4:31     ` Honnappa Nagarahalli
2019-04-23  4:31   ` [dpdk-dev] [PATCH v7 3/3] doc/rcu: add lib_rcu documentation Honnappa Nagarahalli
2019-04-23  4:31     ` Honnappa Nagarahalli
2019-04-24 10:12     ` Ruifeng Wang (Arm Technology China)
2019-04-24 10:12       ` Ruifeng Wang (Arm Technology China)
2019-04-26  4:39 ` [dpdk-dev] [PATCH v8 0/4] lib/rcu: add RCU library supporting QSBR mechanism Honnappa Nagarahalli
2019-04-26  4:39   ` Honnappa Nagarahalli
2019-04-26  4:39   ` [dpdk-dev] [PATCH v8 1/4] rcu: " Honnappa Nagarahalli
2019-04-26  4:39     ` Honnappa Nagarahalli
2019-04-26  8:13     ` Jerin Jacob Kollanukkaran
2019-04-26  8:13       ` Jerin Jacob Kollanukkaran
2019-04-28  3:25     ` Ruifeng Wang (Arm Technology China)
2019-04-28  3:25       ` Ruifeng Wang (Arm Technology China)
2019-04-29 20:33       ` Thomas Monjalon
2019-04-29 20:33         ` Thomas Monjalon
2019-04-30 10:51         ` Hemant Agrawal
2019-04-30 10:51           ` Hemant Agrawal
2019-04-26  4:39   ` [dpdk-dev] [PATCH v8 2/4] test/rcu_qsbr: add API and functional tests Honnappa Nagarahalli
2019-04-26  4:39     ` Honnappa Nagarahalli
2019-04-29 20:35     ` Thomas Monjalon
2019-04-29 20:35       ` Thomas Monjalon
2019-04-30  4:20       ` Honnappa Nagarahalli
2019-04-30  4:20         ` Honnappa Nagarahalli
2019-04-30  7:58         ` Thomas Monjalon
2019-04-30  7:58           ` Thomas Monjalon
2019-04-26  4:39   ` [dpdk-dev] [PATCH v8 3/4] doc/rcu: add lib_rcu documentation Honnappa Nagarahalli
2019-04-26  4:39     ` Honnappa Nagarahalli
2019-04-26  4:40   ` [dpdk-dev] [PATCH v8 4/4] doc: added RCU to the release notes Honnappa Nagarahalli
2019-04-26  4:40     ` Honnappa Nagarahalli
2019-04-26 12:04   ` [dpdk-dev] [PATCH v8 0/4] lib/rcu: add RCU library supporting QSBR mechanism Ananyev, Konstantin
2019-04-26 12:04     ` Ananyev, Konstantin
2019-05-01  3:54 ` [dpdk-dev] [PATCH v9 " Honnappa Nagarahalli
2019-05-01  3:54   ` Honnappa Nagarahalli
2019-05-01  3:54   ` [dpdk-dev] [PATCH v9 1/4] rcu: " Honnappa Nagarahalli
2019-05-01  3:54     ` Honnappa Nagarahalli
2019-05-01  3:54   ` [dpdk-dev] [PATCH v9 2/4] test/rcu_qsbr: add API and functional tests Honnappa Nagarahalli
2019-05-01  3:54     ` Honnappa Nagarahalli
2019-05-03 14:31     ` David Marchand
2019-05-03 14:31       ` David Marchand
2019-05-06 23:16       ` Honnappa Nagarahalli
2019-05-06 23:16         ` Honnappa Nagarahalli
2019-05-01  3:54   ` [dpdk-dev] [PATCH v9 3/4] doc/rcu: add lib_rcu documentation Honnappa Nagarahalli
2019-05-01  3:54     ` Honnappa Nagarahalli
2019-05-01 11:37     ` Mcnamara, John
2019-05-01 11:37       ` Mcnamara, John
2019-05-01 21:20       ` Honnappa Nagarahalli
2019-05-01 21:20         ` Honnappa Nagarahalli
2019-05-01 21:32         ` Thomas Monjalon
2019-05-01 21:32           ` Thomas Monjalon
2019-05-01  3:54   ` [dpdk-dev] [PATCH v9 4/4] doc: added RCU to the release notes Honnappa Nagarahalli
2019-05-01  3:54     ` Honnappa Nagarahalli
2019-05-01 11:31     ` Mcnamara, John
2019-05-01 11:31       ` Mcnamara, John
2019-05-01 12:15   ` [dpdk-dev] [PATCH v9 0/4] lib/rcu: add RCU library supporting QSBR mechanism Neil Horman
2019-05-01 12:15     ` Neil Horman
2019-05-01 14:56     ` Honnappa Nagarahalli
2019-05-01 14:56       ` Honnappa Nagarahalli
2019-05-01 18:05       ` Neil Horman
2019-05-01 18:05         ` Neil Horman
2019-05-01 21:18         ` Honnappa Nagarahalli
2019-05-01 21:18           ` Honnappa Nagarahalli
2019-05-01 23:36   ` Thomas Monjalon
2019-05-01 23:36     ` Thomas Monjalon
