From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"paulmck@linux.ibm.com" <paulmck@linux.ibm.com>
Cc: "stephen@networkplumber.org" <stephen@networkplumber.org>,
"Kovacevic, Marko" <marko.kovacevic@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
Malvika Gupta <Malvika.Gupta@arm.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/3] rcu: add RCU library supporting QSBR mechanism
Date: Mon, 15 Apr 2019 19:46:28 +0000
Message-ID: <VE1PR08MB5149E8A6CC456B98B6CBE320982B0@VE1PR08MB5149.eurprd08.prod.outlook.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772580148A9800A@irsmsx105.ger.corp.intel.com>
>
> > > > >
> > > > > On Wed, Apr 10, 2019 at 06:20:04AM -0500, Honnappa Nagarahalli
> > > wrote:
> > > > > > Add RCU library supporting quiescent state based memory
> > > > > > reclamation
> > > > > method.
> > > > > > This library helps identify the quiescent state of the reader
> > > > > > threads so that the writers can free the memory associated
> > > > > > with the lock less data structures.
> > > > >
> > > > > I don't see any sign of read-side markers (rcu_read_lock() and
> > > > > rcu_read_unlock() in the Linux kernel, userspace RCU, etc.).
> > > > >
> > > > > Yes, strictly speaking, these are not needed for QSBR to
> > > > > operate, but they
> > > > These APIs would be empty for QSBR.
> > > >
> > > > > make it way easier to maintain and debug code using RCU. For
> > > > > example, given the read-side markers, you can check for errors
> > > > > like having a call to
> > > > > rte_rcu_qsbr_quiescent() in the middle of a reader quite easily.
> > > > > Without those read-side markers, life can be quite hard and you
> > > > > will really hate yourself for failing to have provided them.
> > > >
> > > > Want to make sure I understood this, do you mean the application
> > > would mark before and after accessing the shared data structure on
> > > the reader side?
> > > >
> > > > rte_rcu_qsbr_lock()
> > > > <begin access shared data structure> ...
> > > > ...
> > > > <end access shared data structure>
> > > > rte_rcu_qsbr_unlock()
> > >
> > > Yes, that is the idea.
> > >
> > > > If someone is debugging this code, they have to make sure that
> > > > there is
> > > an unlock for every lock and there is no call to
> > > rte_rcu_qsbr_quiescent in between.
> > > > It sounds good to me. Obviously, they will not add any additional
> > > > cycles
> > > as well.
> > > > Please let me know if my understanding is correct.
> > >
> > > Yes. And in some sort of debug mode, you could capture the counter
> > > at
> > > rte_rcu_qsbr_lock() time and check it at rte_rcu_qsbr_unlock() time.
> > > If the counter has advanced too far (more than one, if I am not too
> > > confused) there is a bug. Also in debug mode, you could have
> > > rte_rcu_qsbr_lock() increment a per-thread counter and
> rte_rcu_qsbr_unlock() decrement it.
> > > If the counter is non-zero at a quiescent state, there is a bug.
> > > And so on.
> > >
> > Added this in V5
> >
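For readers of the thread, a minimal sketch of the second debug check described above: a per-thread counter that the lock marker increments, the unlock marker decrements, and that must be zero at a quiescent state. The dbg_* names are hypothetical and this is not the V5 code; it assumes C11 thread-local storage.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread nesting counter, used only in debug builds. */
static _Thread_local uint32_t dbg_lock_cnt;

/* Mark the start of a read-side critical section. */
static inline void dbg_qsbr_lock(void)
{
	dbg_lock_cnt++;
}

/* Mark the end of a read-side critical section. */
static inline void dbg_qsbr_unlock(void)
{
	/* An unlock without a matching lock is a bug. */
	assert(dbg_lock_cnt > 0);
	dbg_lock_cnt--;
}

/* Returns 1 if it is legal to report a quiescent state here,
 * i.e. the thread is not inside any read-side critical section.
 */
static inline int dbg_qsbr_quiescent_ok(void)
{
	return dbg_lock_cnt == 0;
}
```

In a non-debug build these helpers would compile to nothing, so the fast path would not pay for them.
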
> > <snip>
> >
> > > > > > +
> > > > > > +/* Get the memory size of QSBR variable */ size_t
> > > > > > +__rte_experimental rte_rcu_qsbr_get_memsize(uint32_t
> > > max_threads) {
> > > > > > + size_t sz;
> > > > > > +
> > > > > > + if (max_threads == 0) {
> > > > > > + rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > + "%s(): Invalid max_threads %u\n",
> > > > > > + __func__, max_threads);
> > > > > > + rte_errno = EINVAL;
> > > > > > +
> > > > > > + return 1;
> > > > > > + }
> > > > > > +
> > > > > > + sz = sizeof(struct rte_rcu_qsbr);
> > > > > > +
> > > > > > + /* Add the size of quiescent state counter array */
> > > > > > + sz += sizeof(struct rte_rcu_qsbr_cnt) * max_threads;
> > > > > > +
> > > > > > + /* Add the size of the registered thread ID bitmap array */
> > > > > > + sz += RTE_QSBR_THRID_ARRAY_SIZE(max_threads);
> > > > > > +
> > > > > > + return RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
> > > > >
> > > > > Given that you align here, should you also align in the earlier
> > > > > steps in the computation of sz?
> > > >
> > > > Agree. I will remove the align here and keep the earlier one as
> > > > the intent
> > > is to align the thread ID array.
> > >
> > > Sounds good!
> > Added this in V5
> >
> > >
> > > > > > +}
> > > > > > +
> > > > > > +/* Initialize a quiescent state variable */ int
> > > > > > +__rte_experimental rte_rcu_qsbr_init(struct rte_rcu_qsbr *v,
> > > uint32_t max_threads) {
> > > > > > + size_t sz;
> > > > > > +
> > > > > > + if (v == NULL) {
> > > > > > + rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > + "%s(): Invalid input parameter\n", __func__);
> > > > > > + rte_errno = EINVAL;
> > > > > > +
> > > > > > + return 1;
> > > > > > + }
> > > > > > +
> > > > > > + sz = rte_rcu_qsbr_get_memsize(max_threads);
> > > > > > + if (sz == 1)
> > > > > > + return 1;
> > > > > > +
> > > > > > + /* Set all the threads to offline */
> > > > > > + memset(v, 0, sz);
> > > > >
> > > > > We calculate sz here, but it looks like the caller must also
> > > > > calculate it in order to correctly allocate the memory
> > > > > referenced by the "v" argument to this function, with bad things
> > > > > happening if the two calculations get different results. Should
> > > > > "v" instead be allocated within this function to avoid this sort of
> problem?
> > > >
> > > > Earlier version allocated the memory with-in this library.
> > > > However, it was
> > > decided to go with the current implementation as it provides
> > > flexibility for the application to manage the memory as it sees fit.
> > > For ex: it could allocate this as part of another structure in a
> > > single allocation. This also falls inline with similar approach taken in
> other libraries.
> > >
> > > So the allocator APIs vary too much to allow a pointer to the
> > > desired allocator function to be passed in? Or do you also want to
> > > allow static allocation? If the latter, would a DEFINE_RTE_RCU_QSBR()
> be of use?
> > >
> > This is done to allow for allocation of memory for the QS variable as part
> > of another, bigger data structure. This will help in not fragmenting the
> > memory. For ex:
> >
> > struct xyz {
> > rte_ring *ring;
> > rte_rcu_qsbr *v;
> > abc *t;
> > };
> > struct xyz c;
> >
> > Memory for the above structure can be allocated in one chunk by
> > calculating the size required.
> >
> > In some use cases static allocation might be enough as 'max_threads'
> > might be a compile time constant. I am not sure how to support both
> > dynamic and static 'max_threads'.
>
> Same thought here- would be good to have a static initializer
> (DEFINE_RTE_RCU_QSBR), but that means new compile time limit
> ('max_threads') - thing that we try to avoid.
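To make the single-allocation idea concrete, here is a rough sketch under stated assumptions. The types qs_var and qs_cnt, the 64-byte cache line, and the helpers qs_memsize() and xyz_create() are all hypothetical stand-ins, not the library's definitions; the size computation only mirrors the shape of the one in the patch (header, per-thread counters, cache-line-aligned thread ID bitmap), and it returns 0 on error for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64u /* assumed cache line size */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Hypothetical stand-ins for the QS variable and its per-thread counter. */
struct qs_cnt { _Alignas(64) uint64_t cnt; };
struct qs_var { _Alignas(64) uint64_t token; uint32_t max_threads; };

/* Size of header + counter array + cache-line-aligned thread ID bitmap. */
static size_t qs_memsize(uint32_t max_threads)
{
	size_t sz;

	if (max_threads == 0)
		return 0; /* simplification: 0 signals an invalid argument */

	sz = sizeof(struct qs_var);
	sz += sizeof(struct qs_cnt) * max_threads;
	sz += ALIGN_UP(ALIGN_UP((size_t)max_threads, 64) / 8, CACHE_LINE);
	return sz;
}

/* Application structure that owns the QS variable. */
struct xyz {
	int some_state;
	struct qs_var *v;
};

/* Allocate 'struct xyz' and its QS variable in one chunk. */
static struct xyz *xyz_create(uint32_t max_threads)
{
	size_t hdr = ALIGN_UP(sizeof(struct xyz), CACHE_LINE);
	size_t qs = qs_memsize(max_threads);
	struct xyz *c;

	if (qs == 0)
		return NULL;

	/* hdr and qs are both multiples of CACHE_LINE, as C11 requires. */
	c = aligned_alloc(CACHE_LINE, hdr + qs);
	if (c == NULL)
		return NULL;

	memset(c, 0, hdr + qs);
	c->v = (struct qs_var *)((char *)c + hdr);
	c->v->max_threads = max_threads;
	return c;
}
```

The caller releases everything with a single free() of the outer pointer, which is the fragmentation benefit mentioned above.
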
>
> >
> > > > > > + v->max_threads = max_threads;
> > > > > > + v->num_elems = RTE_ALIGN_MUL_CEIL(max_threads,
> > > > > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) /
> > > > > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE;
> > > > > > + v->token = RTE_QSBR_CNT_INIT;
> > > > > > +
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +/* Register a reader thread to report its quiescent state
> > > > > > + * on a QS variable.
> > > > > > + */
> > > > > > +int __rte_experimental
> > > > > > +rte_rcu_qsbr_thread_register(struct rte_rcu_qsbr *v, unsigned
> > > > > > +int
> > > > > > +thread_id) {
> > > > > > + unsigned int i, id, success;
> > > > > > + uint64_t old_bmap, new_bmap;
> > > > > > +
> > > > > > + if (v == NULL || thread_id >= v->max_threads) {
> > > > > > + rte_log(RTE_LOG_ERR, rcu_log_type,
> > > > > > + "%s(): Invalid input parameter\n", __func__);
> > > > > > + rte_errno = EINVAL;
> > > > > > +
> > > > > > + return 1;
> > > > > > + }
> > > > > > +
> > > > > > + id = thread_id & RTE_QSBR_THRID_MASK;
> > > > > > + i = thread_id >> RTE_QSBR_THRID_INDEX_SHIFT;
> > > > > > +
> > > > > > + /* Make sure that the counter for registered threads does
> not
> > > > > > + * go out of sync. Hence, additional checks are required.
> > > > > > + */
> > > > > > + /* Check if the thread is already registered */
> > > > > > + old_bmap =
> __atomic_load_n(RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > > > + __ATOMIC_RELAXED);
> > > > > > + if (old_bmap & 1UL << id)
> > > > > > + return 0;
> > > > > > +
> > > > > > + do {
> > > > > > + new_bmap = old_bmap | (1UL << id);
> > > > > > + success = __atomic_compare_exchange(
> > > > > > +
> RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > > > + &old_bmap, &new_bmap, 0,
> > > > > > + __ATOMIC_RELEASE,
> > > > > __ATOMIC_RELAXED);
> > > > > > +
> > > > > > + if (success)
> > > > > > + __atomic_fetch_add(&v->num_threads,
> > > > > > + 1,
> __ATOMIC_RELAXED);
> > > > > > + else if (old_bmap & (1UL << id))
> > > > > > + /* Someone else registered this thread.
> > > > > > + * Counter should not be incremented.
> > > > > > + */
> > > > > > + return 0;
> > > > > > + } while (success == 0);
> > > > >
> > > > > This would be simpler if threads were required to register
> themselves.
> > > > > Maybe you have use cases requiring registration of other
> > > > > threads, but this capability is adding significant complexity,
> > > > > so it might be worth some thought.
> > > > >
> > > > It was simple earlier (__atomic_fetch_or). The complexity is added
> > > > as
> > > 'num_threads' should not go out of sync.
> > >
> > > Hmmm...
> > >
> > > So threads are allowed to register other threads? Or is there some
> > > other reason that concurrent registration is required?
> > >
> > Yes, control plane threads can register the fast path threads. Though,
> > I am not sure how useful it is. I did not want to add the restriction. I
> > expect that reader threads will register themselves. The reader threads
> > require concurrent registration as they all will be running in parallel.
> > If the requirement of keeping track of the number of threads registered
> > currently goes away, then this function will be simple.
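As an illustration of why the counter forces a compare-and-exchange loop, here is a simplified sketch using C11 atomics instead of the GCC builtins in the patch. It covers a single 64-bit bitmap element, and reg_state and reg_thread() are hypothetical names, not the patch's API. The counter is incremented only by the caller whose exchange actually set the bit.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: one bitmap element plus the registered-thread count. */
struct reg_state {
	_Atomic uint64_t bmap;
	_Atomic uint32_t num_threads;
};

/* Returns 0 on success (including "already registered"), 1 on bad input. */
static int reg_thread(struct reg_state *s, unsigned int id)
{
	uint64_t bit, old;

	if (s == NULL || id >= 64)
		return 1;

	bit = UINT64_C(1) << id;
	old = atomic_load_explicit(&s->bmap, memory_order_relaxed);

	/* Loop until our bit is set, either by us or by someone else. */
	while (!(old & bit)) {
		if (atomic_compare_exchange_weak_explicit(&s->bmap, &old,
				old | bit, memory_order_release,
				memory_order_relaxed)) {
			/* We set the bit, so we own the increment. */
			atomic_fetch_add_explicit(&s->num_threads, 1,
					memory_order_relaxed);
			return 0;
		}
		/* A failed exchange refreshed 'old'; re-check the bit. */
	}

	/* Someone else registered this thread; do not count it twice. */
	return 0;
}
```

Registering the same thread ID twice leaves the count at one, which is the property the extra checks in the patch are there to keep.
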
> >
> > <snip>
> >
> > > > > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > b/lib/librte_rcu/rte_rcu_qsbr.h new file mode 100644 index
> > > > > > 000000000..ff696aeab
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > > > > @@ -0,0 +1,554 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + * Copyright (c) 2018 Arm Limited */
> > > > > > +
> > > > > > +#ifndef _RTE_RCU_QSBR_H_
> > > > > > +#define _RTE_RCU_QSBR_H_
> > > > > > +
> > > > > > +/**
> > > > > > + * @file
> > > > > > + * RTE Quiescent State Based Reclamation (QSBR)
> > > > > > + *
> > > > > > + * Quiescent State (QS) is any point in the thread execution
> > > > > > + * where the thread does not hold a reference to a data
> > > > > > +structure
> > > > > > + * in shared memory. While using lock-less data structures,
> > > > > > +the writer
> > > > > > + * can safely free memory once all the reader threads have
> > > > > > +entered
> > > > > > + * quiescent state.
> > > > > > + *
> > > > > > + * This library provides the ability for the readers to
> > > > > > +report quiescent
> > > > > > + * state and for the writers to identify when all the readers
> > > > > > +have
> > > > > > + * entered quiescent state.
> > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include <stdio.h>
> > > > > > +#include <stdint.h>
> > > > > > +#include <errno.h>
> > > > > > +#include <rte_common.h>
> > > > > > +#include <rte_memory.h>
> > > > > > +#include <rte_lcore.h>
> > > > > > +#include <rte_debug.h>
> > > > > > +#include <rte_atomic.h>
> > > > > > +
> > > > > > +extern int rcu_log_type;
> > > > > > +
> > > > > > +#if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG #define
> > > RCU_DP_LOG(level,
> > > > > fmt,
> > > > > > +args...) \
> > > > > > + rte_log(RTE_LOG_ ## level, rcu_log_type, \
> > > > > > + "%s(): " fmt "\n", __func__, ## args) #else #define
> > > > > > +RCU_DP_LOG(level, fmt, args...) #endif
> > > > > > +
> > > > > > +/* Registered thread IDs are stored as a bitmap of 64b
> > > > > > +element
> > > array.
> > > > > > + * Given thread id needs to be converted to index into the
> > > > > > +array and
> > > > > > + * the id within the array element.
> > > > > > + */
> > > > > > +#define RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > > > #define
> > > > > > +RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > > > > > + RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > > > > > + RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3,
> > > > > RTE_CACHE_LINE_SIZE) #define
> > > > > > +RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > > > > + ((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > > > > +#define RTE_QSBR_THRID_INDEX_SHIFT 6 #define
> > > RTE_QSBR_THRID_MASK
> > > > > > +0x3f
> > > > > #define
> > > > > > +RTE_QSBR_THRID_INVALID 0xffffffff
> > > > > > +
> > > > > > +/* Worker thread counter */
> > > > > > +struct rte_rcu_qsbr_cnt {
> > > > > > + uint64_t cnt;
> > > > > > + /**< Quiescent state counter. Value 0 indicates the thread
> > > > > > +is offline */ } __rte_cache_aligned;
> > > > > > +
> > > > > > +#define RTE_QSBR_CNT_THR_OFFLINE 0 #define
> > > RTE_QSBR_CNT_INIT 1
> > > > > > +
> > > > > > +/* RTE Quiescent State variable structure.
> > > > > > + * This structure has two elements that vary in size based on
> > > > > > +the
> > > > > > + * 'max_threads' parameter.
> > > > > > + * 1) Quiescent state counter array
> > > > > > + * 2) Register thread ID array */ struct rte_rcu_qsbr {
> > > > > > + uint64_t token __rte_cache_aligned;
> > > > > > + /**< Counter to allow for multiple concurrent quiescent
> > > > > > +state queries */
> > > > > > +
> > > > > > + uint32_t num_elems __rte_cache_aligned;
> > > > > > + /**< Number of elements in the thread ID array */
> > > > > > + uint32_t num_threads;
> > > > > > + /**< Number of threads currently using this QS variable */
> > > > > > + uint32_t max_threads;
> > > > > > + /**< Maximum number of threads using this QS variable */
> > > > > > +
> > > > > > + struct rte_rcu_qsbr_cnt qsbr_cnt[0] __rte_cache_aligned;
> > > > > > + /**< Quiescent state counter array of 'max_threads'
> elements
> > > > > > +*/
> > > > > > +
> > > > > > + /**< Registered thread IDs are stored in a bitmap array,
> > > > > > + * after the quiescent state counter array.
> > > > > > + */
> > > > > > +} __rte_cache_aligned;
> > > > > > +
> >
> > <snip>
> >
> > > > > > +
> > > > > > +/* Check the quiescent state counter for registered threads
> > > > > > +only, assuming
> > > > > > + * that not all threads have registered.
> > > > > > + */
> > > > > > +static __rte_always_inline int
> > > > > > +__rcu_qsbr_check_selective(struct rte_rcu_qsbr *v, uint64_t
> > > > > > +t, bool
> > > > > > +wait) {
> > > > > > + uint32_t i, j, id;
> > > > > > + uint64_t bmap;
> > > > > > + uint64_t c;
> > > > > > + uint64_t *reg_thread_id;
> > > > > > +
> > > > > > + for (i = 0, reg_thread_id = RTE_QSBR_THRID_ARRAY_ELM(v,
> 0);
> > > > > > + i < v->num_elems;
> > > > > > + i++, reg_thread_id++) {
> > > > > > + /* Load the current registered thread bit map
> before
> > > > > > + * loading the reader thread quiescent state
> counters.
> > > > > > + */
> > > > > > + bmap = __atomic_load_n(reg_thread_id,
> > > > > __ATOMIC_ACQUIRE);
> > > > > > + id = i << RTE_QSBR_THRID_INDEX_SHIFT;
> > > > > > +
> > > > > > + while (bmap) {
> > > > > > + j = __builtin_ctzl(bmap);
> > > > > > + RCU_DP_LOG(DEBUG,
> > > > > > + "%s: check: token = %lu, wait = %d,
> Bit Map
> > > > > = 0x%lx, Thread ID = %d",
> > > > > > + __func__, t, wait, bmap, id + j);
> > > > > > + c = __atomic_load_n(
> > > > > > + &v->qsbr_cnt[id + j].cnt,
> > > > > > + __ATOMIC_ACQUIRE);
> > > > > > + RCU_DP_LOG(DEBUG,
> > > > > > + "%s: status: token = %lu, wait = %d,
> Thread
> > > > > QS cnt = %lu, Thread ID = %d",
> > > > > > + __func__, t, wait, c, id+j);
> > > > > > + /* Counter is not checked for wrap-around
> > > > > condition
> > > > > > + * as it is a 64b counter.
> > > > > > + */
> > > > > > + if (unlikely(c !=
> RTE_QSBR_CNT_THR_OFFLINE && c
> > > > > < t)) {
> > > > >
> > > > > This assumes that a 64-bit counter won't overflow, which is
> > > > > close enough to true given current CPU clock frequencies. ;-)
> > > > >
> > > > > > + /* This thread is not in quiescent
> state */
> > > > > > + if (!wait)
> > > > > > + return 0;
> > > > > > +
> > > > > > + rte_pause();
> > > > > > + /* This thread might have
> unregistered.
> > > > > > + * Re-read the bitmap.
> > > > > > + */
> > > > > > + bmap =
> __atomic_load_n(reg_thread_id,
> > > > > > + __ATOMIC_ACQUIRE);
> > > > > > +
> > > > > > + continue;
> > > > > > + }
> > > > > > +
> > > > > > + bmap &= ~(1UL << j);
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + return 1;
> > > > > > +}
> > > > > > +
> > > > > > +/* Check the quiescent state counter for all threads,
> > > > > > +assuming that
> > > > > > + * all the threads have registered.
> > > > > > + */
> > > > > > +static __rte_always_inline int __rcu_qsbr_check_all(struct
> > > > > > +rte_rcu_qsbr *v, uint64_t t, bool
> > > > > > +wait)
> > > > >
> > > > > Does checking the bitmap really take long enough to make this
> > > > > worthwhile as a separate function? I would think that the
> > > > > bitmap-checking time would be lost in the noise of cache misses
> > > > > from
> > > the ->cnt loads.
> > > >
> > > > It avoids accessing one cache line. I think this is where the
> > > > savings are
> > > (may be in theory). This is the most probable use case.
> > > > On the other hand, __rcu_qsbr_check_selective() will result in
> > > > savings
> > > (depending on how many threads are currently registered) by avoiding
> > > accessing unwanted counters.
> > >
> > > Do you really expect to be calling this function on any kind of fastpath?
> >
> > Yes. For some of the libraries (rte_hash), the writer is on the fast path.
> >
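For anyone following the selective-versus-all discussion, a stripped-down sketch of the selective scan: walk only the set bits of each 64-bit bitmap element and compare those threads' counters against the token. This is a single-threaded, non-waiting simplification with hypothetical names (check_selective(), CNT_OFFLINE); the atomics, logging and re-read of the bitmap from the patch are left out, and __builtin_ctzll assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

#define CNT_OFFLINE 0u /* counter value meaning "thread is offline" */

/* Returns true if every registered, online thread has reached 'token'.
 * bmaps: registered-thread bitmap, num_elems elements of 64 bits each.
 * cnt:   per-thread quiescent state counters, indexed by thread ID.
 */
static bool check_selective(const uint64_t *bmaps, uint32_t num_elems,
		const uint64_t *cnt, uint64_t token)
{
	uint32_t i;

	for (i = 0; i < num_elems; i++) {
		uint64_t bmap = bmaps[i];
		uint32_t base = i << 6; /* first thread ID in this element */

		while (bmap) {
			/* Index of the lowest registered thread in bmap. */
			unsigned int j = (unsigned int)__builtin_ctzll(bmap);
			uint64_t c = cnt[base + j];

			if (c != CNT_OFFLINE && c < token)
				return false; /* not yet quiescent */

			bmap &= bmap - 1; /* clear the lowest set bit */
		}
	}

	return true;
}
```

Counters of unregistered threads are never loaded, which is where the selective variant saves work when few of the possible threads are registered.
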
> > >
> > > > > Sure, if you invoke __rcu_qsbr_check_selective() in a tight loop
> > > > > in the absence of readers, you might see __rcu_qsbr_check_all()
> > > > > being a bit faster. But is that really what DPDK does?
> > > > I see improvements in the synthetic test case (similar to the one
> > > > you
> > > have described, around 27%). However, in the more practical test
> > > cases I do not see any difference.
> > >
> > > If the performance improvement only occurs in a synthetic test case,
> > > does it really make sense to optimize for it?
> > I had to fix a few issues in the performance test cases and added more to
> > do the comparison. These changes are in v5.
> > There are 4 performance tests involving this API.
> > 1) 1 Writer, 'N' readers
> > Writer: qsbr_start, qsbr_check(wait == true)
> > Readers: qsbr_quiescent
> > 2) 'N' writers
> > Writers: qsbr_start, qsbr_check(wait == false)
> > 3) 1 Writer, 'N' readers (this test uses the lock-free rte_hash data
> > structure)
> > Writer: hash_del, qsbr_start, qsbr_check(wait == true), validate that
> > the reader was able to complete its work successfully
> > Readers: thread_online, hash_lookup, access the pointer - do some
> > work on it, qsbr_quiescent, thread_offline
> > 4) Same as test 3) but qsbr_check(wait == false)
> >
> > There are 2 sets of these tests.
> > a) QS variable is created with number of threads same as number of
> > readers - this will exercise __rcu_qsbr_check_all
> > b) QS variable is created with 128 threads, number of registered
> > threads is same as in a) - this will exercise
> > __rcu_qsbr_check_selective
> >
> > Following are the results on x86 (E5-2660 v4 @ 2.00GHz) comparing from
> > a) to b) (on x86 in my setup, the results are not very stable between
> > runs)
> > 1) 25%
> > 2) -3%
> > 3) -0.4%
> > 4) 1.38%
> >
> > Following are the results on an Arm system comparing from a) to b)
> > (results are not pretty stable between runs)
^^^
Correction, on the Arm system, the results *are* stable (copy-paste error)
> > 1) -3.45%
> > 2) 0%
> > 3) -0.03%
> > 4) -0.04%
> >
> > Konstantin, is it possible to run the tests on your setup and look at the
> results?
>
> I did run V5 on my box (SKX 2.1 GHz) with 17 lcores (1 physical core per
> thread).
> Didn't notice any significant fluctuations between runs, output below.
>
> >rcu_qsbr_perf_autotest
> Number of cores provided = 17
> Perf test with all reader threads registered
> --------------------------------------------
>
> Perf Test: 16 Readers/1 Writer('wait' in qsbr_check == true)
> Total RCU updates = 65707232899
> Cycles per 1000 updates: 18482
> Total RCU checks = 20000000
> Cycles per 1000 checks: 3794991
>
> Perf Test: 17 Readers
> Total RCU updates = 1700000000
> Cycles per 1000 updates: 2128
>
> Perf test: 17 Writers ('wait' in qsbr_check == false)
> Total RCU checks = 340000000
> Cycles per 1000 checks: 10030
>
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 1984696
> Cycles per 1 check(start, check): 2619002
>
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 2028030
> Cycles per 1 check(start, check): 2876667
>
> Perf test with some of reader threads registered
> ------------------------------------------------
>
> Perf Test: 16 Readers/1 Writer('wait' in qsbr_check == true)
> Total RCU updates = 68850073055
> Cycles per 1000 updates: 25490
> Total RCU checks = 20000000
> Cycles per 1000 checks: 5484403
>
> Perf Test: 17 Readers
> Total RCU updates = 1700000000
> Cycles per 1000 updates: 2127
>
> Perf test: 17 Writers ('wait' in qsbr_check == false)
> Total RCU checks = 340000000
> Cycles per 1000 checks: 10034
>
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Blocking QSBR Check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 3604489
> Cycles per 1 check(start, check): 7077372
>
> Perf test: 1 writer, 17 readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check
> Following numbers include calls to rte_hash functions
> Cycles per 1 update(online/update/offline): 3936831
> Cycles per 1 check(start, check): 7262738
>
>
> Test OK
Thanks for running the test. From the numbers, the comparison from a) to b) (using the 'cycles per check' figures) is as follows:
1) -44%
2) 0.03%
3) -170%
4) -152%
The trend is the same on x86 and Arm. However, on x86 the __rcu_qsbr_check_all function gives a drastic improvement.
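
For reference, a small sketch of how such percentages can be derived from the output above. It assumes the comparison is the relative change in cycles per check going from the 'all registered' run to the 'some registered' run; the helper name rel_change_pct() is made up for this note.

```c
/* Relative change, in percent, going from the run with all reader
 * threads registered to the run with only some of them registered.
 * A negative value means the 'some registered' run was slower.
 */
static double rel_change_pct(double all_regd, double some_regd)
{
	return (all_regd - some_regd) / all_regd * 100.0;
}
```

With the figures quoted above, rel_change_pct(3794991, 5484403) is about -44.5 for test 1, rel_change_pct(2619002, 7077372) is about -170.2 for test 3, and rel_change_pct(2876667, 7262738) is about -152.5 for test 4. Test 2 (10030 vs 10034 cycles per 1000 checks) comes out within a few hundredths of a percent of zero, i.e. noise.
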
>
> Konstantin
2019-04-26 4:39 ` Honnappa Nagarahalli
2019-04-29 20:35 ` Thomas Monjalon
2019-04-29 20:35 ` Thomas Monjalon
2019-04-30 4:20 ` Honnappa Nagarahalli
2019-04-30 4:20 ` Honnappa Nagarahalli
2019-04-30 7:58 ` Thomas Monjalon
2019-04-30 7:58 ` Thomas Monjalon
2019-04-26 4:39 ` [dpdk-dev] [PATCH v8 3/4] doc/rcu: add lib_rcu documentation Honnappa Nagarahalli
2019-04-26 4:39 ` Honnappa Nagarahalli
2019-04-26 4:40 ` [dpdk-dev] [PATCH v8 4/4] doc: added RCU to the release notes Honnappa Nagarahalli
2019-04-26 4:40 ` Honnappa Nagarahalli
2019-04-26 12:04 ` [dpdk-dev] [PATCH v8 0/4] lib/rcu: add RCU library supporting QSBR mechanism Ananyev, Konstantin
2019-04-26 12:04 ` Ananyev, Konstantin
2019-05-01 3:54 ` [dpdk-dev] [PATCH v9 " Honnappa Nagarahalli
2019-05-01 3:54 ` Honnappa Nagarahalli
2019-05-01 3:54 ` [dpdk-dev] [PATCH v9 1/4] rcu: " Honnappa Nagarahalli
2019-05-01 3:54 ` Honnappa Nagarahalli
2019-05-01 3:54 ` [dpdk-dev] [PATCH v9 2/4] test/rcu_qsbr: add API and functional tests Honnappa Nagarahalli
2019-05-01 3:54 ` Honnappa Nagarahalli
2019-05-03 14:31 ` David Marchand
2019-05-03 14:31 ` David Marchand
2019-05-06 23:16 ` Honnappa Nagarahalli
2019-05-06 23:16 ` Honnappa Nagarahalli
2019-05-01 3:54 ` [dpdk-dev] [PATCH v9 3/4] doc/rcu: add lib_rcu documentation Honnappa Nagarahalli
2019-05-01 3:54 ` Honnappa Nagarahalli
2019-05-01 11:37 ` Mcnamara, John
2019-05-01 11:37 ` Mcnamara, John
2019-05-01 21:20 ` Honnappa Nagarahalli
2019-05-01 21:20 ` Honnappa Nagarahalli
2019-05-01 21:32 ` Thomas Monjalon
2019-05-01 21:32 ` Thomas Monjalon
2019-05-01 3:54 ` [dpdk-dev] [PATCH v9 4/4] doc: added RCU to the release notes Honnappa Nagarahalli
2019-05-01 3:54 ` Honnappa Nagarahalli
2019-05-01 11:31 ` Mcnamara, John
2019-05-01 11:31 ` Mcnamara, John
2019-05-01 12:15 ` [dpdk-dev] [PATCH v9 0/4] lib/rcu: add RCU library supporting QSBR mechanism Neil Horman
2019-05-01 12:15 ` Neil Horman
2019-05-01 14:56 ` Honnappa Nagarahalli
2019-05-01 14:56 ` Honnappa Nagarahalli
2019-05-01 18:05 ` Neil Horman
2019-05-01 18:05 ` Neil Horman
2019-05-01 21:18 ` Honnappa Nagarahalli
2019-05-01 21:18 ` Honnappa Nagarahalli
2019-05-01 23:36 ` Thomas Monjalon
2019-05-01 23:36 ` Thomas Monjalon