DPDK patches and discussions
* [dpdk-dev] [PATCH 0/3] new software event timer adapter
@ 2018-11-29 23:35 Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory Erik Gabriel Carrillo
                   ` (5 more replies)
  0 siblings, 6 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-11-29 23:35 UTC
  To: pbhagavatula, jerin.jacob, rsanford; +Cc: dev

This patch series introduces a new version of the event timer 
adapter software PMD [1].  In the original design, timer event producer
lcores in the primary and secondary processes enqueued event timers
into a ring, and a service core in the primary process dequeued them
and processed them further.  To improve performance, this version does
away with the ring and lets the lcores in both primary and secondary
processes insert timers directly into the timer skiplist data
structures; the service core directly accesses the lists as well.
To achieve this, however, modifications to the timer library [2] are
required to enable the timer skiplists to be created and accessed in
shared memory.  New APIs are introduced in the timer library to enable
selecting from multiple instances of the timer skiplists. Instances of
the event timer adapter, as well as the original APIs of the timer
library, can then each access distinct timer lists.

Future versions of this series will hopefully improve the names
used for the data structures and APIs in the timer library.

This series depends on the following patch:
https://patches.dpdk.org/patch/48417/

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
[2] https://doc.dpdk.org/guides/prog_guide/timer_lib.html

Erik Gabriel Carrillo (3):
  timer: allow timer management in shared memory
  timer: add function to stop all timers in a list
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
 lib/librte_timer/Makefile                     |   1 +
 lib/librte_timer/rte_timer.c                  | 579 ++++++++++++++++++----
 lib/librte_timer/rte_timer.h                  | 200 +++++++-
 lib/librte_timer/rte_timer_version.map        |  22 +-
 5 files changed, 972 insertions(+), 517 deletions(-)

-- 
2.6.4


* [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
@ 2018-11-29 23:35 ` Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 2/3] timer: add function to stop all timers in a list Erik Gabriel Carrillo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-11-29 23:35 UTC
  To: pbhagavatula, jerin.jacob, rsanford; +Cc: dev

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose values may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
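
As a rough illustration of the allocate/deallocate scheme described above,
the following standalone sketch models the array with a plain static
instead of a memzone.  All demo_* names are hypothetical stand-ins and are
not part of the library; only the flag-per-entry idea is taken from the
patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DATA_ELS 64        /* mirrors RTE_MAX_DATA_ELS in the patch */
#define DEMO_FL_ALLOCATED (1 << 0)  /* mirrors FL_ALLOCATED in the patch */

/* Stand-in for struct rte_timer_data; the per-lcore lists are omitted. */
struct demo_timer_data {
	uint8_t internal_flags;
};

/* Assumption: a static array replaces the shared memzone for this sketch. */
static struct demo_timer_data demo_data_arr[DEMO_MAX_DATA_ELS];

/* Claim the first free entry and report its index through id_ptr. */
static int
demo_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < DEMO_MAX_DATA_ELS; i++) {
		if (!(demo_data_arr[i].internal_flags & DEMO_FL_ALLOCATED)) {
			demo_data_arr[i].internal_flags |= DEMO_FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject out-of-range or unallocated identifiers. */
static int
demo_data_dealloc(uint32_t id)
{
	if (id >= DEMO_MAX_DATA_ELS ||
	    !(demo_data_arr[id].internal_flags & DEMO_FL_ALLOCATED))
		return -EINVAL;

	demo_data_arr[id].internal_flags &= ~DEMO_FL_ALLOCATED;

	return 0;
}
```

In this model the first successful allocation plays the role of the zeroth
entry that the original APIs use; later allocations hand out the remaining
entries until the array is full.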

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.
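
The relationship between an original API and its new variant can be
pictured with the standalone sketch below: the variant validates the
identifier and acts on the selected entry, while the original call simply
delegates to entry zero.  The demo_* names and the nb_pending counter are
hypothetical simplifications, not the library's data structures.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_NB_INSTANCES 4
#define DEMO_DEFAULT_ID   0

/* Stand-in for one timer data instance; a counter replaces the lists. */
struct demo_instance {
	int allocated;
	unsigned int nb_pending;
};

/* Entry zero is pre-allocated for the original APIs, as in the patch. */
static struct demo_instance demo_instances[DEMO_NB_INSTANCES] = {
	[DEMO_DEFAULT_ID] = { .allocated = 1 },
};

/* Variant: the caller selects which instance receives the timer. */
static int
demo_alt_start(uint32_t id)
{
	if (id >= DEMO_NB_INSTANCES || !demo_instances[id].allocated)
		return -EINVAL;

	demo_instances[id].nb_pending++;

	return 0;
}

/* Original-style call: always operates on the default instance. */
static int
demo_start(void)
{
	return demo_alt_start(DEMO_DEFAULT_ID);
}
```

Keeping the original entry point as a thin wrapper means existing callers
see no behavioral change, while new callers can target their own instance.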

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
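
When several lists are processed in one invocation, expired timers from
all of them are run in overall expiry order.  The standalone sketch below
shows that selection step on plain singly linked lists; the demo_* names
are hypothetical, and locking, state transitions, and periodic reloads are
deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an expired timer on a run list, sorted by expire. */
struct demo_timer {
	uint64_t expire;
	struct demo_timer *next;
};

typedef void (*demo_cb_t)(struct demo_timer *tim);

/* Example callback: record the order in which timers are handled. */
static uint64_t demo_seen[16];
static int demo_nb_seen;

static void
demo_record(struct demo_timer *tim)
{
	if (demo_nb_seen < 16)
		demo_seen[demo_nb_seen++] = tim->expire;
}

/* Repeatedly pick the list head with the smallest expire value, advance
 * that list, and invoke the callback.  Returns the number of timers
 * handled.
 */
static int
demo_run_lists(struct demo_timer **heads, int nb_lists, demo_cb_t f)
{
	int processed = 0;

	for (;;) {
		uint64_t min_expire = 0;
		int min_idx = -1;
		struct demo_timer *tim;
		int i;

		for (i = 0; i < nb_lists; i++) {
			if (heads[i] != NULL &&
			    (min_idx < 0 || heads[i]->expire < min_expire)) {
				min_expire = heads[i]->expire;
				min_idx = i;
			}
		}

		if (min_idx < 0)
			break;

		tim = heads[min_idx];
		heads[min_idx] = tim->next;
		f(tim);
		processed++;
	}

	return processed;
}
```

A linear scan over the list heads is adequate here because the number of
lists is bounded by the number of polled lcores.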

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 526 +++++++++++++++++++++++++++------
 lib/librte_timer/rte_timer.h           | 168 ++++++++++-
 lib/librte_timer/rte_timer_version.map |  21 +-
 4 files changed, 614 insertions(+), 102 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..a76be8b 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,23 +22,27 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
-	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
+	struct rte_timer pending_head;  /**< dummy timer to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
 
 	/** per-core variable that true if a timer was updated on this
-	 *  core since last reset of the variable */
+	 *  core since last reset of the variable
+	 */
 	int updated;
 
 	/** track the current depth of the skiplist */
-	unsigned curr_skiplist_depth;
+	unsigned int curr_skiplist_depth;
 
-	unsigned prev_lcore;              /**< used for lcore round robin */
+	unsigned int prev_lcore;	/**< used for lcore round robin */
 
 	/** running timer on this lcore now */
 	struct rte_timer *running_tim;
@@ -48,33 +53,140 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;  /* id set to zero automatically */
+static uint32_t rte_timer_subsystem_initialized;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(data, name, n) do {				\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
-			priv_timer[__lcore_id].stats.name += (n);	\
+			data->priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(data, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
-void
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
 rte_timer_subsystem_init(void)
 {
-	unsigned lcore_id;
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
 
-	/* since priv_timer is static, it's zeroed by default, so only init some
-	 * fields.
-	 */
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id ++) {
-		rte_spinlock_init(&priv_timer[lcore_id].list_lock);
-		priv_timer[lcore_id].prev_lcore = lcore_id;
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
 	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
 }
 
 /* Initialize the timer handle tim for use */
@@ -95,7 +207,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct rte_timer_data *data)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -113,7 +226,7 @@ timer_set_config_state(struct rte_timer *tim,
 		 */
 		if (prev_status.state == RTE_TIMER_RUNNING &&
 		    (prev_status.owner != (uint16_t)lcore_id ||
-		     tim != priv_timer[lcore_id].running_tim))
+		     tim != data->priv_timer[lcore_id].running_tim))
 			return -1;
 
 		/* timer is being configured on another core */
@@ -207,13 +320,13 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		struct rte_timer **prev, struct rte_timer_data *data)
 {
-	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
-	prev[lvl] = &priv_timer[tim_lcore].pending_head;
-	while(lvl != 0) {
+	unsigned int lvl = data->priv_timer[tim_lcore].curr_skiplist_depth;
+	prev[lvl] = &data->priv_timer[tim_lcore].pending_head;
+	while (lvl != 0) {
 		lvl--;
-		prev[lvl] = prev[lvl+1];
+		prev[lvl] = prev[lvl + 1];
 		while (prev[lvl]->sl_next[lvl] &&
 				prev[lvl]->sl_next[lvl]->expire <= time_val)
 			prev[lvl] = prev[lvl]->sl_next[lvl];
@@ -226,14 +339,16 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct rte_timer_data *data)
 {
 	int i;
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
-	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, data);
+	for (i = data->priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0;
+	     i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
 				prev[i]->sl_next[i]->expire <= tim->expire)
@@ -247,20 +362,21 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct rte_timer_data *data)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, data);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
-			priv_timer[tim_lcore].curr_skiplist_depth);
-	if (tim_level == priv_timer[tim_lcore].curr_skiplist_depth)
-		priv_timer[tim_lcore].curr_skiplist_depth++;
+			data->priv_timer[tim_lcore].curr_skiplist_depth);
+	if (tim_level == data->priv_timer[tim_lcore].curr_skiplist_depth)
+		data->priv_timer[tim_lcore].curr_skiplist_depth++;
 
 	lvl = tim_level;
 	while (lvl > 0) {
@@ -272,9 +388,10 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
 	prev[0]->sl_next[0] = tim;
 
 	/* save the lowest list entry into the expire field of the dummy hdr
-	 * NOTE: this is not atomic on 32-bit*/
-	priv_timer[tim_lcore].pending_head.expire = priv_timer[tim_lcore].\
-			pending_head.sl_next[0]->expire;
+	 * NOTE: this is not atomic on 32-bit
+	 */
+	data->priv_timer[tim_lcore].pending_head.expire =
+		data->priv_timer[tim_lcore].pending_head.sl_next[0]->expire;
 }
 
 /*
@@ -284,7 +401,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct rte_timer_data *data)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -295,30 +412,33 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 	 * list; if it is on local core, we need to lock if we are not
 	 * called from rte_timer_manage() */
 	if (prev_owner != lcore_id || !local_is_locked)
-		rte_spinlock_lock(&priv_timer[prev_owner].list_lock);
+		rte_spinlock_lock(&data->priv_timer[prev_owner].list_lock);
 
 	/* save the lowest list entry into the expire field of the dummy hdr.
 	 * NOTE: this is not atomic on 32-bit */
-	if (tim == priv_timer[prev_owner].pending_head.sl_next[0])
-		priv_timer[prev_owner].pending_head.expire =
+	if (tim == data->priv_timer[prev_owner].pending_head.sl_next[0])
+		data->priv_timer[prev_owner].pending_head.expire =
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
-	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, data);
+	i = data->priv_timer[prev_owner].curr_skiplist_depth - 1;
+	for ( ; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
 	}
 
 	/* in case we deleted last entry at a level, adjust down max level */
-	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--)
-		if (priv_timer[prev_owner].pending_head.sl_next[i] == NULL)
-			priv_timer[prev_owner].curr_skiplist_depth --;
+	for (i = data->priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0;
+	     i--)
+		if (data->priv_timer[prev_owner].pending_head.sl_next[i] ==
+		    NULL)
+			data->priv_timer[prev_owner].curr_skiplist_depth--;
 		else
 			break;
 
 	if (prev_owner != lcore_id || !local_is_locked)
-		rte_spinlock_unlock(&priv_timer[prev_owner].list_lock);
+		rte_spinlock_unlock(&data->priv_timer[prev_owner].list_lock);
 }
 
 /* Reset and start the timer associated with the timer handle (private func) */
@@ -326,7 +446,8 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
@@ -337,9 +458,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		if (lcore_id < RTE_MAX_LCORE) {
 			/* EAL thread with valid lcore_id */
 			tim_lcore = rte_get_next_lcore(
-				priv_timer[lcore_id].prev_lcore,
+				data->priv_timer[lcore_id].prev_lcore,
 				0, 1);
-			priv_timer[lcore_id].prev_lcore = tim_lcore;
+			data->priv_timer[lcore_id].prev_lcore = tim_lcore;
 		} else
 			/* non-EAL thread do not run rte_timer_manage(),
 			 * so schedule the timer on the first enabled lcore. */
@@ -348,20 +469,20 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, data);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(data, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
-		priv_timer[lcore_id].updated = 1;
+		data->priv_timer[lcore_id].updated = 1;
 	}
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, data);
+		__TIMER_STAT_ADD(data, pending, -1);
 	}
 
 	tim->period = period;
@@ -374,10 +495,10 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	 * we are not called from rte_timer_manage()
 	 */
 	if (tim_lcore != lcore_id || !local_is_locked)
-		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
+		rte_spinlock_lock(&data->priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(data, pending, 1);
+	timer_add(tim, tim_lcore, data);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -387,7 +508,7 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	tim->status.u32 = status.u32;
 
 	if (tim_lcore != lcore_id || !local_is_locked)
-		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
+		rte_spinlock_unlock(&data->priv_timer[tim_lcore].list_lock);
 
 	return 0;
 }
@@ -395,11 +516,23 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 /* Reset and start the timer associated with the timer handle tim */
 int
 rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+		 enum rte_timer_type type, unsigned int tim_lcore,
+		 rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
 
 	if (unlikely((tim_lcore != (unsigned)LCORE_ID_ANY) &&
 			!(rte_lcore_is_enabled(tim_lcore) ||
@@ -412,7 +545,7 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -430,26 +563,35 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 int
 rte_timer_stop(struct rte_timer *tim)
 {
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, timer_data);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(timer_data, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
-		priv_timer[lcore_id].updated = 1;
+		timer_data->priv_timer[lcore_id].updated = 1;
 	}
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, 0, timer_data);
+		__TIMER_STAT_ADD(timer_data, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -486,13 +628,14 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct rte_timer_data *data = &rte_timer_data_arr[default_data_id];
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(data, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
-	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
+	if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
 	cur_time = rte_get_timer_cycles();
 
@@ -500,32 +643,34 @@ void rte_timer_manage(void)
 	/* on 64-bit the value cached in the pending_head.expired will be
 	 * updated atomically, so we can consult that for a quick check here
 	 * outside the lock */
-	if (likely(priv_timer[lcore_id].pending_head.expire > cur_time))
+	if (likely(data->priv_timer[lcore_id].pending_head.expire > cur_time))
 		return;
 #endif
 
 	/* browse ordered list, add expired timers in 'expired' list */
-	rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
+	rte_spinlock_lock(&data->priv_timer[lcore_id].list_lock);
 
 	/* if nothing to do just unlock and return */
-	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL ||
-	    priv_timer[lcore_id].pending_head.sl_next[0]->expire > cur_time) {
-		rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+	if (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL ||
+	    data->priv_timer[lcore_id].pending_head.sl_next[0]->expire >
+	    cur_time) {
+		rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock);
 		return;
 	}
 
 	/* save start of list of expired timers */
-	tim = priv_timer[lcore_id].pending_head.sl_next[0];
+	tim = data->priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
-	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
-		if (prev[i] == &priv_timer[lcore_id].pending_head)
+	timer_get_prev_entries(cur_time, lcore_id, prev, data);
+	for (i = data->priv_timer[lcore_id].curr_skiplist_depth - 1; i >= 0;
+	     i--) {
+		if (prev[i] == &data->priv_timer[lcore_id].pending_head)
 			continue;
-		priv_timer[lcore_id].pending_head.sl_next[i] =
+		data->priv_timer[lcore_id].pending_head.sl_next[i] =
 		    prev[i]->sl_next[i];
 		if (prev[i]->sl_next[i] == NULL)
-			priv_timer[lcore_id].curr_skiplist_depth--;
+			data->priv_timer[lcore_id].curr_skiplist_depth--;
 		prev[i] ->sl_next[i] = NULL;
 	}
 
@@ -548,25 +693,25 @@ void rte_timer_manage(void)
 	}
 
 	/* update the next to expire timer value */
-	priv_timer[lcore_id].pending_head.expire =
-	    (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 :
-		priv_timer[lcore_id].pending_head.sl_next[0]->expire;
+	data->priv_timer[lcore_id].pending_head.expire =
+	    (data->priv_timer[lcore_id].pending_head.sl_next[0] == NULL) ? 0 :
+		data->priv_timer[lcore_id].pending_head.sl_next[0]->expire;
 
-	rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+	rte_spinlock_unlock(&data->priv_timer[lcore_id].list_lock);
 
 	/* now scan expired list and call callbacks */
 	for (tim = run_first_tim; tim != NULL; tim = next_tim) {
 		next_tim = tim->sl_next[0];
-		priv_timer[lcore_id].updated = 0;
-		priv_timer[lcore_id].running_tim = tim;
+		data->priv_timer[lcore_id].updated = 0;
+		data->priv_timer[lcore_id].running_tim = tim;
 
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(data, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
-		if (priv_timer[lcore_id].updated == 1)
+		if (data->priv_timer[lcore_id].updated == 1)
 			continue;
 
 		if (tim->period == 0) {
@@ -578,33 +723,217 @@ void rte_timer_manage(void)
 		}
 		else {
 			/* keep it in list and mark timer as pending */
-			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
+			rte_spinlock_lock(
+					&data->priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(data, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
-			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+					&data->priv_timer[lcore_id].list_lock);
+		}
+	}
+	data->priv_timer[lcore_id].running_tim = NULL;
+}
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct priv_timer *priv_timer;
+	uint32_t poll_lcore;
+	struct rte_timer_data *data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0; i < nb_poll_lcores; i++) {
+		poll_lcore = poll_lcores[i];
+		priv_timer = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (priv_timer->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expired will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(priv_timer->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		/* if nothing to do just unlock and return */
+		if (priv_timer->pending_head.sl_next[0] == NULL ||
+		    priv_timer->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&priv_timer->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = priv_timer->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev, data);
+		for (j = priv_timer->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &priv_timer->pending_head)
+				continue;
+
+			priv_timer->pending_head.sl_next[j] =
+							prev[j]->sl_next[j];
+
+			if (prev[j]->sl_next[j] == NULL)
+				priv_timer->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		priv_timer->pending_head.expire =
+		    (priv_timer->pending_head.sl_next[0] == NULL) ? 0 :
+			priv_timer->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		priv_timer->updated = 0;
+		priv_timer->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(data, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (priv_timer->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
 		}
+
+		priv_timer->running_tim = NULL;
 	}
-	priv_timer[lcore_id].running_tim = NULL;
+
+	return 0;
 }
 
 /* dump statistics about timers */
 void rte_timer_dump_stats(FILE *f)
 {
+	rte_timer_alt_dump_stats(default_data_id, f);
+}
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct rte_timer_data *data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.reset += priv_timer[lcore_id].stats.reset;
-		sum.stop += priv_timer[lcore_id].stats.stop;
-		sum.manage += priv_timer[lcore_id].stats.manage;
-		sum.pending += priv_timer[lcore_id].stats.pending;
+		sum.reset += data->priv_timer[lcore_id].stats.reset;
+		sum.stop += data->priv_timer[lcore_id].stats.stop;
+		sum.manage += data->priv_timer[lcore_id].stats.manage;
+		sum.pending += data->priv_timer[lcore_id].stats.pending;
 	}
 	fprintf(f, "Timer statistics:\n");
 	fprintf(f, "  reset = %"PRIu64"\n", sum.reset);
@@ -614,4 +943,5 @@ void rte_timer_dump_stats(FILE *f)
 #else
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
+	return 0;
 }
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..9daa334 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,52 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   0: Success
+ *   -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   0: Success
+ *   -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -254,7 +295,6 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  */
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -302,6 +342,130 @@ void rte_timer_manage(void);
  */
 void rte_timer_dump_stats(FILE *f);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..1e6b70d 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -3,13 +3,30 @@ DPDK_2.0 {
 
 	rte_timer_dump_stats;
 	rte_timer_init;
-	rte_timer_manage;
 	rte_timer_pending;
 	rte_timer_reset;
 	rte_timer_reset_sync;
 	rte_timer_stop;
 	rte_timer_stop_sync;
-	rte_timer_subsystem_init;
 
 	local: *;
 };
+
+DPDK_19.02 {
+	global:
+
+	rte_timer_manage;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4

* [dpdk-dev] [PATCH 2/3] timer: add function to stop all timers in a list
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2018-11-29 23:35 ` Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 3/3] eventdev: add new software event timer adapter Erik Gabriel Carrillo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-11-29 23:35 UTC (permalink / raw)
  To: pbhagavatula, jerin.jacob, rsanford; +Cc: dev

Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list and
optionally invoking a callback function on each stopped timer.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
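Note for reviewers: the traversal described above can be illustrated
with a small stand-alone model. This is only a sketch under stated
assumptions, not the code in this patch: every toy_* name is
hypothetical, a plain singly linked list stands in for the skiplist,
and the per-list spinlock is omitted. It shows the two points that
matter in the walk: only in-bounds entries of the lcore array are
indexed, and the successor is saved before a timer is stopped, because
stopping unlinks the timer from its list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LCORE 4 /* hypothetical stand-in for RTE_MAX_LCORE */

/* Hypothetical, simplified timer: only the fields the walk needs. */
struct toy_timer {
	struct toy_timer *next; /* next pending timer on the same list */
	int pending;            /* 1 while armed, 0 once stopped */
};

/* One pending list per lcore (locking omitted in this sketch). */
struct toy_timer_data {
	struct toy_timer *pending_head[TOY_MAX_LCORE];
};

typedef void (*toy_stop_all_cb_t)(struct toy_timer *tim, void *arg);

/* Push a timer onto the pending list of the given lcore. */
static void
toy_arm(struct toy_timer_data *data, unsigned int lcore,
	struct toy_timer *tim)
{
	assert(lcore < TOY_MAX_LCORE);
	tim->pending = 1;
	tim->next = data->pending_head[lcore];
	data->pending_head[lcore] = tim;
}

/* Stop every timer on the selected lists; return how many were stopped. */
static int
toy_stop_all(struct toy_timer_data *data, const unsigned int *walk_lcores,
	     int nb_walk_lcores, toy_stop_all_cb_t f, void *f_arg)
{
	int i, nb_stopped = 0;

	for (i = 0; i < nb_walk_lcores; i++) {
		unsigned int lcore = walk_lcores[i];
		struct toy_timer *tim, *next_tim;

		assert(lcore < TOY_MAX_LCORE);
		for (tim = data->pending_head[lcore]; tim != NULL;
		     tim = next_tim) {
			/* Save the successor before unlinking tim. */
			next_tim = tim->next;
			data->pending_head[lcore] = next_tim;
			tim->next = NULL;
			tim->pending = 0;
			nb_stopped++;

			/* The callback is optional, as in the patch. */
			if (f != NULL)
				f(tim, f_arg);
		}
	}

	return nb_stopped;
}

/* Example callback: count the timers handed back to the caller. */
static void
toy_count_cb(struct toy_timer *tim, void *arg)
{
	(void)tim;
	(*(int *)arg)++;
}
```

A caller would arm a few timers with toy_arm(), then pass an array of
lcore ids and its length to toy_stop_all(); timers on lists that are
not named in the array stay pending.
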
 lib/librte_timer/rte_timer.c           | 81 +++++++++++++++++++++++++++-------
 lib/librte_timer/rte_timer.h           | 32 ++++++++++++++
 lib/librte_timer/rte_timer_version.map |  1 +
 3 files changed, 97 insertions(+), 17 deletions(-)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index a76be8b..1eaf755 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -559,39 +559,30 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
-{
-	return rte_timer_alt_stop(default_data_id, tim);
-}
-
-int __rte_experimental
-rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
-	struct rte_timer_data *timer_data;
-
-	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status, timer_data);
+	ret = timer_set_config_state(tim, &prev_status, data);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(timer_data, stop, 1);
+	__TIMER_STAT_ADD(data, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
-		timer_data->priv_timer[lcore_id].updated = 1;
+		data->priv_timer[lcore_id].updated = 1;
 	}
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0, timer_data);
-		__TIMER_STAT_ADD(timer_data, pending, -1);
+		timer_del(tim, prev_status, local_is_locked, data);
+		__TIMER_STAT_ADD(data, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -603,6 +594,23 @@ rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -912,6 +920,45 @@ rte_timer_alt_manage(uint32_t timer_data_id,
 	return 0;
 }
 
+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores,
+		   rte_timer_stop_all_cb_t f, void *f_arg)
+{
+	int i;
+	struct priv_timer *priv_timer;
+	uint32_t walk_lcore;
+	struct rte_timer *tim, *next_tim;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	for (i = 0; i < nb_walk_lcores; i++) {
+		/* Index only in-bounds entries of walk_lcores */
+		walk_lcore = walk_lcores[i];
+		priv_timer = &timer_data->priv_timer[walk_lcore];
+
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		for (tim = priv_timer->pending_head.sl_next[0];
+		     tim != NULL;
+		     tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			/* Call timer_stop with lock held */
+			__rte_timer_stop(tim, 1, timer_data);
+
+			if (f)
+				f(tim, f_arg);
+		}
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
 void rte_timer_dump_stats(FILE *f)
 {
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9daa334..27b1ebd 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -446,6 +446,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
 		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
 
 /**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param walk_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ *   The size of the walk_lcores array.
+ * @param f
+ *   The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ *   An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 1e6b70d..0fab845 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -28,5 +28,6 @@ EXPERIMENTAL {
 	rte_timer_alt_stop;
 	rte_timer_data_alloc;
 	rte_timer_data_dealloc;
+	rte_timer_stop_all;
 	rte_timer_subsystem_finalize;
 };
-- 
2.6.4

* [dpdk-dev] [PATCH 3/3] eventdev: add new software event timer adapter
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory Erik Gabriel Carrillo
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 2/3] timer: add function to stop all timers in a list Erik Gabriel Carrillo
@ 2018-11-29 23:35 ` Erik Gabriel Carrillo
  2018-11-30  7:26 ` [dpdk-dev] [PATCH 0/3] " Pavan Nikhilesh
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-11-29 23:35 UTC (permalink / raw)
  To: pbhagavatula, jerin.jacob, rsanford; +Cc: dev

This commit updates the implementation of the software event timer
adapter.  The original version used rings to let producer cores (and
secondary processes) send timers to a service core, which would then arm
or cancel the timers, depending on what the application had requested.
The ring can be a bottleneck, so we replace the original implementation
with one that uses new APIs introduced in the timer library.  The new
APIs allow the underlying timer skiplists to be allocated in shared
memory, which allows the producer cores in both primary and secondary
processes to install timers directly into the lists, obviating the need
for a ring.  Each producer core also gets a unique timer list to insert
timers into, so no contention occurs there. The adapter's service
function can utilize a new flavor of rte_timer_manage() that can traverse
multiple timer lists, and also accepts a callback function.  The callback
function is only called from the primary process, since that's where the
service runs, and the callback is the same for all timers - it is defined
to enqueue a timer expiry event in the event device.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
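Note for reviewers: the design described above (one timer list per
producer core, and a single service that polls a chosen set of lists
and applies one common callback) can be modelled in a few lines. This
is a sketch under stated assumptions, not the adapter code: all toy_*
names are hypothetical, sorted singly linked lists stand in for the
skiplists, and locking is omitted. The counter toy_nb_events stands in
for the expiry event buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LCORE 4 /* hypothetical stand-in for RTE_MAX_LCORE */

/* Hypothetical, simplified timer. */
struct toy_timer {
	struct toy_timer *next; /* next timer on the same list */
	uint64_t expire;        /* absolute expiry time, in cycles */
};

/* One pending list per producer lcore, kept sorted by expiry. */
struct toy_timer_data {
	struct toy_timer *pending_head[TOY_MAX_LCORE];
};

typedef void (*toy_manage_cb_t)(struct toy_timer *tim);

/* Producer side: insert into the caller's own list, in expiry order. */
static void
toy_arm(struct toy_timer_data *data, unsigned int lcore,
	struct toy_timer *tim, uint64_t expire)
{
	struct toy_timer **pp;

	assert(lcore < TOY_MAX_LCORE);
	tim->expire = expire;
	for (pp = &data->pending_head[lcore];
	     *pp != NULL && (*pp)->expire <= expire;
	     pp = &(*pp)->next)
		;
	tim->next = *pp;
	*pp = tim;
}

/* Service side: pop expired timers from each polled list and run f. */
static int
toy_manage(struct toy_timer_data *data, const unsigned int *poll_lcores,
	   int n_poll_lcores, uint64_t now, toy_manage_cb_t f)
{
	int i, nb_expired = 0;

	for (i = 0; i < n_poll_lcores; i++) {
		struct toy_timer **head;

		assert(poll_lcores[i] < TOY_MAX_LCORE);
		head = &data->pending_head[poll_lcores[i]];
		while (*head != NULL && (*head)->expire <= now) {
			struct toy_timer *tim = *head;

			*head = tim->next;
			tim->next = NULL;
			nb_expired++;
			f(tim); /* same callback for every timer */
		}
	}

	return nb_expired;
}

/* Stands in for enqueueing a timer expiry event. */
static int toy_nb_events;

static void
toy_expiry_cb(struct toy_timer *tim)
{
	(void)tim;
	toy_nb_events++;
}
```

Because each producer only inserts into its own list, producers do not
contend with each other in this model; only the polling side visits
more than one list.
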
 lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
 1 file changed, 275 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..9c528cb 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -7,6 +7,7 @@
 #include <inttypes.h>
 #include <stdbool.h>
 #include <sys/queue.h>
+#include <assert.h>
 
 #include <rte_memzone.h>
 #include <rte_memory.h>
@@ -19,6 +20,7 @@
 #include <rte_timer.h>
 #include <rte_service_component.h>
 #include <rte_cycles.h>
+#include <rte_random.h>
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_pmd.h"
@@ -34,7 +36,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	rte_atomic16_t in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
 {
-	int ret;
+	struct rte_timer *tim = arg;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		rte_mempool_put(sw->tim_pool, tim);
+		sw->stats.evtim_exp_count++;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
-
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
 
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
-
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i]);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = -ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = -ret;
+		goto free_mempool;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = arg;
 
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, (void *)tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void *)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4


* Re: [dpdk-dev] [PATCH 0/3] new software event timer adapter
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
                   ` (2 preceding siblings ...)
  2018-11-29 23:35 ` [dpdk-dev] [PATCH 3/3] eventdev: add new software event timer adapter Erik Gabriel Carrillo
@ 2018-11-30  7:26 ` Pavan Nikhilesh
  2018-11-30 19:07   ` Carrillo, Erik G
  2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
  2018-12-07 20:34 ` [dpdk-dev] [PATCH v2 0/1] New software event timer adapter Erik Gabriel Carrillo
  5 siblings, 1 reply; 77+ messages in thread
From: Pavan Nikhilesh @ 2018-11-30  7:26 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, Jacob,  Jerin, rsanford; +Cc: stephen, dev

Hi Erik,

I think we may need to address the librte_timer and event_timer patches in
separate series as we are modifying common code for the sake of sw_event_timer
PMD and the series title implies that only the PMD has been modified.

Also, I think we need to profile the rte_timer library with the new patches
and report any performance regression (timer_perf_autotest), as it is also
used as a standalone library.

On Thu, Nov 29, 2018 at 05:35:11PM -0600, Erik Gabriel Carrillo wrote:
> This patch series introduces a new version of the event timer
> adapter software PMD [1].  In the original design, timer event producer
> lcores in the primary and secondary processes enqueued event timers
> into a ring, and a service core in the primary process dequeued them
> and processed them further.  To improve performance, this version does
> away with the ring and lets the lcores in both primary and secondary
> processes insert timers directly into the timer skiplist data
> structures; the service core directly accesses the lists as well.
> To achieve this, however, modifications to the timer library [2] are
> required to enable the timer skiplists to be created and accessed in
> shared memory.  New APIs are introduced in the timer library to enable
> selecting from multiple instances of the timer skiplists. Instances of
> the event timer adapter, as well as the original APIs of the timer
> library, can then each access distinct timer lists.
>
> Future versions of this series will hopefully improve the names
> used for the data structures and APIs in the timer library.
>
> This series depends on the following patch:
> https://patches.dpdk.org/patch/48417/
>
> [1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
> [2] https://doc.dpdk.org/guides/prog_guide/timer_lib.html
>
> Erik Gabriel Carrillo (3):
>   timer: allow timer management in shared memory
>   timer: add function to stop all timers in a list
>   eventdev: add new software event timer adapter
>
>  lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
>  lib/librte_timer/Makefile                     |   1 +
>  lib/librte_timer/rte_timer.c                  | 579 ++++++++++++++++++----
>  lib/librte_timer/rte_timer.h                  | 200 +++++++-
>  lib/librte_timer/rte_timer_version.map        |  22 +-
>  5 files changed, 972 insertions(+), 517 deletions(-)
>
> --
> 2.6.4
>


* Re: [dpdk-dev] [PATCH 0/3] new software event timer adapter
  2018-11-30  7:26 ` [dpdk-dev] [PATCH 0/3] " Pavan Nikhilesh
@ 2018-11-30 19:07   ` Carrillo, Erik G
  0 siblings, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2018-11-30 19:07 UTC (permalink / raw)
  To: Pavan Nikhilesh, Jacob,  Jerin, rsanford; +Cc: stephen, dev

Hi Pavan,

Thanks for the feedback.  Response inline:

> -----Original Message-----
> From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> Sent: Friday, November 30, 2018 1:26 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>; Jacob, Jerin
> <Jerin.JacobKollanukkaran@cavium.com>; rsanford@akamai.com
> Cc: stephen@networkplumber.org; dev@dpdk.org
> Subject: Re: [PATCH 0/3] new software event timer adapter
> 
> Hi Erik,
> 
> I think we may need to address the librte_timer and event_timer patches in
> separate series as we are modifying common code for the sake of
> sw_event_timer PMD and the series title implies that only the PMD has been
> modified.
> 
> Also, I think we need to profile the rte_timer library with the new patches
> and report any performance regression (timer_perf_autotest), as it is also
> used as a standalone library.
> 

Makes sense.  I'll separate the series and check for a performance regression 
in the timer library for the next iteration.

Thanks,
Erik

> On Thu, Nov 29, 2018 at 05:35:11PM -0600, Erik Gabriel Carrillo wrote:
> > This patch series introduces a new version of the event timer adapter
> > software PMD [1].  In the original design, timer event producer lcores
> > in the primary and secondary processes enqueued event timers into a
> > ring, and a service core in the primary process dequeued them and
> > processed them further.  To improve performance, this version does
> > away with the ring and lets the lcores in both primary and secondary
> > processes insert timers directly into the timer skiplist data
> > structures; the service core directly accesses the lists as well.
> > To achieve this, however, modifications to the timer library [2] are
> > required to enable the timer skiplists to be created and accessed in
> > shared memory.  New APIs are introduced in the timer library to enable
> > selecting from multiple instances of the timer skiplists. Instances of
> > the event timer adapter, as well as the original APIs of the timer
> > library, can then each access distinct timer lists.
> >
> > Future versions of this series will hopefully improve the names used
> > for the data structures and APIs in the timer library.
> >
> > This series depends on the following patch:
> > https://patches.dpdk.org/patch/48417/
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
> > [2] https://doc.dpdk.org/guides/prog_guide/timer_lib.html
> >
> > Erik Gabriel Carrillo (3):
> >   timer: allow timer management in shared memory
> >   timer: add function to stop all timers in a list
> >   eventdev: add new software event timer adapter
> >
> >  lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
> >  lib/librte_timer/Makefile                     |   1 +
> >  lib/librte_timer/rte_timer.c                  | 579 ++++++++++++++++++----
> >  lib/librte_timer/rte_timer.h                  | 200 +++++++-
> >  lib/librte_timer/rte_timer_version.map        |  22 +-
> >  5 files changed, 972 insertions(+), 517 deletions(-)
> >
> > --
> > 2.6.4
> >


* [dpdk-dev] [PATCH v2 0/2] Timer library changes
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
                   ` (3 preceding siblings ...)
  2018-11-30  7:26 ` [dpdk-dev] [PATCH 0/3] " Pavan Nikhilesh
@ 2018-12-07 17:52 ` Erik Gabriel Carrillo
  2018-12-07 17:52   ` [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
                     ` (2 more replies)
  2018-12-07 20:34 ` [dpdk-dev] [PATCH v2 0/1] New software event timer adapter Erik Gabriel Carrillo
  5 siblings, 3 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-07 17:52 UTC (permalink / raw)
  To: rsanford; +Cc: jerin.jacob, pbhagavatula, dev

This patch series modifies the timer library in such a way that 
structures that used to be statically allocated in a process's data
segment are now allocated in shared memory.  As these structures contain
lists of timers, new APIs are introduced that allow a caller to specify
the particular structure instance into which a timer should be inserted
or from which a timer should be removed.  This enables primary and secondary
processes to modify the same timer list, which enables some
multi-process use cases that were not previously possible; e.g. a
secondary process can start a timer whose expiration is detected in a
primary process running a new flavor of timer_manage().

The original library API is mostly unchanged, though implementations are
updated to call into newly added functions with a default structure instance
ID that provides the original behavior.  New functions are introduced to
enable applications to allocate structure instances to house timer
lists, and to reference them with an identifier when starting and
stopping timers, and finally, to manage the timer lists referenced with
an identifier.
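
The wrapping described above can be sketched in plain C as follows.  This
is only an illustration under stated assumptions, not the code in the
patch: the names timer_reset(), timer_alt_reset(), timer_count(),
MAX_DATA_ELS and DEFAULT_DATA_ID are hypothetical, and a single sorted
list per instance stands in for the per-lcore skiplists.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS	4	/* hypothetical instance count */
#define DEFAULT_DATA_ID	0u	/* instance reserved for the original API */

struct timer {
	uint64_t expire;
	struct timer *next;
};

/* One pending list per instance (the real library keeps one per lcore). */
static struct timer *lists[MAX_DATA_ELS];

/* New-style call: the caller names the instance to insert into. */
int
timer_alt_reset(uint32_t data_id, struct timer *tim, uint64_t expire)
{
	struct timer **pp;

	if (data_id >= MAX_DATA_ELS || tim == NULL)
		return -1;

	tim->expire = expire;

	/* Walk to the insertion point so the list stays sorted by expiry. */
	for (pp = &lists[data_id]; *pp != NULL && (*pp)->expire <= expire;
	     pp = &(*pp)->next)
		;

	tim->next = *pp;
	*pp = tim;

	return 0;
}

/* Original-style call: unchanged signature, forwards to the default
 * instance so existing callers see the same behavior.
 */
int
timer_reset(struct timer *tim, uint64_t expire)
{
	return timer_alt_reset(DEFAULT_DATA_ID, tim, expire);
}

/* Helper for inspecting a list; returns the number of pending timers. */
size_t
timer_count(uint32_t data_id)
{
	const struct timer *t;
	size_t n = 0;

	if (data_id >= MAX_DATA_ELS)
		return 0;

	for (t = lists[data_id]; t != NULL; t = t->next)
		n++;

	return n;
}
```

An existing caller keeps calling timer_reset() and lands in instance 0,
while a new caller (e.g. an adapter instance) passes its own ID to
timer_alt_reset() and gets a list that the original API never touches.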

My initial performance testing with the "timer_perf_autotest" test shows
no performance regression or improvement, and inspection of the
generated optimized code shows that the extra function call is inlined
into each function that gained one.

Depends on: https://patches.dpdk.org/patch/48417/

Changes in v2:
 - split these changes out into their own series
 - version the symbols where the existing ABI was updated, and
   provide alternate implementation with behavior equivalent to original
   behavior. Validate ABI compatibility with validate-abi.sh
 - refactor changes to simplify patches

Erik Gabriel Carrillo (2):
  timer: allow timer management in shared memory
  timer: add function to stop all timers in a list

 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 558 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
 lib/librte_timer/rte_timer_version.map |  23 ++
 4 files changed, 795 insertions(+), 45 deletions(-)

-- 
2.6.4


* [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory
  2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
@ 2018-12-07 17:52   ` Erik Gabriel Carrillo
  2018-12-07 18:10     ` Stephen Hemminger
  2018-12-07 17:53   ` [dpdk-dev] [PATCH v2 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
  2 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-07 17:52 UTC (permalink / raw)
  To: rsanford; +Cc: jerin.jacob, pbhagavatula, dev

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose values may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
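
As a rough standalone illustration of the allocation scheme described
above (not the patch itself), the sketch below uses an ordinary static
array as a stand-in for the shared-memory array and marks entry 0 as
taken for the original APIs.  The names timer_data_alloc(),
timer_data_dealloc() and MAX_DATA_ELS are hypothetical, and the sketch
does no locking.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS	4		/* hypothetical; kept small here */
#define FL_ALLOCATED	(1 << 0)

struct timer_data {
	uint8_t internal_flags;
	/* the per-lcore timer lists would live here */
};

/* Stand-in for the shared-memory array; entry 0 serves the original API. */
static struct timer_data data_arr[MAX_DATA_ELS] = {
	[0] = { .internal_flags = FL_ALLOCATED },
};

/* Claim the first free entry and report its index through id_ptr. */
int
timer_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject IDs that are out of range or not allocated. */
int
timer_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;

	return 0;
}
```

With entry 0 reserved, the first successful allocation returns ID 1, and
a deallocated ID becomes available to the next caller.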

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
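
The idea of processing several timer lists in one call can be sketched
as below.  This is a simplified model under assumptions, not the
signature or implementation of the new function: timer_alt_manage(),
count_expired(), expired_cb_t and MAX_LCORE are hypothetical names, and
plain sorted linked lists replace the skiplists and their locks.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE 8	/* hypothetical; kept small here */

struct timer {
	uint64_t expire;
	struct timer *next;
};

typedef void (*expired_cb_t)(struct timer *tim, void *arg);

/* Per-lcore pending lists of one instance, each sorted by expiry time. */
struct timer_data {
	struct timer *pending[MAX_LCORE];
};

/* Visit only the lists named in poll_lcores, pop every timer whose expiry
 * is <= now, and hand it to cb.  Returns the number of expired timers.
 */
unsigned int
timer_alt_manage(struct timer_data *data, const unsigned int *poll_lcores,
		 size_t n_poll_lcores, uint64_t now, expired_cb_t cb,
		 void *arg)
{
	unsigned int n_expired = 0;
	size_t i;

	for (i = 0; i < n_poll_lcores; i++) {
		struct timer **head;

		if (poll_lcores[i] >= MAX_LCORE)
			continue;

		head = &data->pending[poll_lcores[i]];
		while (*head != NULL && (*head)->expire <= now) {
			struct timer *tim = *head;

			*head = tim->next;	/* unlink before the callback */
			tim->next = NULL;
			cb(tim, arg);
			n_expired++;
		}
	}

	return n_expired;
}

/* Example callback: count expirations in the caller-supplied counter. */
void
count_expired(struct timer *tim, void *arg)
{
	(void)tim;
	(*(unsigned int *)arg)++;
}
```

Because a single callback handles every expired timer and runs only in
the process that calls the manage function, no per-timer function
pointer has to be valid in any other process.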

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..571fb3f 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;  /* id set to zero automatically */
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1902(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1902, 19.02);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1902(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1902(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0; i < nb_poll_lcores; i++) {
+		poll_lcore = poll_lcores[i];
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expire will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do, just unlock and move to the next lcore */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(data->priv_timer, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1902(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..82f5fba 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1902(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1902(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1902(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1902(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..b3f4b6c 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.02 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4
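
For readers skimming the thread: the instance bookkeeping behind
rte_timer_data_alloc() and rte_timer_data_dealloc() in the patch above is
essentially a flag-per-slot table.  Below is a minimal standalone sketch of
that idea, not code from the patch.  The names slot_alloc(), slot_dealloc(),
slot_demo() and struct timer_data_slot are hypothetical and used only for
illustration; the table here is an ordinary static array, whereas the patch
places it in a memzone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 64		/* mirrors RTE_MAX_DATA_ELS in the patch */
#define FL_ALLOCATED (1 << 0)

/* Hypothetical stand-in for struct rte_timer_data; only the flags matter. */
struct timer_data_slot {
	uint8_t internal_flags;
};

static struct timer_data_slot slots[MAX_DATA_ELS];

/* Claim the first free slot and report its index through id_ptr. */
static int
slot_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(slots[i].internal_flags & FL_ALLOCATED)) {
			slots[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release a slot; the bounds check runs before the array is indexed. */
static int
slot_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS || !(slots[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	slots[id].internal_flags &= (uint8_t)~FL_ALLOCATED;

	return 0;
}

/* Allocate two slots, free the first, and return the id handed out next. */
static uint32_t
slot_demo(void)
{
	uint32_t first = 0, second = 0, reused = 0;

	assert(slot_alloc(&first) == 0);
	assert(slot_alloc(&second) == 0);
	assert(slot_dealloc(first) == 0);
	assert(slot_alloc(&reused) == 0);

	return reused;
}
```

Because the search always starts from index 0, a freed identifier is the
first one to be handed out again, so callers should not hold on to an id
after deallocating it.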


* [dpdk-dev] [PATCH v2 2/2] timer: add function to stop all timers in a list
  2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
  2018-12-07 17:52   ` [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2018-12-07 17:53   ` Erik Gabriel Carrillo
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
  2 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-07 17:53 UTC (permalink / raw)
  To: rsanford; +Cc: jerin.jacob, pbhagavatula, dev

Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list,
and invoking a callback function.
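
As a rough illustration of the walk-and-stop pattern described above, here
is a standalone sketch on a plain singly linked list.  It is not the code in
this patch: struct demo_timer, demo_stop_all(), count_cb() and demo_run() are
hypothetical names, there is no locking, and a single list stands in for the
per-lcore skiplists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_timer on a pending list. */
struct demo_timer {
	struct demo_timer *next;
	int stopped;
};

typedef void (*demo_stop_all_cb_t)(struct demo_timer *tim, void *arg);

/* Stop every timer on the list at *head, then hand each one to f (which
 * may be NULL).  The successor is saved before the node is unlinked, since
 * the callback may free or reuse the node.  Returns the number stopped.
 */
static int
demo_stop_all(struct demo_timer **head, demo_stop_all_cb_t f, void *f_arg)
{
	struct demo_timer *tim, *next_tim;
	int nb_stopped = 0;

	for (tim = *head; tim != NULL; tim = next_tim) {
		next_tim = tim->next;	/* save before unlinking */

		*head = next_tim;	/* unlink from the pending list */
		tim->next = NULL;
		tim->stopped = 1;
		nb_stopped++;

		if (f != NULL)
			f(tim, f_arg);
	}

	return nb_stopped;
}

/* Example callback: count how many timers were handed back. */
static void
count_cb(struct demo_timer *tim, void *arg)
{
	assert(tim->stopped);
	(*(int *)arg)++;
}

/* Build a three-entry list, stop everything, return the callback count. */
static int
demo_run(void)
{
	struct demo_timer t[3] = {
		{ &t[1], 0 }, { &t[2], 0 }, { NULL, 0 },
	};
	struct demo_timer *head = &t[0];
	int seen = 0;
	int n;

	n = demo_stop_all(&head, count_cb, &seen);
	assert(n == 3 && head == NULL);

	return seen;
}
```

A typical use of the callback would be to return each stopped timer's
storage to whatever pool it came from, which is why the sketch never touches
a node after calling f.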

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.c           | 39 ++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.h           | 32 ++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer_version.map |  1 +
 3 files changed, 72 insertions(+)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 571fb3f..23c755c 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -999,6 +999,45 @@ rte_timer_alt_manage(uint32_t timer_data_id,
 	return 0;
 }
 
+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores,
+		   rte_timer_stop_all_cb_t f, void *f_arg)
+{
+	int i;
+	struct priv_timer *priv_timer;
+	uint32_t walk_lcore;
+	struct rte_timer *tim, *next_tim;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	/* index walk_lcores[] in the body to stay within bounds */
+	for (i = 0; i < nb_walk_lcores; i++) {
+		walk_lcore = walk_lcores[i];
+		priv_timer = &timer_data->priv_timer[walk_lcore];
+
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		for (tim = priv_timer->pending_head.sl_next[0];
+		     tim != NULL;
+		     tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			/* Call timer_stop with lock held */
+			__rte_timer_stop(tim, 1, timer_data);
+
+			if (f)
+				f(tim, f_arg);
+		}
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
 static void
 __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 82f5fba..b01bd97 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -500,6 +500,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
 		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
 
 /**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param walk_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ *   The size of the walk_lcores array.
+ * @param f
+ *   The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ *   An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index b3f4b6c..278b2af 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -33,5 +33,6 @@ EXPERIMENTAL {
 	rte_timer_alt_stop;
 	rte_timer_data_alloc;
 	rte_timer_data_dealloc;
+	rte_timer_stop_all;
 	rte_timer_subsystem_finalize;
 };
-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory
  2018-12-07 17:52   ` [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2018-12-07 18:10     ` Stephen Hemminger
  2018-12-07 19:21       ` Carrillo, Erik G
  0 siblings, 1 reply; 77+ messages in thread
From: Stephen Hemminger @ 2018-12-07 18:10 UTC (permalink / raw)
  To: Erik Gabriel Carrillo; +Cc: rsanford, jerin.jacob, pbhagavatula, dev

On Fri,  7 Dec 2018 11:52:59 -0600
Erik Gabriel Carrillo <erik.g.carrillo@intel.com> wrote:

> Currently, the timer library uses a per-process table of structures to
> manage skiplists of timers presumably because timers contain arbitrary
> function pointers whose value may not resolve properly in other
> processes.
> 
> However, if the same callback is used to handle all timers, and that
> callback is only invoked in one process, then it would be safe to allow
> the data structures to be allocated in shared memory, and to allow
> secondary processes to modify the timer lists.  This would let timers be
> used in more multi-process scenarios.
> 
> The library's global variables are wrapped with a struct, and an array
> of these structures is created in shared memory.  The original APIs
> are updated to reference the zeroth entry in the array. This maintains
> the original behavior for both primary and secondary processes since
> the set intersection of their coremasks should be empty [1].  New APIs
> are introduced to enable the allocation/deallocation of other entries
> in the array.
> 
> New variants of the APIs used to start and stop timers are introduced;
> they allow a caller to specify which array entry should be used to
> locate the timer list to insert into or delete from.
> 
> Finally, a new variant of rte_timer_manage() is introduced, which
> allows a caller to specify which array entry should be used to locate
> the timer lists to process; it can also process multiple timer lists per
> invocation.
> 
> [1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Makes sense but it looks to me like an ABI breakage. Experimental isn't going to
work for this.

> +static uint32_t default_data_id;  // id set to zero automatically

C++ style comments are not allowed per DPDK coding style.
Best to just drop the comment, it is stating the obvious.
 
> -/* Init the timer library. */
> +static inline int
> +timer_data_valid(uint32_t id)
> +{
> +	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
> +}

Don't need inline on static functions.
...

> +MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
> +BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
> +
> +int __rte_experimental
> +rte_timer_alt_manage(uint32_t timer_data_id,
> +		     unsigned int *poll_lcores,
> +		     int nb_poll_lcores,
> +		     rte_timer_alt_manage_cb_t f)
> +{
> +	union rte_timer_status status;
> +	struct rte_timer *tim, *next_tim, **pprev;
> +	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
> +	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
> +	unsigned int this_lcore = rte_lcore_id();
> +	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
> +	uint64_t cur_time;
> +	int i, j, ret;
> +	int nb_runlists = 0;
> +	struct rte_timer_data *data;
> +	struct priv_timer *privp;
> +	uint32_t poll_lcore;
> +
> +	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
> +
> +	/* timer manager only runs on EAL thread with valid lcore_id */
> +	assert(this_lcore < RTE_MAX_LCORE);
> +
> +	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
> +
> +	if (poll_lcores == NULL) {
> +		poll_lcores = (unsigned int []){rte_lcore_id()};


This isn't going to be safe. It assigns poll_lcores to an array
allocated on the stack.

> +
> +	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> +	     poll_lcore = poll_lcores[++i]) {
> +		privp = &data->priv_timer[poll_lcore];
> +
> +		/* optimize for the case where per-cpu list is empty */
> +		if (privp->pending_head.sl_next[0] == NULL)
> +			continue;
> +		cur_time = rte_get_timer_cycles();
> +
> +#ifdef RTE_ARCH_64
> +		/* on 64-bit the value cached in the pending_head.expired will
> +		 * be updated atomically, so we can consult that for a quick
> +		 * check here outside the lock
> +		 */
> +		if (likely(privp->pending_head.expire > cur_time))
> +			continue;
> +#endif


This code needs to be optimized so that an application can call this at a very
high rate without performance impact.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory
  2018-12-07 18:10     ` Stephen Hemminger
@ 2018-12-07 19:21       ` Carrillo, Erik G
  0 siblings, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2018-12-07 19:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: rsanford, jerin.jacob, pbhagavatula, dev

Hi Stephen,

Thanks for the review.   Some responses in-line:

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, December 7, 2018 12:10 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: rsanford@akamai.com; jerin.jacob@caviumnetworks.com;
> pbhagavatula@caviumnetworks.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in
> shared memory
> 
> On Fri,  7 Dec 2018 11:52:59 -0600
> Erik Gabriel Carrillo <erik.g.carrillo@intel.com> wrote:
> 
> > Currently, the timer library uses a per-process table of structures to
> > manage skiplists of timers presumably because timers contain arbitrary
> > function pointers whose value may not resolve properly in other
> > processes.
> >
> > However, if the same callback is used to handle all timers, and that
> > callback is only invoked in one process, then it would be safe to allow
> > the data structures to be allocated in shared memory, and to allow
> > secondary processes to modify the timer lists.  This would let timers
> > be used in more multi-process scenarios.
> >
> > The library's global variables are wrapped with a struct, and an array
> > of these structures is created in shared memory.  The original APIs
> > are updated to reference the zeroth entry in the array. This maintains
> > the original behavior for both primary and secondary processes since
> > the set intersection of their coremasks should be empty [1].  New APIs
> > are introduced to enable the allocation/deallocation of other entries
> > in the array.
> >
> > New variants of the APIs used to start and stop timers are introduced;
> > they allow a caller to specify which array entry should be used to
> > locate the timer list to insert into or delete from.
> >
> > Finally, a new variant of rte_timer_manage() is introduced, which
> > allows a caller to specify which array entry should be used to locate
> > the timer lists to process; it can also process multiple timer lists
> > per invocation.
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> Makes sense but it looks to me like an ABI breakage. Experimental isn't going
> to work for this.

For APIs that existed prior to this patch, I've duplicated them in a "19.02" node in 
the map file;  I only marked new APIs as experimental.  I versioned each API in
order to maintain the prior interface as well.  I tested ABI compatibility
with devtools/validate-abi.sh; it reported no errors detected.  So I believe this
won't break the ABI, but if I need to change something I certainly will.

> 
> > +static uint32_t default_data_id;  // id set to zero automatically
> 
> C++ style comments are not allowed per DPDK coding style.
> Best to just drop the comment, it is stating the obvious.
> 

Sure - will do.

> > -/* Init the timer library. */
> > +static inline int
> > +timer_data_valid(uint32_t id)
> > +{
> > +	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
> > +}
> 
> Don't need inline on static functions.
> ...
> 
> > +MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
> > +BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
> > +
> > +int __rte_experimental
> > +rte_timer_alt_manage(uint32_t timer_data_id,
> > +		     unsigned int *poll_lcores,
> > +		     int nb_poll_lcores,
> > +		     rte_timer_alt_manage_cb_t f)
> > +{
> > +	union rte_timer_status status;
> > +	struct rte_timer *tim, *next_tim, **pprev;
> > +	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
> > +	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
> > +	unsigned int this_lcore = rte_lcore_id();
> > +	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
> > +	uint64_t cur_time;
> > +	int i, j, ret;
> > +	int nb_runlists = 0;
> > +	struct rte_timer_data *data;
> > +	struct priv_timer *privp;
> > +	uint32_t poll_lcore;
> > +
> > +	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
> > +
> > +	/* timer manager only runs on EAL thread with valid lcore_id */
> > +	assert(this_lcore < RTE_MAX_LCORE);
> > +
> > +	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
> > +
> > +	if (poll_lcores == NULL) {
> > +		poll_lcores = (unsigned int []){rte_lcore_id()};
> 
> 
> This isn't going to be safe. It assigns poll_lcores to an array allocated on the
> stack.
> 

For convenience, poll_lcores may be NULL when rte_timer_alt_manage() is
called; in that case we create a one-element array on the stack and point
poll_lcores at it.  poll_lcores only needs to be valid for the duration of
the call, so pointing to an array on the stack seems fine.  Did I miss the
point?
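
One possible reading of the concern: a compound literal inside a function has
automatic storage tied to its enclosing block (C11 6.5.2.5), so when it sits
inside the braces of the `if`, it is only guaranteed to live until that block
closes. A function-scope local avoids the question. A minimal sketch with
stand-in names (`manage()` and `current_lcore()` are hypothetical and only
model the pointer handling, not the patch code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rte_lcore_id(). */
static unsigned int
current_lcore(void)
{
	return 3;
}

/* Returns the sum of the lcore ids visited, so the walk is observable. */
static unsigned int
manage(const unsigned int *poll_lcores, int nb_poll_lcores)
{
	/* Declared at function scope, so it outlives the if block below. */
	unsigned int default_poll_lcores[1];
	unsigned int sum = 0;
	int i;

	if (poll_lcores == NULL) {
		default_poll_lcores[0] = current_lcore();
		poll_lcores = default_poll_lcores;
		nb_poll_lcores = 1;
	}

	assert(nb_poll_lcores >= 0);

	/* Index is checked before each read, so no element past the end
	 * of the array is touched.
	 */
	for (i = 0; i < nb_poll_lcores; i++)
		sum += poll_lcores[i];

	return sum;
}
```

Calling `manage(NULL, 0)` then walks exactly one entry, the caller's own
lcore, while a non-NULL array is walked as given.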

> > +
> > +	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> > +	     poll_lcore = poll_lcores[++i]) {
> > +		privp = &data->priv_timer[poll_lcore];
> > +
> > +		/* optimize for the case where per-cpu list is empty */
> > +		if (privp->pending_head.sl_next[0] == NULL)
> > +			continue;
> > +		cur_time = rte_get_timer_cycles();
> > +
> > +#ifdef RTE_ARCH_64
> > +		/* on 64-bit the value cached in the pending_head.expired will
> > +		 * be updated atomically, so we can consult that for a quick
> > +		 * check here outside the lock
> > +		 */
> > +		if (likely(privp->pending_head.expire > cur_time))
> > +			continue;
> > +#endif
> 
> 
> This code needs to be optimized so that an application can call this at a
> very high rate without performance impact.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v2 0/1] New software event timer adapter
  2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
                   ` (4 preceding siblings ...)
  2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
@ 2018-12-07 20:34 ` Erik Gabriel Carrillo
  2018-12-07 20:34   ` [dpdk-dev] [PATCH v2 1/1] eventdev: add new " Erik Gabriel Carrillo
  2018-12-14 15:45   ` [dpdk-dev] [PATCH v3 0/1] New " Erik Gabriel Carrillo
  5 siblings, 2 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-07 20:34 UTC (permalink / raw)
  To: jerin.jacob; +Cc: pbhagavatula, rsanford, stephen, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired. (This behavior requires
the patch to the timer library that is referenced below.)
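
To let the service core walk only the lists that have actually been used, the
patch keeps per-lcore `in_use` flags plus a `poll_lcores` array guarded by a
spinlock. A rough sketch of how such bookkeeping could work follows. It uses
C11 atomics rather than the rte_atomic16/rte_spinlock types, and the names
`poll_set`, `poll_set_init()` and `poll_set_note_arm()` are invented for
illustration; treat it as an assumption about the arm path, not a copy of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_LCORE 8 /* stand-in for RTE_MAX_LCORE */

struct poll_set {
	atomic_flag in_use[MAX_LCORE];		/* has this lcore armed a timer? */
	unsigned int poll_lcores[MAX_LCORE];	/* lists the service must walk */
	int n_poll_lcores;			/* number of valid entries above */
	atomic_flag lock;			/* tiny spinlock for the two above */
};

static void
poll_set_init(struct poll_set *ps)
{
	int i;

	for (i = 0; i < MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	ps->n_poll_lcores = 0;
	atomic_flag_clear(&ps->lock);
}

/* Called on the arm path; returns true if this call registered the lcore. */
static bool
poll_set_note_arm(struct poll_set *ps, unsigned int lcore)
{
	assert(lcore < MAX_LCORE);

	/* Fast path: the flag was already set by an earlier arm. */
	if (atomic_flag_test_and_set(&ps->in_use[lcore]))
		return false;

	/* Slow path, taken once per lcore: publish it to the poller. */
	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		;
	ps->poll_lcores[ps->n_poll_lcores++] = lcore;
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);

	return true;
}
```

With this shape the lock is taken at most once per lcore on the arm side, so
it should rarely contend with the service core.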

Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2699

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v2:
 - split this change out into its own patch series

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
 1 file changed, 275 insertions(+), 412 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v2 1/1] eventdev: add new software event timer adapter
  2018-12-07 20:34 ` [dpdk-dev] [PATCH v2 0/1] New software event timer adapter Erik Gabriel Carrillo
@ 2018-12-07 20:34   ` Erik Gabriel Carrillo
  2018-12-09 19:17     ` Mattias Rönnblom
  2018-12-14 15:45   ` [dpdk-dev] [PATCH v3 0/1] New " Erik Gabriel Carrillo
  1 sibling, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-07 20:34 UTC (permalink / raw)
  To: jerin.jacob; +Cc: pbhagavatula, rsanford, stephen, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
 1 file changed, 275 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..9c528cb 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -7,6 +7,7 @@
 #include <inttypes.h>
 #include <stdbool.h>
 #include <sys/queue.h>
+#include <assert.h>
 
 #include <rte_memzone.h>
 #include <rte_memory.h>
@@ -19,6 +20,7 @@
 #include <rte_timer.h>
 #include <rte_service_component.h>
 #include <rte_cycles.h>
+#include <rte_random.h>
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_pmd.h"
@@ -34,7 +36,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	rte_atomic16_t in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
 {
-	int ret;
+	struct rte_timer *tim = arg;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		rte_mempool_put(sw->tim_pool, tim);
+		sw->stats.evtim_exp_count++;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
-
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
 
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
-
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i]);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = ret;
+		goto free_mempool;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = arg;
 
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, (void *)tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4


* Re: [dpdk-dev] [PATCH v2 1/1] eventdev: add new software event timer adapter
From: Mattias Rönnblom @ 2018-12-09 19:17 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, jerin.jacob; +Cc: pbhagavatula, rsanford, stephen, dev

On 2018-12-07 21:34, Erik Gabriel Carrillo wrote:
> This patch introduces a new version of the event timer adapter software
> PMD. In the original design, timer event producer lcores in the primary
> and secondary processes enqueued event timers into a ring, and a
> service core in the primary process dequeued them and processed them
> further.  To improve performance, this version does away with the ring
> and lets lcores in both primary and secondary processes insert timers
> directly into timer skiplist data structures; the service core directly
> accesses the lists as well, when looking for timers that have expired.
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> ---
>   lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
>   1 file changed, 275 insertions(+), 412 deletions(-)
> 
> diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
> index 79070d4..9c528cb 100644
> --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> @@ -7,6 +7,7 @@
>   #include <inttypes.h>
>   #include <stdbool.h>
>   #include <sys/queue.h>
> +#include <assert.h>
>   

You have no assert() calls, from what I can see. Include <rte_debug.h> 
for RTE_ASSERT().
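
For illustration only, here is a minimal, self-contained sketch of why the
header matters. It assumes RTE_ASSERT() from <rte_debug.h> compiles to a
no-op unless RTE_ENABLE_ASSERT is defined at build time, whereas assert()
from <assert.h> is controlled by NDEBUG instead. SKETCH_ASSERT,
SKETCH_ENABLE_ASSERT and opaque_to_ptr() are hypothetical stand-ins so the
snippet builds without DPDK; they are not part of the patch or the library.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for RTE_ASSERT(): the check is active only when
 * SKETCH_ENABLE_ASSERT is defined at build time, otherwise it is a no-op.
 */
#ifdef SKETCH_ENABLE_ASSERT
#define SKETCH_ASSERT(exp) do { \
	if (!(exp)) { \
		fprintf(stderr, "assert \"%s\" failed at %s:%d\n", \
			#exp, __FILE__, __LINE__); \
		abort(); \
	} \
} while (0)
#else
#define SKETCH_ASSERT(exp) do { } while (0)
#endif

/* Hypothetical helper: recover a pointer stored in a 64-bit opaque field,
 * in the same way the cancel path reads impl_opaque[0].
 */
static void *
opaque_to_ptr(uint64_t opaque)
{
	void *p = (void *)(uintptr_t)opaque;

	/* Checked only in builds that enable the assertion macro. */
	SKETCH_ASSERT(p != NULL);
	return p;
}
```

In the patch itself the equivalent would presumably be to drop the
<assert.h> include and add <rte_debug.h>, leaving the existing
RTE_ASSERT() calls unchanged.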

>   #include <rte_memzone.h>
>   #include <rte_memory.h>
> @@ -19,6 +20,7 @@
>   #include <rte_timer.h>
>   #include <rte_service_component.h>
>   #include <rte_cycles.h>
> +#include <rte_random.h>
>   
>   #include "rte_eventdev.h"
>   #include "rte_eventdev_pmd.h"
> @@ -34,7 +36,7 @@ static int evtim_buffer_logtype;
>   
>   static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
>   
> -static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
> +static const struct rte_event_timer_adapter_ops swtim_ops;
>   
>   #define EVTIM_LOG(level, logtype, ...) \
>   	rte_log(RTE_LOG_ ## level, logtype, \
> @@ -211,7 +213,7 @@ rte_event_timer_adapter_create_ext(
>   	 * implementation.
>   	 */
>   	if (adapter->ops == NULL)
> -		adapter->ops = &sw_event_adapter_timer_ops;
> +		adapter->ops = &swtim_ops;
>   
>   	/* Allow driver to do some setup */
>   	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
> @@ -334,7 +336,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
>   	 * implementation.
>   	 */
>   	if (adapter->ops == NULL)
> -		adapter->ops = &sw_event_adapter_timer_ops;
> +		adapter->ops = &swtim_ops;
>   
>   	/* Set fast-path function pointers */
>   	adapter->arm_burst = adapter->ops->arm_burst;
> @@ -491,6 +493,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
>   	}
>   
>   	*nb_events_inv = 0;
> +
>   	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
>   						     &events[tail_idx], n);
>   	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
> @@ -498,137 +501,123 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
>   		(*nb_events_inv)++;
>   	}
>   
> +	if (*nb_events_flushed > 0)
> +		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
> +				  "device", *nb_events_flushed);
> +
>   	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
>   }
>   
>   /*
>    * Software event timer adapter implementation
>    */
> -
> -struct rte_event_timer_adapter_sw_data {
> -	/* List of messages for outstanding timers */
> -	TAILQ_HEAD(, msg) msgs_tailq_head;
> -	/* Lock to guard tailq and armed count */
> -	rte_spinlock_t msgs_tailq_sl;
> +struct swtim {
>   	/* Identifier of service executing timer management logic. */
>   	uint32_t service_id;
>   	/* The cycle count at which the adapter should next tick */
>   	uint64_t next_tick_cycles;
> -	/* Incremented as the service moves through phases of an iteration */
> -	volatile int service_phase;
>   	/* The tick resolution used by adapter instance. May have been
>   	 * adjusted from what user requested
>   	 */
>   	uint64_t timer_tick_ns;
>   	/* Maximum timeout in nanoseconds allowed by adapter instance. */
>   	uint64_t max_tmo_ns;
> -	/* Ring containing messages to arm or cancel event timers */
> -	struct rte_ring *msg_ring;
> -	/* Mempool containing msg objects */
> -	struct rte_mempool *msg_pool;
>   	/* Buffered timer expiry events to be enqueued to an event device. */
>   	struct event_buffer buffer;
>   	/* Statistics */
>   	struct rte_event_timer_adapter_stats stats;
> -	/* The number of threads currently adding to the message ring */
> -	rte_atomic16_t message_producer_count;
> +	/* Mempool of timer objects */
> +	struct rte_mempool *tim_pool;
> +	/* Back pointer for convenience */
> +	struct rte_event_timer_adapter *adapter;
> +	/* Identifier of timer data instance */
> +	uint32_t timer_data_id;
> +	/* Track which cores have actually armed a timer */
> +	rte_atomic16_t in_use[RTE_MAX_LCORE];
> +	/* Track which cores' timer lists should be polled */
> +	unsigned int poll_lcores[RTE_MAX_LCORE];
> +	/* The number of lists that should be polled */
> +	int n_poll_lcores;
> +	/* Lock to atomically access the above two variables */
> +	rte_spinlock_t poll_lcores_sl;
>   };
>   
> -enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
> -
> -struct msg {
> -	enum msg_type type;
> -	struct rte_event_timer *evtim;
> -	struct rte_timer tim;
> -	TAILQ_ENTRY(msg) msgs;
> -};
> +static inline struct swtim *
> +swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
> +{
> +	return adapter->data->adapter_priv;
> +}
>   
>   static void
> -sw_event_timer_cb(struct rte_timer *tim, void *arg)
> +swtim_callback(void *arg)
>   {
> -	int ret;
> +	struct rte_timer *tim = arg;
> +	struct rte_event_timer *evtim = tim->arg;
> +	struct rte_event_timer_adapter *adapter;
> +	struct swtim *sw;
>   	uint16_t nb_evs_flushed = 0;
>   	uint16_t nb_evs_invalid = 0;
>   	uint64_t opaque;
> -	struct rte_event_timer *evtim;
> -	struct rte_event_timer_adapter *adapter;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> +	int ret;
>   
> -	evtim = arg;
>   	opaque = evtim->impl_opaque[1];
>   	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
> -	sw_data = adapter->data->adapter_priv;
> +	sw = swtim_pmd_priv(adapter);
>   
> -	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
> +	ret = event_buffer_add(&sw->buffer, &evtim->ev);
>   	if (ret < 0) {
>   		/* If event buffer is full, put timer back in list with
>   		 * immediate expiry value, so that we process it again on the
>   		 * next iteration.
>   		 */
> -		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
> -				     sw_event_timer_cb, evtim);
> +		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
> +				    rte_lcore_id(), NULL, evtim);
> +
> +		sw->stats.evtim_retry_count++;
>   
> -		sw_data->stats.evtim_retry_count++;
>   		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
>   			      "immediate expiry value");
>   	} else {
> -		struct msg *m = container_of(tim, struct msg, tim);
> -		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
>   		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
> -		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
> +		rte_mempool_put(sw->tim_pool, tim);
> +		sw->stats.evtim_exp_count++;
>   
> -		/* Free the msg object containing the rte_timer now that
> -		 * we've buffered its event successfully.
> -		 */
> -		rte_mempool_put(sw_data->msg_pool, m);
> -
> -		/* Bump the count when we successfully add an expiry event to
> -		 * the buffer.
> -		 */
> -		sw_data->stats.evtim_exp_count++;
> +		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
>   	}
>   
> -	if (event_buffer_batch_ready(&sw_data->buffer)) {
> -		event_buffer_flush(&sw_data->buffer,
> +	if (event_buffer_batch_ready(&sw->buffer)) {
> +		event_buffer_flush(&sw->buffer,
>   				   adapter->data->event_dev_id,
>   				   adapter->data->event_port_id,
>   				   &nb_evs_flushed,
>   				   &nb_evs_invalid);
>   
> -		sw_data->stats.ev_enq_count += nb_evs_flushed;
> -		sw_data->stats.ev_inv_count += nb_evs_invalid;
> +		sw->stats.ev_enq_count += nb_evs_flushed;
> +		sw->stats.ev_inv_count += nb_evs_invalid;
>   	}
>   }
>   
>   static __rte_always_inline uint64_t
>   get_timeout_cycles(struct rte_event_timer *evtim,
> -		   struct rte_event_timer_adapter *adapter)
> +		   const struct rte_event_timer_adapter *adapter)
>   {
> -	uint64_t timeout_ns;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -
> -	sw_data = adapter->data->adapter_priv;
> -	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
>   	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
> -
>   }
>   
>   /* This function returns true if one or more (adapter) ticks have occurred since
>    * the last time it was called.
>    */
>   static inline bool
> -adapter_did_tick(struct rte_event_timer_adapter *adapter)
> +swtim_did_tick(struct swtim *sw)
>   {
>   	uint64_t cycles_per_adapter_tick, start_cycles;
>   	uint64_t *next_tick_cyclesp;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -
> -	sw_data = adapter->data->adapter_priv;
> -	next_tick_cyclesp = &sw_data->next_tick_cycles;
>   
> -	cycles_per_adapter_tick = sw_data->timer_tick_ns *
> +	next_tick_cyclesp = &sw->next_tick_cycles;
> +	cycles_per_adapter_tick = sw->timer_tick_ns *
>   			(rte_get_timer_hz() / NSECPERSEC);
> -
>   	start_cycles = rte_get_timer_cycles();
>   
>   	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
> @@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
>   		 * boundary.
>   		 */
>   		start_cycles -= start_cycles % cycles_per_adapter_tick;
> -
>   		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
>   
>   		return true;
> @@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
>   	      const struct rte_event_timer_adapter *adapter)
>   {
>   	uint64_t tmo_nsec;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -
> -	sw_data = adapter->data->adapter_priv;
> -	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   
> -	if (tmo_nsec > sw_data->max_tmo_ns)
> +	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
> +	if (tmo_nsec > sw->max_tmo_ns)
>   		return -1;
> -
> -	if (tmo_nsec < sw_data->timer_tick_ns)
> +	if (tmo_nsec < sw->timer_tick_ns)
>   		return -2;
>   
>   	return 0;
> @@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
>   	return 0;
>   }
>   
> -#define NB_OBJS 32
>   static int
> -sw_event_timer_adapter_service_func(void *arg)
> +swtim_service_func(void *arg)
>   {
> -	int i, num_msgs;
> -	uint64_t cycles, opaque;
> +	struct rte_event_timer_adapter *adapter = arg;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   	uint16_t nb_evs_flushed = 0;
>   	uint16_t nb_evs_invalid = 0;
> -	struct rte_event_timer_adapter *adapter;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	struct rte_event_timer *evtim = NULL;
> -	struct rte_timer *tim = NULL;
> -	struct msg *msg, *msgs[NB_OBJS];
> -
> -	adapter = arg;
> -	sw_data = adapter->data->adapter_priv;
> -
> -	sw_data->service_phase = 1;
> -	rte_smp_wmb();
> -
> -	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
> -	       !rte_ring_empty(sw_data->msg_ring)) {
> -
> -		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
> -						  (void **)msgs, NB_OBJS, NULL);
> -
> -		for (i = 0; i < num_msgs; i++) {
> -			int ret = 0;
> -
> -			RTE_SET_USED(ret);
> -
> -			msg = msgs[i];
> -			evtim = msg->evtim;
> -
> -			switch (msg->type) {
> -			case MSG_TYPE_ARM:
> -				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
> -						  "ring");
> -				tim = &msg->tim;
> -				rte_timer_init(tim);
> -				cycles = get_timeout_cycles(evtim,
> -							    adapter);
> -				ret = rte_timer_reset(tim, cycles, SINGLE,
> -						      rte_lcore_id(),
> -						      sw_event_timer_cb,
> -						      evtim);
> -				RTE_ASSERT(ret == 0);
> -
> -				evtim->impl_opaque[0] = (uintptr_t)tim;
> -				evtim->impl_opaque[1] = (uintptr_t)adapter;
> -
> -				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
> -						  msg,
> -						  msgs);
> -				break;
> -			case MSG_TYPE_CANCEL:
> -				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
> -						  "from ring");
> -				opaque = evtim->impl_opaque[0];
> -				tim = (struct rte_timer *)(uintptr_t)opaque;
> -				RTE_ASSERT(tim != NULL);
> -
> -				ret = rte_timer_stop(tim);
> -				RTE_ASSERT(ret == 0);
> -
> -				/* Free the msg object for the original arm
> -				 * request.
> -				 */
> -				struct msg *m;
> -				m = container_of(tim, struct msg, tim);
> -				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
> -					     msgs);
> -				rte_mempool_put(sw_data->msg_pool, m);
> -
> -				/* Free the msg object for the current msg */
> -				rte_mempool_put(sw_data->msg_pool, msg);
> -
> -				evtim->impl_opaque[0] = 0;
> -				evtim->impl_opaque[1] = 0;
> -
> -				break;
> -			}
> -		}
> -	}
> -
> -	sw_data->service_phase = 2;
> -	rte_smp_wmb();
>   
> -	if (adapter_did_tick(adapter)) {
> -		rte_timer_manage();
> +	if (swtim_did_tick(sw)) {
> +		/* This lock is seldom acquired on the arm side */
> +		rte_spinlock_lock(&sw->poll_lcores_sl);
> +		rte_timer_alt_manage(sw->timer_data_id,
> +				     sw->poll_lcores,
> +				     sw->n_poll_lcores,
> +				     swtim_callback);
> +		rte_spinlock_unlock(&sw->poll_lcores_sl);
>   
> -		event_buffer_flush(&sw_data->buffer,
> +		event_buffer_flush(&sw->buffer,
>   				   adapter->data->event_dev_id,
>   				   adapter->data->event_port_id,
> -				   &nb_evs_flushed, &nb_evs_invalid);
> +				   &nb_evs_flushed,
> +				   &nb_evs_invalid);
>   
> -		sw_data->stats.ev_enq_count += nb_evs_flushed;
> -		sw_data->stats.ev_inv_count += nb_evs_invalid;
> -		sw_data->stats.adapter_tick_count++;
> +		sw->stats.ev_enq_count += nb_evs_flushed;
> +		sw->stats.ev_inv_count += nb_evs_invalid;
> +		sw->stats.adapter_tick_count++;
>   	}
>   
> -	sw_data->service_phase = 0;
> -	rte_smp_wmb();
> -
>   	return 0;
>   }
>   
> @@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
>   	return cache_size;
>   }
>   
> -#define SW_MIN_INTERVAL 1E5
> -
>   static int
> -sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
> +swtim_init(struct rte_event_timer_adapter *adapter)
>   {
> -	int ret;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	uint64_t nb_timers;
> +	int i, ret;
> +	struct swtim *sw;
>   	unsigned int flags;
>   	struct rte_service_spec service;
> -	static bool timer_subsystem_inited; // static initialized to false
>   
> -	/* Allocate storage for SW implementation data */
> -	char priv_data_name[RTE_RING_NAMESIZE];
> -	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
> -		 adapter->data->id);
> -	adapter->data->adapter_priv = rte_zmalloc_socket(
> -				priv_data_name,
> -				sizeof(struct rte_event_timer_adapter_sw_data),
> -				RTE_CACHE_LINE_SIZE,
> -				adapter->data->socket_id);
> -	if (adapter->data->adapter_priv == NULL) {
> +	/* Allocate storage for private data area */
> +#define SWTIM_NAMESIZE 32
> +	char swtim_name[SWTIM_NAMESIZE];
> +	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
> +			adapter->data->id);
> +	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
> +			adapter->data->socket_id);
> +	if (sw == NULL) {
>   		EVTIM_LOG_ERR("failed to allocate space for private data");
>   		rte_errno = ENOMEM;
>   		return -1;
>   	}
>   
> -	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
> -		EVTIM_LOG_ERR("failed to create adapter with requested tick "
> -			      "interval");
> -		rte_errno = EINVAL;
> -		return -1;
> -	}
> -
> -	sw_data = adapter->data->adapter_priv;
> -
> -	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
> -	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
> +	/* Connect storage to adapter instance */
> +	adapter->data->adapter_priv = sw;
> +	sw->adapter = adapter;
>   
> -	TAILQ_INIT(&sw_data->msgs_tailq_head);
> -	rte_spinlock_init(&sw_data->msgs_tailq_sl);
> -	rte_atomic16_init(&sw_data->message_producer_count);
> -
> -	/* Rings require power of 2, so round up to next such value */
> -	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
> -
> -	char msg_ring_name[RTE_RING_NAMESIZE];
> -	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
> -		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
> -	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
> -		RING_F_SP_ENQ | RING_F_SC_DEQ :
> -		RING_F_SC_DEQ;
> -	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
> -					    adapter->data->socket_id, flags);
> -	if (sw_data->msg_ring == NULL) {
> -		EVTIM_LOG_ERR("failed to create message ring");
> -		rte_errno = ENOMEM;
> -		goto free_priv_data;
> -	}
> +	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
> +	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
>   
> -	char pool_name[RTE_RING_NAMESIZE];
> -	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
> +	/* Create a timer pool */
> +	char pool_name[SWTIM_NAMESIZE];
> +	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
>   		 adapter->data->id);
> -
> -	/* Both the arming/canceling thread and the service thread will do puts
> -	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
> -	 * single-consumer get for the mempool.
> -	 */
> -	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
> -		MEMPOOL_F_SC_GET : 0;
> -
> -	/* The usable size of a ring is count - 1, so subtract one here to
> -	 * make the counts agree.
> -	 */
> +	/* Optimal mempool size is a power of 2 minus one */
> +	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
>   	int pool_size = nb_timers - 1;
>   	int cache_size = compute_msg_mempool_cache_size(
>   				adapter->data->conf.nb_timers, nb_timers);
> -	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
> -					       sizeof(struct msg), cache_size,
> -					       0, NULL, NULL, NULL, NULL,
> -					       adapter->data->socket_id, flags);
> -	if (sw_data->msg_pool == NULL) {
> -		EVTIM_LOG_ERR("failed to create message object mempool");
> +	flags = 0; /* pool is multi-producer, multi-consumer */
> +	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
> +			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
> +			NULL, NULL, adapter->data->socket_id, flags);
> +	if (sw->tim_pool == NULL) {
> +		EVTIM_LOG_ERR("failed to create timer object mempool");
>   		rte_errno = ENOMEM;
> -		goto free_msg_ring;
> +		goto free_alloc;
> +	}
> +
> +	/* Initialize the variables that track in-use timer lists */
> +	rte_spinlock_init(&sw->poll_lcores_sl);
> +	for (i = 0; i < RTE_MAX_LCORE; i++)
> +		rte_atomic16_init(&sw->in_use[i]);
> +
> +	/* Initialize the timer subsystem and allocate timer data instance */
> +	ret = rte_timer_subsystem_init();
> +	if (ret < 0) {
> +		if (ret != -EALREADY) {
> +			EVTIM_LOG_ERR("failed to initialize timer subsystem");
> +			rte_errno = ret;
> +			goto free_mempool;
> +		}
> +	}
> +
> +	ret = rte_timer_data_alloc(&sw->timer_data_id);
> +	if (ret < 0) {
> +		EVTIM_LOG_ERR("failed to allocate timer data instance");
> +		rte_errno = ret;
> +		goto free_mempool;
>   	}
>   
> -	event_buffer_init(&sw_data->buffer);
> +	/* Initialize timer event buffer */
> +	event_buffer_init(&sw->buffer);
> +
> +	sw->adapter = adapter;
>   
>   	/* Register a service component to run adapter logic */
>   	memset(&service, 0, sizeof(service));
>   	snprintf(service.name, RTE_SERVICE_NAME_MAX,
> -		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
> +		 "swtim_svc_%"PRIu8, adapter->data->id);
>   	service.socket_id = adapter->data->socket_id;
> -	service.callback = sw_event_timer_adapter_service_func;
> +	service.callback = swtim_service_func;
>   	service.callback_userdata = adapter;
>   	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
> -	ret = rte_service_component_register(&service, &sw_data->service_id);
> +	ret = rte_service_component_register(&service, &sw->service_id);
>   	if (ret < 0) {
>   		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
> -			      ": err = %d", service.name, sw_data->service_id,
> +			      ": err = %d", service.name, sw->service_id,
>   			      ret);
>   
>   		rte_errno = ENOSPC;
> -		goto free_msg_pool;
> +		goto free_mempool;
>   	}
>   
>   	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
> -		      sw_data->service_id);
> +		      sw->service_id);
>   
> -	adapter->data->service_id = sw_data->service_id;
> +	adapter->data->service_id = sw->service_id;
>   	adapter->data->service_inited = 1;
>   
> -	if (!timer_subsystem_inited) {
> -		rte_timer_subsystem_init();
> -		timer_subsystem_inited = true;
> -	}
> -
>   	return 0;
> -
> -free_msg_pool:
> -	rte_mempool_free(sw_data->msg_pool);
> -free_msg_ring:
> -	rte_ring_free(sw_data->msg_ring);
> -free_priv_data:
> -	rte_free(sw_data);
> +free_mempool:
> +	rte_mempool_free(sw->tim_pool);
> +free_alloc:
> +	rte_free(sw);
>   	return -1;
>   }
>   
> -static int
> -sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
> +static void
> +swtim_free_tim(struct rte_timer *tim, void *arg)
>   {
> -	int ret;
> -	struct msg *m1, *m2;
> -	struct rte_event_timer_adapter_sw_data *sw_data =
> -						adapter->data->adapter_priv;
> +	struct swtim *sw = arg;
>   
> -	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
> -
> -	/* Cancel outstanding rte_timers and free msg objects */
> -	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
> -	while (m1 != NULL) {
> -		EVTIM_LOG_DBG("freeing outstanding timer");
> -		m2 = TAILQ_NEXT(m1, msgs);
> -
> -		rte_timer_stop_sync(&m1->tim);
> -		rte_mempool_put(sw_data->msg_pool, m1);
> +	rte_mempool_put(sw->tim_pool, (void *)tim);
> +}

No cast required.

>   
> -		m1 = m2;
> -	}
> +/* Traverse the list of outstanding timers and put them back in the mempool
> + * before freeing the adapter to avoid leaking the memory.
> + */
> +static int
> +swtim_uninit(struct rte_event_timer_adapter *adapter)
> +{
> +	int ret;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   
> -	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
> +	/* Free outstanding timers */
> +	rte_timer_stop_all(sw->timer_data_id,
> +			   sw->poll_lcores,
> +			   sw->n_poll_lcores,
> +			   swtim_free_tim,
> +			   sw);
>   
> -	ret = rte_service_component_unregister(sw_data->service_id);
> +	ret = rte_service_component_unregister(sw->service_id);
>   	if (ret < 0) {
>   		EVTIM_LOG_ERR("failed to unregister service component");
>   		return ret;
>   	}
>   
> -	rte_ring_free(sw_data->msg_ring);
> -	rte_mempool_free(sw_data->msg_pool);
> -	rte_free(adapter->data->adapter_priv);
> +	rte_mempool_free(sw->tim_pool);
> +	rte_free(sw);
> +	adapter->data->adapter_priv = NULL;
>   
>   	return 0;
>   }
> @@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
>   }
>   
>   static int
> -sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
> +swtim_start(const struct rte_event_timer_adapter *adapter)
>   {
>   	int mapped_count;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -
> -	sw_data = adapter->data->adapter_priv;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   
>   	/* Mapping the service to more than one service core can introduce
>   	 * delays while one thread is waiting to acquire a lock, so only allow
>   	 * one core to be mapped to the service.
> +	 *
> +	 * Note: the service could be modified such that it spreads cores to
> +	 * poll over multiple service instances.
>   	 */
> -	mapped_count = get_mapped_count_for_service(sw_data->service_id);
> +	mapped_count = get_mapped_count_for_service(sw->service_id);
>   
> -	if (mapped_count == 1)
> -		return rte_service_component_runstate_set(sw_data->service_id,
> -							  1);
> +	if (mapped_count != 1)
> +		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
>   
> -	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
> +	return rte_service_component_runstate_set(sw->service_id, 1);
>   }
>   
>   static int
> -sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
> +swtim_stop(const struct rte_event_timer_adapter *adapter)
>   {
>   	int ret;
> -	struct rte_event_timer_adapter_sw_data *sw_data =
> -						adapter->data->adapter_priv;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   
> -	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
> +	ret = rte_service_component_runstate_set(sw->service_id, 0);
>   	if (ret < 0)
>   		return ret;
>   
> -	/* Wait for the service to complete its final iteration before
> -	 * stopping.
> -	 */
> -	while (sw_data->service_phase != 0)
> +	/* Wait for the service to complete its final iteration */
> +	while (rte_service_may_be_active(sw->service_id))
>   		rte_pause();
>   
> -	rte_smp_rmb();
> -
>   	return 0;
>   }
>   
>   static void
> -sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
> +swtim_get_info(const struct rte_event_timer_adapter *adapter,
>   		struct rte_event_timer_adapter_info *adapter_info)
>   {
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	sw_data = adapter->data->adapter_priv;
> -
> -	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
> -	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	adapter_info->min_resolution_ns = sw->timer_tick_ns;
> +	adapter_info->max_tmo_ns = sw->max_tmo_ns;
>   }
>   
>   static int
> -sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
> -				 struct rte_event_timer_adapter_stats *stats)
> +swtim_stats_get(const struct rte_event_timer_adapter *adapter,
> +		struct rte_event_timer_adapter_stats *stats)
>   {
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	sw_data = adapter->data->adapter_priv;
> -	*stats = sw_data->stats;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	*stats = sw->stats; /* structure copy */
>   	return 0;
>   }
>   
>   static int
> -sw_event_timer_adapter_stats_reset(
> -				const struct rte_event_timer_adapter *adapter)
> +swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
>   {
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	sw_data = adapter->data->adapter_priv;
> -	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	memset(&sw->stats, 0, sizeof(sw->stats));
>   	return 0;
>   }
>   
> -static __rte_always_inline uint16_t
> -__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
> -			  struct rte_event_timer **evtims,
> -			  uint16_t nb_evtims)
> +static uint16_t
> +__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
> +		struct rte_event_timer **evtims,
> +		uint16_t nb_evtims)
>   {
> -	uint16_t i;
> -	int ret;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	struct msg *msgs[nb_evtims];
> +	int i, ret;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	uint32_t lcore_id = rte_lcore_id();
> +	struct rte_timer *tim, *tims[nb_evtims];
> +	uint64_t cycles;
>   
>   #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>   	/* Check that the service is running. */
> @@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
>   	}
>   #endif
>   
> -	sw_data = adapter->data->adapter_priv;
> +	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
> +	 * the highest lcore to insert such timers into
> +	 */
> +	if (lcore_id == LCORE_ID_ANY)
> +		lcore_id = RTE_MAX_LCORE - 1;
> +
> +	/* If this is the first time we're arming an event timer on this lcore,
> +	 * mark this lcore as "in use"; this will cause the service
> +	 * function to process the timer list that corresponds to this lcore.
> +	 */
> +	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {

I suspect we have a performance-critical false sharing issue above.
Many or all of the flags are going to be arranged on the same cache line.

> +		rte_spinlock_lock(&sw->poll_lcores_sl);
> +		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
> +			      lcore_id);
> +		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
> +		rte_spinlock_unlock(&sw->poll_lcores_sl);
> +	}
>   
> -	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
> +	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
> +				   nb_evtims);
>   	if (ret < 0) {
>   		rte_errno = ENOSPC;
>   		return 0;
>   	}
>   
> -	/* Let the service know we're producing messages for it to process */
> -	rte_atomic16_inc(&sw_data->message_producer_count);
> -
> -	/* If the service is managing timers, wait for it to finish */
> -	while (sw_data->service_phase == 2)
> -		rte_pause();
> -
> -	rte_smp_rmb();
> -
>   	for (i = 0; i < nb_evtims; i++) {
>   		/* Don't modify the event timer state in these cases */
>   		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
>   			rte_errno = EALREADY;
>   			break;
>   		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
> -		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
> +			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
>   		ret = check_timeout(evtims[i], adapter);
> -		if (ret == -1) {
> +		if (unlikely(ret == -1)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
>   			rte_errno = EINVAL;
>   			break;
> -		}
> -		if (ret == -2) {
> +		} else if (unlikely(ret == -2)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
> -		if (check_destination_event_queue(evtims[i], adapter) < 0) {
> +		if (unlikely(check_destination_event_queue(evtims[i],
> +							   adapter) < 0)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
> -		/* Checks passed, set up a message to enqueue */
> -		msgs[i]->type = MSG_TYPE_ARM;
> -		msgs[i]->evtim = evtims[i];
> +		tim = tims[i];
> +		rte_timer_init(tim);
>   
> -		/* Set the payload pointer if not set. */
> -		if (evtims[i]->ev.event_ptr == NULL)
> -			evtims[i]->ev.event_ptr = evtims[i];
> +		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
> +		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
>   
> -		/* msg objects that get enqueued successfully will be freed
> -		 * either by a future cancel operation or by the timer
> -		 * expiration callback.
> -		 */
> -		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
> -			rte_errno = ENOSPC;
> +		cycles = get_timeout_cycles(evtims[i], adapter);
> +		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
> +					  SINGLE, lcore_id, NULL, evtims[i]);
> +		if (ret < 0) {
> +			/* tim was in RUNNING or CONFIG state */
> +			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
>   			break;
>   		}
>   
> -		EVTIM_LOG_DBG("enqueued ARM message to ring");
> -
> +		rte_smp_wmb();
> +		EVTIM_LOG_DBG("armed an event timer");
>   		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
>   	}
>   
> -	/* Let the service know we're done producing messages */
> -	rte_atomic16_dec(&sw_data->message_producer_count);
> -
>   	if (i < nb_evtims)
> -		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
> -				     nb_evtims - i);
> +		rte_mempool_put_bulk(sw->tim_pool,
> +				     (void **)&tims[i], nb_evtims - i);
>   
>   	return i;
>   }
>   
>   static uint16_t
> -sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
> -			 struct rte_event_timer **evtims,
> -			 uint16_t nb_evtims)
> +swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
> +		struct rte_event_timer **evtims,
> +		uint16_t nb_evtims)
>   {
> -	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
> +	return __swtim_arm_burst(adapter, evtims, nb_evtims);
>   }
>   
>   static uint16_t
> -sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
> -			    struct rte_event_timer **evtims,
> -			    uint16_t nb_evtims)
> +swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
> +		   struct rte_event_timer **evtims,
> +		   uint16_t nb_evtims)
>   {
> -	uint16_t i;
> -	int ret;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	struct msg *msgs[nb_evtims];
> +	int i, ret;
> +	struct rte_timer *timp;
> +	uint64_t opaque;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
>   
>   #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>   	/* Check that the service is running. */
> @@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
>   	}
>   #endif
>   
> -	sw_data = adapter->data->adapter_priv;
> -
> -	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
> -	if (ret < 0) {
> -		rte_errno = ENOSPC;
> -		return 0;
> -	}
> -
> -	/* Let the service know we're producing messages for it to process */
> -	rte_atomic16_inc(&sw_data->message_producer_count);
> -
> -	/* If the service could be modifying event timer states, wait */
> -	while (sw_data->service_phase == 2)
> -		rte_pause();
> -
> -	rte_smp_rmb();
> -
>   	for (i = 0; i < nb_evtims; i++) {
>   		/* Don't modify the event timer state in these cases */
>   		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
> @@ -1232,54 +1095,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
>   			break;
>   		}
>   
> -		msgs[i]->type = MSG_TYPE_CANCEL;
> -		msgs[i]->evtim = evtims[i];
> +		opaque = evtims[i]->impl_opaque[0];
> +		timp = (struct rte_timer *)(uintptr_t)opaque;
> +		RTE_ASSERT(timp != NULL);
>   
> -		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
> -			rte_errno = ENOSPC;
> +		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
> +		if (ret < 0) {
> +			/* Timer is running or being configured */
> +			rte_errno = EAGAIN;
>   			break;
>   		}
>   
> -		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
> +		rte_mempool_put(sw->tim_pool, (void **)timp);
>   
>   		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
> -	}
> +		evtims[i]->impl_opaque[0] = 0;
> +		evtims[i]->impl_opaque[1] = 0;
>   
> -	/* Let the service know we're done producing messages */
> -	rte_atomic16_dec(&sw_data->message_producer_count);
> -
> -	if (i < nb_evtims)
> -		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
> -				     nb_evtims - i);
> +		rte_smp_wmb();
> +	}
>   
>   	return i;
>   }
>   
>   static uint16_t
> -sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
> -				  struct rte_event_timer **evtims,
> -				  uint64_t timeout_ticks,
> -				  uint16_t nb_evtims)
> +swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
> +			 struct rte_event_timer **evtims,
> +			 uint64_t timeout_ticks,
> +			 uint16_t nb_evtims)
>   {
>   	int i;
>   
>   	for (i = 0; i < nb_evtims; i++)
>   		evtims[i]->timeout_ticks = timeout_ticks;
>   
> -	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
> +	return __swtim_arm_burst(adapter, evtims, nb_evtims);
>   }
>   
> -static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
> -	.init = sw_event_timer_adapter_init,
> -	.uninit = sw_event_timer_adapter_uninit,
> -	.start = sw_event_timer_adapter_start,
> -	.stop = sw_event_timer_adapter_stop,
> -	.get_info = sw_event_timer_adapter_get_info,
> -	.stats_get = sw_event_timer_adapter_stats_get,
> -	.stats_reset = sw_event_timer_adapter_stats_reset,
> -	.arm_burst = sw_event_timer_arm_burst,
> -	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
> -	.cancel_burst = sw_event_timer_cancel_burst,
> +static const struct rte_event_timer_adapter_ops swtim_ops = {
> +	.init			= swtim_init,
> +	.uninit			= swtim_uninit,
> +	.start			= swtim_start,
> +	.stop			= swtim_stop,
> +	.get_info		= swtim_get_info,
> +	.stats_get		= swtim_stats_get,
> +	.stats_reset		= swtim_stats_reset,
> +	.arm_burst		= swtim_arm_burst,
> +	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
> +	.cancel_burst		= swtim_cancel_burst,
>   };
>   
>   RTE_INIT(event_timer_adapter_init_log)
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v2 1/1] eventdev: add new software event timer adapter
  2018-12-09 19:17     ` Mattias Rönnblom
@ 2018-12-10 17:17       ` Carrillo, Erik G
  0 siblings, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2018-12-10 17:17 UTC (permalink / raw)
  To: Mattias Rönnblom, jerin.jacob; +Cc: pbhagavatula, rsanford, stephen, dev

Hi Mattias,

Thanks for the review.  Responses in-line:

> -----Original Message-----
> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, December 9, 2018 1:17 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>;
> jerin.jacob@caviumnetworks.com
> Cc: pbhagavatula@caviumnetworks.com; rsanford@akamai.com;
> stephen@networkplumber.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/1] eventdev: add new software event
> timer adapter
> 
> On 2018-12-07 21:34, Erik Gabriel Carrillo wrote:
> > This patch introduces a new version of the event timer adapter
> > software PMD. In the original design, timer event producer lcores in
> > the primary and secondary processes enqueued event timers into a ring,
> > and a service core in the primary process dequeued them and processed
> > them further.  To improve performance, this version does away with the
> > ring and lets lcores in both primary and secondary processes insert
> > timers directly into timer skiplist data structures; the service core
> > directly accesses the lists as well, when looking for timers that have
> > expired.
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> > ---
> >   lib/librte_eventdev/rte_event_timer_adapter.c | 687 +++++++++++---------------
> >   1 file changed, 275 insertions(+), 412 deletions(-)
> >
> > diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c
> > b/lib/librte_eventdev/rte_event_timer_adapter.c
> > index 79070d4..9c528cb 100644
> > --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> > +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> > @@ -7,6 +7,7 @@
> >   #include <inttypes.h>
> >   #include <stdbool.h>
> >   #include <sys/queue.h>
> > +#include <assert.h>
> >
> 
> You have no assert() calls, from what I can see. Include <rte_debug.h> for
> RTE_ASSERT().
> 

Indeed - looks like I can remove that.

<...snipped...>

> > +static void
> > +swtim_free_tim(struct rte_timer *tim, void *arg)
> >   {
> > -	int ret;
> > -	struct msg *m1, *m2;
> > -	struct rte_event_timer_adapter_sw_data *sw_data =
> > -						adapter->data->adapter_priv;
> > +	struct swtim *sw = arg;
> >
> > -	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
> > -
> > -	/* Cancel outstanding rte_timers and free msg objects */
> > -	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
> > -	while (m1 != NULL) {
> > -		EVTIM_LOG_DBG("freeing outstanding timer");
> > -		m2 = TAILQ_NEXT(m1, msgs);
> > -
> > -		rte_timer_stop_sync(&m1->tim);
> > -		rte_mempool_put(sw_data->msg_pool, m1);
> > +	rte_mempool_put(sw->tim_pool, (void *)tim);
> > +}
> 
> No cast required.
> 

Will update.

<...snipped...>

> > +static uint16_t
> > +__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
> > +		struct rte_event_timer **evtims,
> > +		uint16_t nb_evtims)
> >   {
> > -	uint16_t i;
> > -	int ret;
> > -	struct rte_event_timer_adapter_sw_data *sw_data;
> > -	struct msg *msgs[nb_evtims];
> > +	int i, ret;
> > +	struct swtim *sw = swtim_pmd_priv(adapter);
> > +	uint32_t lcore_id = rte_lcore_id();
> > +	struct rte_timer *tim, *tims[nb_evtims];
> > +	uint64_t cycles;
> >
> >   #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
> >   	/* Check that the service is running. */
> > @@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
> >   	}
> >   #endif
> >
> > -	sw_data = adapter->data->adapter_priv;
> > +	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
> > +	 * the highest lcore to insert such timers into
> > +	 */
> > +	if (lcore_id == LCORE_ID_ANY)
> > +		lcore_id = RTE_MAX_LCORE - 1;
> > +
> > +	/* If this is the first time we're arming an event timer on this lcore,
> > +	 * mark this lcore as "in use"; this will cause the service
> > +	 * function to process the timer list that corresponds to this lcore.
> > +	 */
> > +	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id]))) {
> 
> I suspect we have a performance-critical false sharing issue above.
> Many or all of the flags are going to be arranged on the same cache line.
> 

Good catch - thanks for spotting this.  I'll update the array layout.
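
For illustration only, one possible way to lay out the flags so that each
lcore's flag sits on its own cache line is sketched below.  This is not the
layout used in any later revision of the patch; it uses plain C11 atomics
rather than the rte_atomic16 API, and CACHE_LINE_SIZE, MAX_LCORE,
struct lcore_flag and mark_lcore_in_use() are hypothetical names chosen for
the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CACHE_LINE_SIZE 64	/* assumption: 64-byte cache lines */
#define MAX_LCORE 128		/* hypothetical stand-in for RTE_MAX_LCORE */

/* One flag per lcore; the alignment pads each element to a full line. */
struct lcore_flag {
	alignas(CACHE_LINE_SIZE) atomic_int v;
};

static_assert(sizeof(struct lcore_flag) == CACHE_LINE_SIZE,
	      "each flag must occupy exactly one cache line");

/* Static storage, so every flag starts out as zero (not in use). */
static struct lcore_flag in_use[MAX_LCORE];

/* Return true only for the first caller that marks the given lcore. */
static bool
mark_lcore_in_use(unsigned int lcore_id)
{
	return atomic_exchange(&in_use[lcore_id].v, 1) == 0;
}
```

With this layout, lcores arming timers concurrently each touch a different
cache line when testing their flag, at the cost of MAX_LCORE cache lines of
memory for the array.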

Thanks,
Erik


* [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
  2018-12-07 17:52   ` [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
  2018-12-07 17:53   ` [dpdk-dev] [PATCH v2 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
@ 2018-12-13 22:26   ` Erik Gabriel Carrillo
  2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
                       ` (4 more replies)
  2 siblings, 5 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-13 22:26 UTC (permalink / raw)
  To: rsanford; +Cc: stephen, jerin.jacob, pbhagavatula, dev

This patch series modifies the timer library in such a way that
structures that used to be statically allocated in a process's data
segment are now allocated in shared memory.  As these structures contain
lists of timers, new APIs are introduced that allow a caller to specify
the particular structure instance into which a timer should be inserted
or from which a timer should be removed.  This enables primary and
secondary processes to modify the same timer list, which enables some
multi-process use cases that were not previously possible; e.g. a
secondary process can start a timer whose expiration is detected in a
primary process running a new flavor of timer_manage().

The original library API is mostly unchanged, though implementations are
updated to call into newly added functions with a default structure
instance ID that provides the original behavior.  New functions are
introduced to enable applications to allocate structure instances to
house timer lists, and to reference them with an identifier when
starting and stopping timers, and finally, to manage the timer lists
referenced with an identifier.
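
As a rough illustration of the instance-ID idea, the sketch below models a
fixed array of instances with an "allocated" flag, loosely following the
shape of rte_timer_data_alloc() in the first patch.  It is a standalone
mock-up rather than the library code: the names timer_data_arr,
timer_data_alloc() and timer_data_dealloc() are hypothetical, the per-lcore
timer lists are omitted, and the -EINVAL return from the dealloc path is an
assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 64		/* number of instances in the array */
#define FL_ALLOCATED (1 << 0)	/* instance is currently in use */

struct timer_data {
	uint8_t internal_flags;
	/* the per-lcore timer lists would live here */
};

static struct timer_data timer_data_arr[MAX_DATA_ELS];

/* Claim the first free instance and report its index through id_ptr. */
static int
timer_data_alloc(uint32_t *id_ptr)
{
	for (uint32_t i = 0; i < MAX_DATA_ELS; i++) {
		struct timer_data *data = &timer_data_arr[i];

		if (!(data->internal_flags & FL_ALLOCATED)) {
			data->internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an instance; reject out-of-range or unallocated IDs. */
static int
timer_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(timer_data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	timer_data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;
	return 0;
}
```

A caller would keep the returned ID and pass it to the start, stop and
manage variants so that they all operate on the same set of timer lists.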

My initial performance testing with the "timer_perf_autotest" test shows
no performance regression or improvement, and inspection of the
generated optimized code shows that the newly added internal function
calls get inlined into the functions that call them.

Depends on: https://patches.dpdk.org/patch/48417/

Changes in v3:
 - remove C++ style comment in first patch in series (Stephen)

Changes in v2:
 - split these changes out into their own series
 - version the symbols where the existing ABI was updated, and
   provide alternate implementation with behavior equivalent to original
   behavior. Validated ABI compatibility with validate-abi.sh
 - refactor changes to simplify patches

Erik Gabriel Carrillo (2):
  timer: allow timer management in shared memory
  timer: add function to stop all timers in a list

 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 558 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
 lib/librte_timer/rte_timer_version.map |  23 ++
 4 files changed, 795 insertions(+), 45 deletions(-)

-- 
2.6.4


* [dpdk-dev] [PATCH v3 1/2] timer: allow timer management in shared memory
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
@ 2018-12-13 22:26     ` Erik Gabriel Carrillo
  2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-13 22:26 UTC (permalink / raw)
  To: rsanford; +Cc: stephen, jerin.jacob, pbhagavatula, dev

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
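
To make the "multiple timer lists per invocation" point concrete, here is a
simplified, single-threaded sketch of a manage routine that walks the lists
of several lcores in one call and runs a single callback for every expired
timer.  It is only a model of the idea: it uses plain sorted linked lists
instead of skiplists, takes no locks, and the names struct timer,
timer_add(), timer_manage_lists() and count_expired() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE 8	/* hypothetical stand-in for RTE_MAX_LCORE */

struct timer {
	uint64_t expire;	/* absolute expiry time, in ticks */
	struct timer *next;
};

/* One instance: a pending list per lcore, sorted by expiry time. */
struct timer_data {
	struct timer *pending[MAX_LCORE];
};

typedef void (*timer_cb_t)(struct timer *tim);

/* Insert a timer into the given lcore's list, keeping the list sorted. */
static void
timer_add(struct timer_data *d, unsigned int lcore, struct timer *t)
{
	struct timer **pp = &d->pending[lcore];

	while (*pp != NULL && (*pp)->expire <= t->expire)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* Process the lists of all listed lcores; return the number expired. */
static unsigned int
timer_manage_lists(struct timer_data *d, const unsigned int *poll_lcores,
		   int n_poll_lcores, uint64_t now, timer_cb_t cb)
{
	unsigned int fired = 0;

	for (int i = 0; i < n_poll_lcores; i++) {
		struct timer **head = &d->pending[poll_lcores[i]];

		while (*head != NULL && (*head)->expire <= now) {
			struct timer *t = *head;

			*head = t->next;	/* unlink before the callback */
			t->next = NULL;
			cb(t);
			fired++;
		}
	}

	return fired;
}

/* Example callback: count how many timers have expired so far. */
static unsigned int expired_count;

static void
count_expired(struct timer *tim)
{
	(void)tim;
	expired_count++;
}
```

Because the same callback is applied to every list, the process that calls
the manage routine is the only one that needs a valid function pointer,
which is the property the commit message relies on.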

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..d761cda 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1902(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1902, 19.02);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_memzone_free(rte_memzone_lookup("rte_timer_mz"));
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1902(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1902(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0; i < nb_poll_lcores; i++) {
+		poll_lcore = poll_lcores[i];
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expire will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and move to the next list */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(data->priv_timer, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1902(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1902);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1902, 19.02);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..82f5fba 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1902(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1902(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1902(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1902(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1902(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..b3f4b6c 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.02 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4

* [dpdk-dev] [PATCH v3 2/2] timer: add function to stop all timers in a list
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
  2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2018-12-13 22:26     ` Erik Gabriel Carrillo
  2018-12-19  3:35     ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Thomas Monjalon
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-13 22:26 UTC (permalink / raw)
  To: rsanford; +Cc: stephen, jerin.jacob, pbhagavatula, dev

Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list,
and invoking a callback function.
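
The traversal described above can be sketched in plain C. This is only an
illustration under stated assumptions, not the DPDK implementation: the
mock_timer type, mock_stop_all() and mock_count_cb() are hypothetical names,
the list is a simple singly linked list rather than a skiplist, and locking
is omitted. The point it shows is that the successor pointer is saved before
the timer is stopped and passed to the callback, since the callback may free
or reuse the timer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending timer; not struct rte_timer. */
struct mock_timer {
	struct mock_timer *next;
	int pending;
};

typedef void (*mock_stop_cb_t)(struct mock_timer *tim, void *arg);

/* Stop every timer in the list and pass each one to the optional callback.
 * Returns the number of timers that were stopped.
 */
static int
mock_stop_all(struct mock_timer **head, mock_stop_cb_t f, void *f_arg)
{
	struct mock_timer *tim, *next_tim;
	int n = 0;

	assert(head != NULL);

	for (tim = *head; tim != NULL; tim = next_tim) {
		/* Save the successor first: the callback may free or reuse tim */
		next_tim = tim->next;

		/* "Stop" the timer by unlinking it and clearing its state */
		tim->pending = 0;
		tim->next = NULL;
		n++;

		if (f != NULL)
			f(tim, f_arg);
	}

	*head = NULL;
	return n;
}

/* Example callback: count how many timers were handed back. */
static void
mock_count_cb(struct mock_timer *tim, void *arg)
{
	(void)tim;
	(*(int *)arg)++;
}
```

A caller that owns the timer memory (for example, timers taken from a pool)
would typically use the callback to return each stopped timer to its pool.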

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.c           | 39 ++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.h           | 32 ++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer_version.map |  1 +
 3 files changed, 72 insertions(+)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index d761cda..0fa68f7 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -999,6 +999,45 @@ rte_timer_alt_manage(uint32_t timer_data_id,
 	return 0;
 }
 
+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores,
+		   rte_timer_stop_all_cb_t f, void *f_arg)
+{
+	int i;
+	struct priv_timer *priv_timer;
+	uint32_t walk_lcore;
+	struct rte_timer *tim, *next_tim;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	/* Read each lcore id in the body to avoid indexing past the array */
+	for (i = 0; i < nb_walk_lcores; i++) {
+		walk_lcore = walk_lcores[i];
+		priv_timer = &timer_data->priv_timer[walk_lcore];
+
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		for (tim = priv_timer->pending_head.sl_next[0];
+		     tim != NULL;
+		     tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			/* Call timer_stop with lock held */
+			__rte_timer_stop(tim, 1, timer_data);
+
+			if (f)
+				f(tim, f_arg);
+		}
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
 static void
 __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 82f5fba..b01bd97 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -500,6 +500,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
 		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
 
 /**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param walk_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ *   The size of the walk_lcores array.
+ * @param f
+ *   The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ *   An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index b3f4b6c..278b2af 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -33,5 +33,6 @@ EXPERIMENTAL {
 	rte_timer_alt_stop;
 	rte_timer_data_alloc;
 	rte_timer_data_dealloc;
+	rte_timer_stop_all;
 	rte_timer_subsystem_finalize;
 };
-- 
2.6.4


* [dpdk-dev] [PATCH v3 0/1] New software event timer adapter
  2018-12-07 20:34 ` [dpdk-dev] [PATCH v2 0/1] New software event timer adapter Erik Gabriel Carrillo
  2018-12-07 20:34   ` [dpdk-dev] [PATCH v2 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2018-12-14 15:45   ` Erik Gabriel Carrillo
  2018-12-14 15:45     ` [dpdk-dev] [PATCH v3 1/1] eventdev: add new " Erik Gabriel Carrillo
  2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
  1 sibling, 2 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-14 15:45 UTC (permalink / raw)
  To: dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired. (This behavior requires
the patch to the timer library that is referenced below.)

Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2767

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series
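
The v3 alignment change listed above can be illustrated with a small,
self-contained sketch. It assumes a 64-byte cache line and uses C11 alignas
in place of DPDK's __rte_cache_aligned; the names mock_flag, mock_adapter,
MOCK_CACHE_LINE and MOCK_MAX_LCORE are hypothetical. Giving each per-lcore
flag its own cache line means that a write from one lcore does not
invalidate the line holding another lcore's flag.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MOCK_CACHE_LINE 64	/* assumed cache line size in bytes */
#define MOCK_MAX_LCORE 4	/* small illustrative value */

/* Aligning the only member pads the struct out to a full cache line, so
 * consecutive array elements never share a line.
 */
struct mock_flag {
	alignas(MOCK_CACHE_LINE) short v;
};

struct mock_adapter {
	struct mock_flag in_use[MOCK_MAX_LCORE];
};

/* Compile-time check that each element occupies exactly one cache line */
static_assert(sizeof(struct mock_flag) == MOCK_CACHE_LINE,
	      "each flag should occupy exactly one cache line");

/* Distance in bytes between consecutive per-lcore flags */
static size_t
mock_flag_stride(void)
{
	return sizeof(struct mock_flag);
}
```

The cost is memory: the array grows from a few bytes per lcore to one cache
line per lcore, which is usually acceptable for a per-adapter structure.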

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 688 +++++++++++---------------
 1 file changed, 276 insertions(+), 412 deletions(-)

-- 
2.6.4


* [dpdk-dev] [PATCH v3 1/1] eventdev: add new software event timer adapter
  2018-12-14 15:45   ` [dpdk-dev] [PATCH v3 0/1] New " Erik Gabriel Carrillo
@ 2018-12-14 15:45     ` Erik Gabriel Carrillo
  2018-12-14 21:15       ` Mattias Rönnblom
  2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
  1 sibling, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-14 15:45 UTC (permalink / raw)
  To: dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.
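
One way the service core can learn which per-lcore lists to visit is for
each lcore to register itself the first time it arms a timer. The sketch
below illustrates that idea with C11 atomics; it is a simplified model, not
the adapter code, and mock_swtim, mock_swtim_init() and
mock_mark_lcore_in_use() are hypothetical names. Note that C11
atomic_flag_test_and_set() returns the previous value of the flag, so the
first caller sees false here; the DPDK atomic helpers have their own return
convention.

```c
#include <assert.h>
#include <stdatomic.h>

#define MOCK_MAX_LCORE 8	/* small illustrative value */

struct mock_swtim {
	atomic_flag in_use[MOCK_MAX_LCORE];	/* set once per lcore */
	unsigned int poll_lcores[MOCK_MAX_LCORE];	/* lists to poll */
	int n_poll_lcores;
	atomic_flag lock;	/* guards the two fields above */
};

static void
mock_swtim_init(struct mock_swtim *sw)
{
	int i;

	for (i = 0; i < MOCK_MAX_LCORE; i++)
		atomic_flag_clear(&sw->in_use[i]);
	atomic_flag_clear(&sw->lock);
	sw->n_poll_lcores = 0;
}

/* Returns 1 if this call registered the lcore, 0 if it was already known. */
static int
mock_mark_lcore_in_use(struct mock_swtim *sw, unsigned int lcore_id)
{
	assert(lcore_id < MOCK_MAX_LCORE);

	/* Fast path: the flag was already set, so nothing to do */
	if (atomic_flag_test_and_set(&sw->in_use[lcore_id]))
		return 0;

	/* Slow path, taken once per lcore: append to the poll list */
	while (atomic_flag_test_and_set_explicit(&sw->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
	atomic_flag_clear_explicit(&sw->lock, memory_order_release);

	return 1;
}
```

Because the lock is only taken on the first arm from a given lcore, later
arm calls from that lcore pay for a single atomic test and nothing more.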

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 688 +++++++++++---------------
 1 file changed, 276 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..029a45a 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -19,6 +19,7 @@
 #include <rte_timer.h>
 #include <rte_service_component.h>
 #include <rte_cycles.h>
+#include <rte_random.h>
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_pmd.h"
@@ -34,7 +35,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +212,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +335,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +492,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +500,125 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
 {
-	int ret;
+	struct rte_timer *tim = arg;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		rte_mempool_put(sw->tim_pool, tim);
+		sw->stats.evtim_exp_count++;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
-
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
 
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +630,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -655,15 +644,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -691,110 +677,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -828,168 +738,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
-
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = -ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = -ret;
+		goto free_mempool;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = arg;
 
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1010,88 +897,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified so that the lcores to poll are
+	 * spread across multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1101,101 +979,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1205,23 +1086,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1096,54 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4

* Re: [dpdk-dev] [PATCH v3 1/1] eventdev: add new software event timer adapter
  2018-12-14 15:45     ` [dpdk-dev] [PATCH v3 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2018-12-14 21:15       ` Mattias Rönnblom
  2018-12-14 23:04         ` Carrillo, Erik G
  0 siblings, 1 reply; 77+ messages in thread
From: Mattias Rönnblom @ 2018-12-14 21:15 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, dev

On 2018-12-14 16:45, Erik Gabriel Carrillo wrote:
> This patch introduces a new version of the event timer adapter software
> PMD. In the original design, timer event producer lcores in the primary
> and secondary processes enqueued event timers into a ring, and a
> service core in the primary process dequeued them and processed them
> further.  To improve performance, this version does away with the ring
> and lets lcores in both primary and secondary processes insert timers
> directly into timer skiplist data structures; the service core directly
> accesses the lists as well, when looking for timers that have expired.
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> ---
>   lib/librte_eventdev/rte_event_timer_adapter.c | 688 +++++++++++---------------
>   1 file changed, 276 insertions(+), 412 deletions(-)
> 
> diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
> index 79070d4..029a45a 100644
> --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> @@ -19,6 +19,7 @@
>   #include <rte_timer.h>
>   #include <rte_service_component.h>
>   #include <rte_cycles.h>
> +#include <rte_random.h>

You aren't using anything from rte_random.h.

/../

> -static __rte_always_inline uint16_t
> -__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
> -			  struct rte_event_timer **evtims,
> -			  uint16_t nb_evtims)
> +static uint16_t
> +__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
> +		struct rte_event_timer **evtims,
> +		uint16_t nb_evtims)
>   {
> -	uint16_t i;
> -	int ret;
> -	struct rte_event_timer_adapter_sw_data *sw_data;
> -	struct msg *msgs[nb_evtims];
> +	int i, ret;
> +	struct swtim *sw = swtim_pmd_priv(adapter);
> +	uint32_t lcore_id = rte_lcore_id();
> +	struct rte_timer *tim, *tims[nb_evtims];
> +	uint64_t cycles;
>   
>   #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>   	/* Check that the service is running. */
> @@ -1101,101 +979,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
>   	}
>   #endif
>   
> -	sw_data = adapter->data->adapter_priv;
> +	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
> +	 * the highest lcore to insert such timers into
> +	 */
> +	if (lcore_id == LCORE_ID_ANY)
> +		lcore_id = RTE_MAX_LCORE - 1;
> +
> +	/* If this is the first time we're arming an event timer on this lcore,
> +	 * mark this lcore as "in use"; this will cause the service
> +	 * function to process the timer list that corresponds to this lcore.
> +	 */
> +	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
> +		rte_spinlock_lock(&sw->poll_lcores_sl);
> +		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
> +			      lcore_id);
> +		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
> +		rte_spinlock_unlock(&sw->poll_lcores_sl);
> +	}
>   
> -	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
> +	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
> +				   nb_evtims);
>   	if (ret < 0) {
>   		rte_errno = ENOSPC;
>   		return 0;
>   	}
>   
> -	/* Let the service know we're producing messages for it to process */
> -	rte_atomic16_inc(&sw_data->message_producer_count);
> -
> -	/* If the service is managing timers, wait for it to finish */
> -	while (sw_data->service_phase == 2)
> -		rte_pause();
> -
> -	rte_smp_rmb();
> -
>   	for (i = 0; i < nb_evtims; i++) {
>   		/* Don't modify the event timer state in these cases */
>   		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
>   			rte_errno = EALREADY;
>   			break;
>   		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
> -		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
> +			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
>   		ret = check_timeout(evtims[i], adapter);
> -		if (ret == -1) {
> +		if (unlikely(ret == -1)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
>   			rte_errno = EINVAL;
>   			break;
> -		}
> -		if (ret == -2) {
> +		} else if (unlikely(ret == -2)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
> -		if (check_destination_event_queue(evtims[i], adapter) < 0) {
> +		if (unlikely(check_destination_event_queue(evtims[i],
> +							   adapter) < 0)) {
>   			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
>   			rte_errno = EINVAL;
>   			break;
>   		}
>   
> -		/* Checks passed, set up a message to enqueue */
> -		msgs[i]->type = MSG_TYPE_ARM;
> -		msgs[i]->evtim = evtims[i];
> +		tim = tims[i];
> +		rte_timer_init(tim);
>   
> -		/* Set the payload pointer if not set. */
> -		if (evtims[i]->ev.event_ptr == NULL)
> -			evtims[i]->ev.event_ptr = evtims[i];
> +		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
> +		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
>   
> -		/* msg objects that get enqueued successfully will be freed
> -		 * either by a future cancel operation or by the timer
> -		 * expiration callback.
> -		 */
> -		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
> -			rte_errno = ENOSPC;
> +		cycles = get_timeout_cycles(evtims[i], adapter);
> +		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
> +					  SINGLE, lcore_id, NULL, evtims[i]);
> +		if (ret < 0) {
> +			/* tim was in RUNNING or CONFIG state */
> +			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
>   			break;
>   		}
>   
> -		EVTIM_LOG_DBG("enqueued ARM message to ring");
> -
> +		rte_smp_wmb();
> +		EVTIM_LOG_DBG("armed an event timer");
>   		evtims[i]->state = RTE_EVENT_TIMER_ARMED;

This looks like you want a reader to see the impl_opaque[] stores, 
before the state store, which sounds like a good idea.

However, I fail to find the corresponding read barriers on the reader 
side. Shouldn't swtim_cancel_burst() have such? It's loading state, and 
loading impl_opaque[], but there's no guarantee those memory loads 
happen in program order.
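
To illustrate the pairing in question, here is a minimal sketch. It is not
the patch's code: it uses C11 release/acquire atomics in place of
rte_smp_wmb()/rte_smp_rmb(), and struct evtim_sketch, sketch_init(),
sketch_arm() and sketch_cancel_read() are hypothetical names made up for the
example. The point is only that the reader's load of state has to be ordered
before its load of impl_opaque[], mirroring the writer's ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the event timer states used in the patch. */
enum evtim_state { EVTIM_NOT_ARMED = 0, EVTIM_ARMED = 1 };

/* Hypothetical, reduced version of an event timer object. */
struct evtim_sketch {
	_Atomic int state;        /* published last by the arming thread */
	uint64_t impl_opaque[2];  /* payload written before state */
};

static void
sketch_init(struct evtim_sketch *e)
{
	e->impl_opaque[0] = 0;
	e->impl_opaque[1] = 0;
	atomic_init(&e->state, EVTIM_NOT_ARMED);
}

/* Arm side: write the payload, then publish it with a release store.
 * This plays the role of "impl_opaque[] stores; wmb; state store".
 */
static void
sketch_arm(struct evtim_sketch *e, uint64_t tim, uint64_t adapter)
{
	e->impl_opaque[0] = tim;
	e->impl_opaque[1] = adapter;
	atomic_store_explicit(&e->state, EVTIM_ARMED, memory_order_release);
}

/* Cancel side: acquire-load the state first, and read the payload only
 * after seeing ARMED. This plays the role of "state load; rmb;
 * impl_opaque[] load". Returns 0 and fills *tim on success, -1 otherwise.
 */
static int
sketch_cancel_read(struct evtim_sketch *e, uint64_t *tim)
{
	if (atomic_load_explicit(&e->state, memory_order_acquire) !=
	    EVTIM_ARMED)
		return -1;

	*tim = e->impl_opaque[0];
	assert(*tim != 0); /* payload must be visible once ARMED is seen */
	return 0;
}
```

Without the acquire (or a read barrier between the two loads), a weakly
ordered CPU could satisfy the impl_opaque[] load before the state load and
hand the cancel path a stale pointer.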

* Re: [dpdk-dev] [PATCH v3 1/1] eventdev: add new software event timer adapter
  2018-12-14 21:15       ` Mattias Rönnblom
@ 2018-12-14 23:04         ` Carrillo, Erik G
  0 siblings, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2018-12-14 23:04 UTC (permalink / raw)
  To: Mattias Rönnblom, dev

> >   lib/librte_eventdev/rte_event_timer_adapter.c | 688 +++++++++++---------------
> >   1 file changed, 276 insertions(+), 412 deletions(-)
> >
> > diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
> > index 79070d4..029a45a 100644
> > --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> > +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> > @@ -19,6 +19,7 @@
> >   #include <rte_timer.h>
> >   #include <rte_service_component.h>
> >   #include <rte_cycles.h>
> > +#include <rte_random.h>
> 
> You aren't using anything from rte_random.h.
> 
> /../
> 

Removed in next version.

<...snipped...>

> > -		if (check_destination_event_queue(evtims[i], adapter) < 0) {
> > +		if (unlikely(check_destination_event_queue(evtims[i],
> > +							   adapter) < 0)) {
> >   			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> >   			rte_errno = EINVAL;
> >   			break;
> >   		}
> >
> > -		/* Checks passed, set up a message to enqueue */
> > -		msgs[i]->type = MSG_TYPE_ARM;
> > -		msgs[i]->evtim = evtims[i];
> > +		tim = tims[i];
> > +		rte_timer_init(tim);
> >
> > -		/* Set the payload pointer if not set. */
> > -		if (evtims[i]->ev.event_ptr == NULL)
> > -			evtims[i]->ev.event_ptr = evtims[i];
> > +		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
> > +		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
> >
> > -		/* msg objects that get enqueued successfully will be freed
> > -		 * either by a future cancel operation or by the timer
> > -		 * expiration callback.
> > -		 */
> > -		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
> > -			rte_errno = ENOSPC;
> > +		cycles = get_timeout_cycles(evtims[i], adapter);
> > +		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
> > +					  SINGLE, lcore_id, NULL, evtims[i]);
> > +		if (ret < 0) {
> > +			/* tim was in RUNNING or CONFIG state */
> > +			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> >   			break;
> >   		}
> >
> > -		EVTIM_LOG_DBG("enqueued ARM message to ring");
> > -
> > +		rte_smp_wmb();
> > +		EVTIM_LOG_DBG("armed an event timer");
> >   		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
> 
> This looks like you want a reader to see the impl_opaque[] stores, before the
> state store, which sounds like a good idea.
> 
> However, I fail to find the corresponding read barriers on the reader side.
> Shouldn't swtim_cancel_burst() have such? It's loading state, and loading
> impl_opaque[], but there's no guarantee those memory loads happens in
> program order.

Thanks again for the good catch.  Updating this in the next version of the patch.

* [dpdk-dev] [PATCH v4 0/1] New software event timer adapter
  2018-12-14 15:45   ` [dpdk-dev] [PATCH v3 0/1] New " Erik Gabriel Carrillo
  2018-12-14 15:45     ` [dpdk-dev] [PATCH v3 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2018-12-14 23:15     ` Erik Gabriel Carrillo
  2018-12-14 23:15       ` [dpdk-dev] [PATCH v4 1/1] eventdev: add new " Erik Gabriel Carrillo
                         ` (2 more replies)
  1 sibling, 3 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-14 23:15 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, rsanford, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired. (This behavior requires
the patch to the timer library that is referenced below.)
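
As a rough illustration of how the service core can learn which per-lcore
timer lists to visit, the sketch below registers an lcore the first time it
arms a timer. It is a simplified model under stated assumptions, not the
adapter code: it uses C11 atomic_flag instead of rte_atomic16_t and
rte_spinlock_t, and struct poll_set, poll_set_init(), poll_set_note_arm()
and SKETCH_MAX_LCORE are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_MAX_LCORE 8 /* hypothetical; stands in for RTE_MAX_LCORE */

struct poll_set {
	atomic_flag in_use[SKETCH_MAX_LCORE]; /* set once per arming lcore */
	atomic_flag lock;                     /* minimal spinlock */
	unsigned int poll_lcores[SKETCH_MAX_LCORE];
	int n_poll_lcores;
};

static void
poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < SKETCH_MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	atomic_flag_clear(&ps->lock);
	ps->n_poll_lcores = 0;
}

/* Called on the arm path. Only the first call for a given lcore takes
 * the lock and appends to the poll list; later calls see the flag
 * already set and return without locking.
 */
static void
poll_set_note_arm(struct poll_set *ps, unsigned int lcore_id)
{
	assert(lcore_id < SKETCH_MAX_LCORE);

	/* atomic_flag_test_and_set() returns the previous value, so
	 * "false" means this caller is the first to claim the slot.
	 */
	if (!atomic_flag_test_and_set(&ps->in_use[lcore_id])) {
		while (atomic_flag_test_and_set_explicit(&ps->lock,
				memory_order_acquire))
			; /* spin until the lock is free */
		ps->poll_lcores[ps->n_poll_lcores++] = lcore_id;
		atomic_flag_clear_explicit(&ps->lock, memory_order_release);
	}
}
```

The service side would take the same lock while reading poll_lcores and
n_poll_lcores, so the lock is contended only on an lcore's first arm.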

Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2767

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v4:
 - Addressed the following comments from Mattias Ronnblom:
   - remove unnecessary header include
   - add missing read barrier in timer cancel function

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series
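
For the v3 alignment item above, the idea can be sketched as follows: each
per-lcore flag is padded out to its own cache line so that lcores updating
neighbouring flags do not invalidate each other's lines. This is an
illustration only. It uses C11 alignas rather than __rte_cache_aligned, it
assumes a 64-byte cache line, and SKETCH_CACHE_LINE, SKETCH_MAX_LCORE,
struct in_use_slot and slot_stride() are hypothetical names.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size */
#define SKETCH_MAX_LCORE 4   /* hypothetical; stands in for RTE_MAX_LCORE */

/* One flag per lcore. Aligning the member raises the alignment of the
 * whole struct, and its size is padded up to a multiple of that
 * alignment, so consecutive array elements land on separate lines.
 */
struct in_use_slot {
	alignas(SKETCH_CACHE_LINE) atomic_short v;
};

static_assert(sizeof(struct in_use_slot) == SKETCH_CACHE_LINE,
	      "each slot should occupy exactly one cache line");

static struct in_use_slot in_use[SKETCH_MAX_LCORE];

/* Distance in bytes between two neighbouring slots. */
static size_t
slot_stride(void)
{
	return (size_t)((char *)&in_use[1] - (char *)&in_use[0]);
}
```

With a plain array of atomic_short, up to 32 flags would share a single
64-byte line, which is the false sharing the v3 change was meant to avoid.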

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 689 +++++++++++---------------
 1 file changed, 277 insertions(+), 412 deletions(-)

-- 
2.6.4

* [dpdk-dev] [PATCH v4 1/1] eventdev: add new software event timer adapter
  2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
@ 2018-12-14 23:15       ` Erik Gabriel Carrillo
  2018-12-18 20:11       ` [dpdk-dev] [EXT] [PATCH v4 0/1] New " Jerin Jacob Kollanukkaran
  2019-04-22 14:57       ` [dpdk-dev] [PATCH v5 " Erik Gabriel Carrillo
  2 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2018-12-14 23:15 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, rsanford, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 689 +++++++++++---------------
 1 file changed, 277 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 79070d4..3851896 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -34,7 +34,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +211,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -334,7 +334,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,6 +491,7 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -498,137 +499,125 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(void *arg)
 {
-	int ret;
+	struct rte_timer *tim = arg;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		rte_mempool_put(sw->tim_pool, tim);
+		sw->stats.evtim_exp_count++;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
-
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
 
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -640,7 +629,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -655,15 +643,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -691,110 +676,34 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -828,168 +737,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
-
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = -ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = -ret;
+		goto free_mempool;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
-
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
+	struct swtim *sw = arg;
 
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1010,88 +896,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1101,101 +978,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1205,23 +1085,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1232,54 +1095,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4
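
Editor's illustration (not part of the patch): the arm path above marks an
lcore as "in use" the first time it arms a timer, then appends it to the
list that the service function polls. The following is a minimal sketch of
that register-once pattern, assuming plain C11 atomics in place of the DPDK
primitives. The names poll_set, poll_set_init, poll_set_mark and MAX_LCORE
are hypothetical and do not appear in the patch. Note one difference:
rte_atomic16_test_and_set() returns non-zero when the caller performed the
set, whereas C11 atomic_flag_test_and_set() returns the previous value, so
the condition is inverted here.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_LCORE 8 /* hypothetical stand-in for RTE_MAX_LCORE */

/* Hypothetical stand-in for the in_use[] / poll_lcores[] fields of swtim. */
struct poll_set {
	atomic_flag in_use[MAX_LCORE]; /* one flag per lcore */
	atomic_flag lock;              /* tiny spinlock guarding lcores[] and n */
	unsigned int lcores[MAX_LCORE];
	unsigned int n;
};

static void
poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	atomic_flag_clear(&ps->lock);
	ps->n = 0;
}

/* Returns 1 if lcore_id was newly added to the poll list, 0 if it was
 * already registered by an earlier call.
 */
static int
poll_set_mark(struct poll_set *ps, unsigned int lcore_id)
{
	assert(lcore_id < MAX_LCORE);

	/* Previous value true means some earlier arm already registered it. */
	if (atomic_flag_test_and_set(&ps->in_use[lcore_id]))
		return 0;

	/* First arm from this lcore: take the lock and append to the list. */
	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		; /* spin */
	ps->lcores[ps->n++] = lcore_id;
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);

	return 1;
}
```

Because the flag is tested before the lock is taken, the lock is contended
only on the first arm from each lcore, which matches the "seldom acquired
on the arm side" comment in the service function.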

* Re: [dpdk-dev] [EXT] [PATCH v4 0/1] New software event timer adapter
  2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
  2018-12-14 23:15       ` [dpdk-dev] [PATCH v4 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2018-12-18 20:11       ` Jerin Jacob Kollanukkaran
  2018-12-18 20:14         ` Carrillo, Erik G
  2019-04-22 14:57       ` [dpdk-dev] [PATCH v5 " Erik Gabriel Carrillo
  2 siblings, 1 reply; 77+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2018-12-18 20:11 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, erik.g.carrillo
  Cc: rsanford, mattias.ronnblom, pbhagavatula, dev

On Fri, 2018-12-14 at 17:15 -0600, Erik Gabriel Carrillo wrote:
> This patch introduces a new version of the event timer adapter
> software
> PMD [1]. In the original design, timer event producer lcores in the
> primary
> and secondary processes enqueued event timers into a ring, and a
> service
> core in the primary process dequeued them and processed them
> further.  To
> improve performance, this version does away with the ring and lets
> lcores in
> both primary and secondary processes insert timers directly into
> timer
> skiplist data structures; the service core directly accesses the
> lists as
> well, when looking for timers that have expired. (This behavior
> requires
> the patch to the timer library that is referenced below.)
> 
> Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2767

Looks like this series is not applying cleanly to the master branch.

I will pull this change once the patch it depends on has been pulled into
the master tree and there are no more review comments.


> 
> [1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
> 
> Changes in v4:
>  - Addressed the following comments from Mattias Ronnblom:
>    - remove unnecessary header include
>    - add missing read barrier in timer cancel function
> 
> Changes in v3:
>  - Addressed comments from Mattias Ronnblom:
>    - remove unnecessary header include
>    - remove unnecessary cast in mempool_put() call
>    - update alignment of elements of array to avoid false sharing
> issue
> 
> Changes in v2:
>  - split this change out into its own patch series
> 
> Erik Gabriel Carrillo (1):
>   eventdev: add new software event timer adapter
> 
>  lib/librte_eventdev/rte_event_timer_adapter.c | 689 +++++++++++-----
> ----------
>  1 file changed, 277 insertions(+), 412 deletions(-)
> 
> --
> 2.6.4
> 

* Re: [dpdk-dev] [EXT] [PATCH v4 0/1] New software event timer adapter
  2018-12-18 20:11       ` [dpdk-dev] [EXT] [PATCH v4 0/1] New " Jerin Jacob Kollanukkaran
@ 2018-12-18 20:14         ` Carrillo, Erik G
  0 siblings, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2018-12-18 20:14 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: rsanford, mattias.ronnblom, pbhagavatula, dev

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com]
> Sent: Tuesday, December 18, 2018 2:12 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Carrillo, Erik G
> <erik.g.carrillo@intel.com>
> Cc: rsanford@akamai.com; mattias.ronnblom@ericsson.com;
> pbhagavatula@caviumnetworks.com; dev@dpdk.org
> Subject: Re: [EXT] [PATCH v4 0/1] New software event timer adapter
> 
> On Fri, 2018-12-14 at 17:15 -0600, Erik Gabriel Carrillo wrote:
> > This patch introduces a new version of the event timer adapter
> > software PMD [1]. In the original design, timer event producer lcores
> > in the primary and secondary processes enqueued event timers into a
> > ring, and a service core in the primary process dequeued them and
> > processed them further.  To improve performance, this version does
> > away with the ring and lets lcores in both primary and secondary
> > processes insert timers directly into timer skiplist data structures;
> > the service core directly accesses the lists as well, when looking for
> > timers that have expired. (This behavior requires the patch to the
> > timer library that is referenced below.)
> >
> > Depends on: https://patches.dpdk.org/project/dpdk/list/?series=2767
> 
> Looks like this series is not applying cleanly to the master branch.
> 
> I will pull this change once the patch it depends on has been pulled into
> the master tree and there are no more review comments.
> 

Ok, thanks Jerin.

> 
> >
> > [1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html
> >
> > Changes in v4:
> >  - Addressed the following comments from Mattias Ronnblom:
> >    - remove unnecessary header include
> >    - add missing read barrier in timer cancel function
> >
> > Changes in v3:
> >  - Addressed comments from Mattias Ronnblom:
> >    - remove unnecessary header include
> >    - remove unnecessary cast in mempool_put() call
> >    - update alignment of elements of array to avoid false sharing
> > issue
> >
> > Changes in v2:
> >  - split this change out into its own patch series
> >
> > Erik Gabriel Carrillo (1):
> >   eventdev: add new software event timer adapter
> >
> >  lib/librte_eventdev/rte_event_timer_adapter.c | 689 +++++++++++-----
> > ----------
> >  1 file changed, 277 insertions(+), 412 deletions(-)
> >
> > --
> > 2.6.4
> >

* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
  2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
  2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
@ 2018-12-19  3:35     ` Thomas Monjalon
  2018-12-19  7:33       ` Mattias Rönnblom
  2019-03-05 22:41     ` Carrillo, Erik G
  2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
  4 siblings, 1 reply; 77+ messages in thread
From: Thomas Monjalon @ 2018-12-19  3:35 UTC (permalink / raw)
  To: dev; +Cc: Erik Gabriel Carrillo, rsanford, stephen, jerin.jacob, pbhagavatula

13/12/2018 23:26, Erik Gabriel Carrillo:
> This patch series modifies the timer library in such a way that
> structures that used to be statically allocated in a process's data
> segment are now allocated in shared memory.  As these structures contain
> lists of timers, new APIs are introduced that allow a caller to specify
> the particular structure instance into which a timer should be inserted
> or from which a timer should be removed.  This enables primary and
> secondary processes to modify the same timer list, which enables some
> multi-process use cases that were not previously possible; e.g. a
> secondary process can start a timer whose expiration is detected in a
> primary process running a new flavor of timer_manage().
> 
> The original library API is mostly unchanged, though implementations are
> updated to call into newly added functions with a default structure
> instance ID that provides the original behavior.  New functions are
> introduced to enable applications to allocate structure instances to
> house timer lists, and to reference them with an identifier when
> starting and stopping timers, and finally, to manage the timer lists
> referenced with an identifier.
> 
> My initial performance testing with the "timer_perf_autotest" test shows
> no performance regression or improvement, and inspection of the
> generated optimized code shows that the extra function call gets inlined
> in the functions that now have an extra function call. 
> 
> Depends on: https://patches.dpdk.org/patch/48417/
> 
> Changes in v3:
>  - remove C++ style comment in first patch in series (Stephen)
> 
> Changes in v2:
>  - split these changes out into their own series
>  - version the symbols where the existing ABI was updated, and
>    provide alternate implementation with behavior equivalent to original
>    behavior. Validated ABI compatibility with validate-abi.sh
>  - refactor changes to simplify patches
> 
> Erik Gabriel Carrillo (2):
>   timer: allow timer management in shared memory
>   timer: add function to stop all timers in a list
> 
>  lib/librte_timer/Makefile              |   1 +
>  lib/librte_timer/rte_timer.c           | 558 ++++++++++++++++++++++++++++++---
>  lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
>  lib/librte_timer/rte_timer_version.map |  23 ++
>  4 files changed, 795 insertions(+), 45 deletions(-)

It is a lot of changes!
Anyone to review please?
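
Editor's illustration (not part of the thread): the cover letter quoted
above says the original API keeps its behavior by calling into the new
functions with a default structure instance ID, while new callers allocate
their own instance and pass its identifier. The following is a minimal
sketch of that wrapper pattern, assuming a fixed table of instances in
ordinary process memory rather than DPDK shared memory. All names here
(timer_list, data_alloc, alt_start, legacy_start, pending, MAX_INSTANCES,
DEFAULT_ID) are hypothetical and are not the library's API.

```c
#include <assert.h>

#define MAX_INSTANCES 4 /* hypothetical table size */
#define DEFAULT_ID 0    /* instance used by the original, ID-less API */

/* Hypothetical stand-in for one set of timer lists. */
struct timer_list {
	int in_use;
	unsigned int n_pending;
};

/* The default instance is always allocated so legacy calls keep working. */
static struct timer_list instances[MAX_INSTANCES] = {
	[DEFAULT_ID] = { .in_use = 1 },
};

/* Allocate a free instance and return its ID through *id. */
static int
data_alloc(unsigned int *id)
{
	for (unsigned int i = 0; i < MAX_INSTANCES; i++) {
		if (!instances[i].in_use) {
			instances[i].in_use = 1;
			*id = i;
			return 0;
		}
	}
	return -1; /* no free instance */
}

/* New-style call: the caller names the instance to operate on. */
static int
alt_start(unsigned int id)
{
	if (id >= MAX_INSTANCES || !instances[id].in_use)
		return -1;
	instances[id].n_pending++;
	return 0;
}

/* Original-style call: same behavior as before, via the default instance. */
static int
legacy_start(void)
{
	return alt_start(DEFAULT_ID);
}

/* Number of timers started on the given instance. */
static unsigned int
pending(unsigned int id)
{
	assert(id < MAX_INSTANCES);
	return instances[id].n_pending;
}
```

In this sketch a caller that never allocates an instance sees only the
default list, while each allocated instance keeps its own independent
count, which is the separation the adapter relies on.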

* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2018-12-19  3:35     ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Thomas Monjalon
@ 2018-12-19  7:33       ` Mattias Rönnblom
  0 siblings, 0 replies; 77+ messages in thread
From: Mattias Rönnblom @ 2018-12-19  7:33 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: Erik Gabriel Carrillo, rsanford, stephen, jerin.jacob, pbhagavatula

On 2018-12-19 04:35, Thomas Monjalon wrote:
> 13/12/2018 23:26, Erik Gabriel Carrillo:
>> This patch series modifies the timer library in such a way that
>> structures that used to be statically allocated in a process's data
>> segment are now allocated in shared memory.  As these structures contain
>> lists of timers, new APIs are introduced that allow a caller to specify
>> the particular structure instance into which a timer should be inserted
>> or from which a timer should be removed.  This enables primary and
>> secondary processes to modify the same timer list, which enables some
>> multi-process use cases that were not previously possible; e.g. a
>> secondary process can start a timer whose expiration is detected in a
>> primary process running a new flavor of timer_manage().
>>
>> The original library API is mostly unchanged, though implementations are
>> updated to call into newly added functions with a default structure
>> instance ID that provides the original behavior.  New functions are
>> introduced to enable applications to allocate structure instances to
>> house timer lists, and to reference them with an identifier when
>> starting and stopping timers, and finally, to manage the timer lists
>> referenced with an identifier.
>>
>> My initial performance testing with the "timer_perf_autotest" test shows
>> no performance regression or improvement, and inspection of the
>> generated optimized code shows that the extra function call gets inlined
>> in the functions that now have an extra function call.
>>
>> Depends on: https://patches.dpdk.org/patch/48417/
>>
>> Changes in v3:
>>   - remove C++ style comment in first patch in series (Stephen)
>>
>> Changes in v2:
>>   - split these changes out into their own series
>>   - version the symbols where the existing ABI was updated, and
>>     provide alternate implementation with behavior equivalent to original
>>     behavior. Validated ABI compatibility with validate-abi.sh
>>   - refactor changes to simplify patches
>>
>> Erik Gabriel Carrillo (2):
>>    timer: allow timer management in shared memory
>>    timer: add function to stop all timers in a list
>>
>>   lib/librte_timer/Makefile              |   1 +
>>   lib/librte_timer/rte_timer.c           | 558 ++++++++++++++++++++++++++++++---
>>   lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
>>   lib/librte_timer/rte_timer_version.map |  23 ++
>>   4 files changed, 795 insertions(+), 45 deletions(-)
> 
> It is a lot of changes!
> Anyone to review please?
> 
> 

I can try reviewing the overall aim of the patch set: Do we
really want DPDK to support more secondary process-based use cases? I
would rather see it support fewer, with the long-term goal of
dropping support for secondary processes altogether.

DPDK secondary processes are a horrible mess, in my opinion, to put it 
bluntly.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
                       ` (2 preceding siblings ...)
  2018-12-19  3:35     ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Thomas Monjalon
@ 2019-03-05 22:41     ` Carrillo, Erik G
  2019-03-05 22:58       ` [dpdk-dev] [dpdk-techboard] " Thomas Monjalon
  2019-03-06  2:39       ` [dpdk-dev] " Varghese, Vipin
  2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
  4 siblings, 2 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-05 22:41 UTC (permalink / raw)
  To: rsanford; +Cc: dev, techboard

Hi all,

I'd like to bring this patch proposal up again and see if I can get any more feedback from the maintainer or others.

I need to update the map file to reflect the next release, so I'll add those changes in if any other modifications are suggested.

Thanks,
Erik

ML:  https://mails.dpdk.org/archives/dev/2018-December/120864.html
Patchwork:  https://patches.dpdk.org/project/dpdk/list/?series=2767

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Erik Gabriel Carrillo
> Sent: Thursday, December 13, 2018 4:27 PM
> To: rsanford@akamai.com
> Cc: stephen@networkplumber.org; jerin.jacob@caviumnetworks.com;
> pbhagavatula@caviumnetworks.com; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v3 0/2] Timer library changes
> 
> This patch series modifies the timer library in such a way that structures that
> used to be statically allocated in a process's data segment are now allocated
> in shared memory.  As these structures contain lists of timers, new APIs are
> introduced that allow a caller to specify the particular structure instance into
> which a timer should be inserted or from which a timer should be removed.
> This enables primary and secondary processes to modify the same timer list,
> which enables some multi-process use cases that were not previously
> possible; e.g. a secondary process can start a timer whose expiration is
> detected in a primary process running a new flavor of timer_manage().
> 
> The original library API is mostly unchanged, though implementations are
> updated to call into newly added functions with a default structure instance
> ID that provides the original behavior.  New functions are introduced to
> enable applications to allocate structure instances to house timer lists, and to
> reference them with an identifier when starting and stopping timers, and
> finally, to manage the timer lists referenced with an identifier.
> 
> My initial performance testing with the "timer_perf_autotest" test shows no
> performance regression or improvement, and inspection of the generated
> optimized code shows that the extra function call gets inlined in the functions
> that now have an extra function call.
> 
> Depends on: https://patches.dpdk.org/patch/48417/
> 
> Changes in v3:
>  - remove C++ style comment in first patch in series (Stephen)
> 
> Changes in v2:
>  - split these changes out into their own series
>  - version the symbols where the existing ABI was updated, and
>    provide alternate implementation with behavior equivalent to original
>    behavior. Validated ABI compatibility with validate-abi.sh
>  - refactor changes to simplify patches
> 
> Erik Gabriel Carrillo (2):
>   timer: allow timer management in shared memory
>   timer: add function to stop all timers in a list
> 
>  lib/librte_timer/Makefile              |   1 +
>  lib/librte_timer/rte_timer.c           | 558
> ++++++++++++++++++++++++++++++---
>  lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
>  lib/librte_timer/rte_timer_version.map |  23 ++
>  4 files changed, 795 insertions(+), 45 deletions(-)
> 
> --
> 2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/2] Timer library changes
  2019-03-05 22:41     ` Carrillo, Erik G
@ 2019-03-05 22:58       ` Thomas Monjalon
  2019-03-06 18:54         ` Carrillo, Erik G
  2019-03-06  2:39       ` [dpdk-dev] " Varghese, Vipin
  1 sibling, 1 reply; 77+ messages in thread
From: Thomas Monjalon @ 2019-03-05 22:58 UTC (permalink / raw)
  To: Carrillo, Erik G; +Cc: techboard, rsanford, dev

05/03/2019 23:41, Carrillo, Erik G:
> Hi all,
> 
> I'd like to bring this patch proposal up again and see if I can get any more feedback from the maintainer or others.
> 
> I need to update the map file to reflect the next release, so I'll add those changes in if any other modifications are suggested.

Please send an updated version.
If nobody replies within 10 days, I will push it.

Would you be interested in becoming maintainer of the timer lib?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2019-03-05 22:41     ` Carrillo, Erik G
  2019-03-05 22:58       ` [dpdk-dev] [dpdk-techboard] " Thomas Monjalon
@ 2019-03-06  2:39       ` Varghese, Vipin
  2019-03-06 15:15         ` Carrillo, Erik G
  1 sibling, 1 reply; 77+ messages in thread
From: Varghese, Vipin @ 2019-03-06  2:39 UTC (permalink / raw)
  To: Carrillo, Erik G, rsanford; +Cc: dev, techboard

Hi Erik,

Apologies if I am reaching out a bit late. Please find my query below.

<snipped>
> > This enables primary and secondary processes to modify the same timer
> > list, which enables some multi-process use cases that were not
> > previously possible; e.g. a secondary process can start a timer whose
> > expiration is detected in a primary process running a new flavor of
> timer_manage().
Does this mean that the primary can detect the expiry of a timer armed by the secondary? When the new timer_manage() is called from the primary, will it invoke the callback handler of the secondary? If yes, has this been tested with shared libraries too?
<snipped>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2019-03-06  2:39       ` [dpdk-dev] " Varghese, Vipin
@ 2019-03-06 15:15         ` Carrillo, Erik G
  2019-03-07  2:33           ` Varghese, Vipin
  0 siblings, 1 reply; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-06 15:15 UTC (permalink / raw)
  To: Varghese, Vipin, rsanford; +Cc: dev, techboard

> -----Original Message-----
> From: Varghese, Vipin
> Sent: Tuesday, March 5, 2019 8:39 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>; rsanford@akamai.com
> Cc: dev@dpdk.org; techboard@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 0/2] Timer library changes
> 
> Hi Erik,
> 
> Apologies if I am reaching out a bit late. Please find my query below
> 
> <snipped>
> > > This enables primary and secondary processes to modify the same
> > > timer list, which enables some multi-process use cases that were not
> > > previously possible; e.g. a secondary process can start a timer
> > > whose expiration is detected in a primary process running a new
> > > flavor of
> > timer_manage().
> Does this mean the following, primary can detect the timer expire primed by
> secondary. On calling new timer_manage() from primary will it invoke call
> back handler of secondary? If yes, has this been tested with shared library
> too?
> <snipped>

Hi Vipin,

No, with the proposed patch, the callback handler would need to be a function pointer valid in the same process that is invoking the new timer_manage().

Thanks,
Gabriel

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v4 0/2] Timer library changes
  2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
                       ` (3 preceding siblings ...)
  2019-03-05 22:41     ` Carrillo, Erik G
@ 2019-03-06 17:20     ` Erik Gabriel Carrillo
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
                         ` (2 more replies)
  4 siblings, 3 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-03-06 17:20 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev, nhorman

This patch series modifies the timer library in such a way that
structures that used to be statically allocated in a process's data
segment are now allocated in shared memory.  As these structures contain
lists of timers, new APIs are introduced that allow a caller to specify
the particular structure instance into which a timer should be inserted
or from which a timer should be removed.  This enables primary and
secondary processes to modify the same timer list, which enables some
multi-process use cases that were not previously possible; e.g. a
secondary process can start a timer whose expiration is detected in a
primary process running a new flavor of timer_manage().

The original library API is mostly unchanged, though implementations are
updated to call into newly added functions with a default structure
instance ID that provides the original behavior.  New functions are
introduced to enable applications to allocate structure instances to
house timer lists, and to reference them with an identifier when
starting and stopping timers, and finally, to manage the timer lists
referenced with an identifier.
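
As an illustration only, the delegation described above could be modeled
as follows. This is a minimal sketch in plain C, not the library code:
the names timer_stop(), timer_alt_stop(), DEFAULT_DATA_ID and the
"allocated" table are hypothetical stand-ins, and the real functions
operate on skiplists rather than a single flag.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_DATA_ELS    64   /* hypothetical; number of list instances */
#define DEFAULT_DATA_ID 0u   /* hypothetical; instance used by the old API */

/* Stand-in for a timer handle; the real one carries skiplist links. */
struct timer {
	int running;
};

/* Slot 0 is pre-claimed so the original API always has an instance. */
static uint8_t allocated[MAX_DATA_ELS] = { 1 };

/* New-style call: the caller names the instance explicitly. */
int
timer_alt_stop(uint32_t data_id, struct timer *tim)
{
	if (data_id >= MAX_DATA_ELS || !allocated[data_id])
		return -EINVAL;

	tim->running = 0;
	return 0;
}

/* Original-style call: same signature as before, default instance. */
int
timer_stop(struct timer *tim)
{
	return timer_alt_stop(DEFAULT_DATA_ID, tim);
}
```

Existing callers keep calling timer_stop() unchanged, while new callers
can pass any instance ID they have allocated.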

My initial performance testing with the "timer_perf_autotest" test shows
no performance regression or improvement, and inspection of the
generated optimized code shows that the extra function call is inlined
into the functions that now contain it.

Changes in v4:
 - Updated versioned symbols so that they correspond to the next
   release. Checked ABI compatibility again with validate-abi.sh.

Changes in v3:
 - remove C++ style comment in first patch in series (Stephen)

Changes in v2:
 - split these changes out into their own series
 - version the symbols where the existing ABI was updated, and
   provide alternate implementation with behavior equivalent to original
   behavior. Validated ABI compatibility with validate-abi.sh
 - refactor changes to simplify patches

Erik Gabriel Carrillo (2):
  timer: allow timer management in shared memory
  timer: add function to stop all timers in a list

 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 558 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
 lib/librte_timer/rte_timer_version.map |  23 ++
 4 files changed, 795 insertions(+), 45 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
@ 2019-03-06 17:20       ` Erik Gabriel Carrillo
  2019-03-20 13:52         ` Sanford, Robert
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
  2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
  2 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-03-06 17:20 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev, nhorman

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose values may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
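
For readers who want the idea without the diff, here is a simplified
model of the slot allocation, assuming a process-local array in place of
the shared memzone. The names (timer_data_init, timer_data_alloc,
timer_data_dealloc, MAX_DATA_ELS) are illustrative and only mirror the
ones in the patch; like the loop it models, the sketch takes no lock, so
concurrent callers would need their own synchronization.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DATA_ELS 64          /* hypothetical; mirrors RTE_MAX_DATA_ELS */
#define FL_ALLOCATED (1u << 0)

/* Stand-in for one array entry; the real one also holds per-lcore lists. */
struct timer_data {
	uint8_t internal_flags;
};

/* Process-local here; the patch places the array in shared memory. */
static struct timer_data timer_data_arr[MAX_DATA_ELS];

/* Clear all entries and reserve entry 0 for the original APIs. */
void
timer_data_init(void)
{
	memset(timer_data_arr, 0, sizeof(timer_data_arr));
	timer_data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
int
timer_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(timer_data_arr[i].internal_flags & FL_ALLOCATED)) {
			timer_data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject out-of-range or unallocated IDs. */
int
timer_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(timer_data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	timer_data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;
	return 0;
}
```

Because entry 0 is reserved at init time, the first ID handed out to a
caller of timer_data_alloc() in this model is 1.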

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
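
The "multiple timer lists per invocation" part can be pictured with the
following sketch, which only decides which lists have expired work and
leaves out locking, list splitting and callbacks. It is a toy model
under stated assumptions: struct pending_list, count_expired_lists() and
MAX_LCORE are hypothetical names, and an expiry value of 0 is taken to
mean "list empty".

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LCORE 8   /* hypothetical; small for illustration */

/* Stand-in for one per-lcore list: only the earliest expiry is kept. */
struct pending_list {
	uint64_t next_expire;   /* 0 means the list is empty */
};

/*
 * Walk the lists named in poll_lcores and count those whose earliest
 * timer has expired at time "now". A NULL poll_lcores means "poll only
 * the calling lcore's list".
 */
int
count_expired_lists(const struct pending_list *lists,
		    const unsigned int *poll_lcores, int nb_poll_lcores,
		    unsigned int this_lcore, uint64_t now)
{
	unsigned int self[1];
	int i, nb_expired = 0;

	self[0] = this_lcore;
	if (poll_lcores == NULL) {
		poll_lcores = self;
		nb_poll_lcores = 1;
	}

	for (i = 0; i < nb_poll_lcores; i++) {
		/* read the entry inside the body, so index i is in range */
		unsigned int lcore = poll_lcores[i];

		if (lcore >= MAX_LCORE)
			continue;
		/* skip empty lists and lists with nothing due yet */
		if (lists[lcore].next_expire == 0 ||
		    lists[lcore].next_expire > now)
			continue;

		nb_expired++;
	}

	return nb_expired;
}
```

A service core could pass the IDs of all producer lcores in poll_lcores
and so drain every list from a single call site.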

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..2bd49d0 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1905(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1905(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1905(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+	     poll_lcore = poll_lcores[++i]) {
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expire will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and move to the next lcore */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(privp, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1905(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..bee1676 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1905(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1905(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1905(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1905(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..c2e5836 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.05 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4


* [dpdk-dev] [PATCH v4 2/2] timer: add function to stop all timers in a list
  2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2019-03-06 17:20       ` Erik Gabriel Carrillo
  2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
  2 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-03-06 17:20 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev, nhorman

Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list,
and invoking a callback function.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.c           | 39 ++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.h           | 32 ++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer_version.map |  1 +
 3 files changed, 72 insertions(+)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 2bd49d0..539926f 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -999,6 +999,45 @@ rte_timer_alt_manage(uint32_t timer_data_id,
 	return 0;
 }
 
+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores,
+		   rte_timer_stop_all_cb_t f, void *f_arg)
+{
+	int i;
+	struct priv_timer *priv_timer;
+	uint32_t walk_lcore;
+	struct rte_timer *tim, *next_tim;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	for (i = 0, walk_lcore = walk_lcores[i];
+	     i < nb_walk_lcores;
+	     walk_lcore = walk_lcores[++i]) {
+		priv_timer = &timer_data->priv_timer[walk_lcore];
+
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		for (tim = priv_timer->pending_head.sl_next[0];
+		     tim != NULL;
+		     tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			/* Call timer_stop with lock held */
+			__rte_timer_stop(tim, 1, timer_data);
+
+			if (f)
+				f(tim, f_arg);
+		}
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
 static void
 __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index bee1676..ca8a052 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -500,6 +500,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
 		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
 
 /**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param walk_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ *   The size of the walk_lcores array.
+ * @param f
+ *   The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ *   An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index c2e5836..72f75c8 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -33,5 +33,6 @@ EXPERIMENTAL {
 	rte_timer_alt_stop;
 	rte_timer_data_alloc;
 	rte_timer_data_dealloc;
+	rte_timer_stop_all;
 	rte_timer_subsystem_finalize;
 };
-- 
2.6.4


* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/2] Timer library changes
  2019-03-05 22:58       ` [dpdk-dev] [dpdk-techboard] " Thomas Monjalon
@ 2019-03-06 18:54         ` Carrillo, Erik G
  2019-03-06 20:17           ` Thomas Monjalon
  0 siblings, 1 reply; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-06 18:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: techboard, rsanford, dev

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, March 5, 2019 4:59 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Cc: techboard@dpdk.org; rsanford@akamai.com; dev@dpdk.org
> Subject: Re: [dpdk-techboard] [dpdk-dev] [PATCH v3 0/2] Timer library
> changes
> 
> 05/03/2019 23:41, Carrillo, Erik G:
> > Hi all,
> >
> > I'd like to bring this patch proposal up again and see if I can get any more
> feedback from the maintainer or others.
> >
> > I need to update the map file to reflect the next release, so I'll add those
> changes in if any other modifications are suggested.
> 
> Please send an updated version.
> If nobody replies after 10 days, I will push it.
> 

Thanks, Thomas.  I submitted a v4 a little earlier.

> Would you be interested to become maintainer of the timer lib?
> 
> 

Yes, I'd be willing.  I'll do my best ;)

Regards,
Gabriel


* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/2] Timer library changes
  2019-03-06 18:54         ` Carrillo, Erik G
@ 2019-03-06 20:17           ` Thomas Monjalon
  0 siblings, 0 replies; 77+ messages in thread
From: Thomas Monjalon @ 2019-03-06 20:17 UTC (permalink / raw)
  To: Carrillo, Erik G; +Cc: techboard, rsanford, dev

06/03/2019 19:54, Carrillo, Erik G:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 05/03/2019 23:41, Carrillo, Erik G:
> > > Hi all,
> > >
> > > I'd like to bring this patch proposal up again and see if I can get any more
> > feedback from the maintainer or others.
> > >
> > > I need to update the map file to reflect the next release, so I'll add those
> > changes in if any other modifications are suggested.
> > 
> > Please send an updated version.
> > If nobody replies after 10 days, I will push it.
> > 
> 
> Thanks, Thomas.  I submitted a v4 a little earlier.
> 
> > Would you be interested to become maintainer of the timer lib?
> > 
> > 
> 
> Yes, I'd be willing.  I'll do my best ;)

Great, thanks.
Next step will be to send a patch for MAINTAINERS file.


* Re: [dpdk-dev] [PATCH v3 0/2] Timer library changes
  2019-03-06 15:15         ` Carrillo, Erik G
@ 2019-03-07  2:33           ` Varghese, Vipin
  0 siblings, 0 replies; 77+ messages in thread
From: Varghese, Vipin @ 2019-03-07  2:33 UTC (permalink / raw)
  To: Carrillo, Erik G, rsanford; +Cc: dev, techboard

Hi Gabriel,

Thanks for the clarification.

> -----Original Message-----
> From: Carrillo, Erik G
> Sent: Wednesday, March 6, 2019 8:46 PM
> To: Varghese, Vipin <vipin.varghese@intel.com>; rsanford@akamai.com
> Cc: dev@dpdk.org; techboard@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v3 0/2] Timer library changes
> 
> > -----Original Message-----
> > From: Varghese, Vipin
> > Sent: Tuesday, March 5, 2019 8:39 PM
> > To: Carrillo, Erik G <erik.g.carrillo@intel.com>; rsanford@akamai.com
> > Cc: dev@dpdk.org; techboard@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v3 0/2] Timer library changes
> >
> > Hi Erik,
> >
> > Apologies if I am reaching out a bit late. Please find my query below
> >
> > <snipped>
> > > > This enables primary and secondary processes to modify the same
> > > > timer list, which enables some multi-process use cases that were
> > > > not previously possible; e.g. a secondary process can start a
> > > > timer whose expiration is detected in a primary process running a
> > > > new flavor of
> > > timer_manage().
> > Does this mean that the primary can detect the expiry of a timer
> > primed by a secondary? When the new timer_manage() is called from the
> > primary, will it invoke the callback handler of the secondary? If yes,
> > has this been tested with shared libraries too?
> > <snipped>
> 
> Hi Vipin,
> 
> No, with the proposed patch,  the callback handler would need to be a function
> pointer valid in the same process that is invoking the new timer_manage().
> 
> Thanks,
> Gabriel
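
The point in Gabriel's reply can be pictured with a simplified model in plain
C. This is only a sketch and not the timer library's code: struct
shared_timer, manage_expired() and on_expiry() are hypothetical names. The
timer records hold data only, so any process could arm them, while the
callback is supplied by (and therefore valid in) the process that runs the
manage loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer kept in shared memory. It carries
 * plain data only; no function pointer is stored in it.
 */
struct shared_timer {
	uint64_t expire;  /* expiry time in arbitrary ticks */
	int armed;        /* non-zero while the timer is pending */
	uint32_t cookie;  /* opaque value handed to the callback */
};

/* Callback type owned by the managing process (hypothetical). */
typedef void (*manage_cb_t)(struct shared_timer *tim);

/* Disarm every armed timer whose expiry is not after 'now' and pass it
 * to 'cb'. Returns the number of timers that fired.
 */
static int
manage_expired(struct shared_timer *tims, size_t n, uint64_t now,
	       manage_cb_t cb)
{
	int fired = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tims[i].armed || tims[i].expire > now)
			continue;
		tims[i].armed = 0;
		cb(&tims[i]);
		fired++;
	}
	return fired;
}

/* Example callback living in the managing process. */
static uint32_t last_cookie;

static void
on_expiry(struct shared_timer *tim)
{
	last_cookie = tim->cookie;
}
```

In this model the arming side never needs to know the address of on_expiry(),
which is the property the reply relies on.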


* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2019-03-20 13:52         ` Sanford, Robert
  2019-03-20 13:52           ` Sanford, Robert
                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Sanford, Robert @ 2019-03-20 13:52 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, thomas, dev; +Cc: nhorman

Hi Erik,

I have a few questions and comments on this patch series.

1. Don't you think we need new tests (in test/test/) to verify the secondary-process APIs?
2. I suggest we define default_data_id as const, and explicitly set it to 0.
3. The outer for-loop in rte_timer_alt_manage() touches beyond the end of poll_lcores[]. I suggest a change like this:

-       for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
-            poll_lcore = poll_lcores[++i]) {
+       for (i = 0; i < nb_poll_lcores; i++) {
+            poll_lcore = poll_lcores[i];

4. Same problem (as #3) in the for-loop in rte_timer_stop_all(), in patch v4 2/2.
5. There seems to be no difference between "typedef void (*rte_timer_cb_t)(struct rte_timer *, void *)" and "typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg)", why add rte_timer_stop_all_cb_t?
6. Can you provide a use case or code snippet that shows how we will use rte_timer_alt_manage()?
7. Why not make the argument to rte_timer_alt_manage_cb_t a "struct rte_timer *", instead of a "void *", since we pass a pointer-to-timer when we invoke the function?
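
Items 3 and 4 can be checked with a small stand-alone sketch. It is not
library code: struct probe and the walk_*() helpers are hypothetical, and
probe_get() records the requested index instead of dereferencing out of
bounds, so the model itself is safe to run. It shows that the loop shape in
the patch requests index 'len' once, while the suggested shape stops at
'len - 1'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: remembers the highest index ever requested. */
struct probe {
	const unsigned int *arr;
	int len;
	int max_idx;  /* start at -1; highest index requested so far */
};

static unsigned int
probe_get(struct probe *p, int idx)
{
	if (idx > p->max_idx)
		p->max_idx = idx;
	/* Return a dummy value rather than reading past the end. */
	return idx < p->len ? p->arr[idx] : 0;
}

/* Loop shape used in the patch: the element is fetched in the init and
 * increment clauses, so one extra fetch happens before the condition
 * finally fails.
 */
static int
walk_patch_style(struct probe *p)
{
	int i, visited = 0;
	unsigned int id;

	for (i = 0, id = probe_get(p, i); i < p->len;
	     id = probe_get(p, ++i)) {
		(void)id;
		visited++;
	}
	return visited;
}

/* Suggested shape: fetch inside the body, after the bounds check. */
static int
walk_suggested(struct probe *p)
{
	int i, visited = 0;
	unsigned int id;

	for (i = 0; i < p->len; i++) {
		id = probe_get(p, i);
		(void)id;
		visited++;
	}
	return visited;
}
```

Both shapes visit the same elements; only the patch-style loop asks for one
element more than the array holds (and for element 0 of an empty array).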

--
Regards,
Robert Sanford


On 3/6/19, 12:20 PM, "Erik Gabriel Carrillo" <erik.g.carrillo@intel.com> wrote:

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
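
The allocation scheme described above can be sketched in isolation. This is
a simplified model under stated assumptions, not the patch itself: the slot
array is an ordinary static array rather than a memzone, MAX_DATA_ELS is
shrunk to 4 for illustration (the patch uses 64), and slot_alloc() and
slot_dealloc() are hypothetical names. Only the FL_ALLOCATED flag handling
follows the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 4         /* illustration only; the patch uses 64 */
#define FL_ALLOCATED (1 << 0)

/* Hypothetical, reduced version of a timer data instance. */
struct timer_data_slot {
	uint8_t internal_flags;
};

static struct timer_data_slot slots[MAX_DATA_ELS];

/* Claim the first free slot and report its index through id_ptr. */
static int
slot_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(slots[i].internal_flags & FL_ALLOCATED)) {
			slots[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Release a slot; reject out-of-range or unallocated identifiers. */
static int
slot_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(slots[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;
	slots[id].internal_flags &= (uint8_t)~FL_ALLOCATED;
	return 0;
}
```

In the patch the zeroth entry is claimed during subsystem initialization for
the original APIs; the sketch leaves that step out.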

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
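
Processing several timer lists per invocation amounts to repeatedly picking
the earliest-expiring head among the per-lcore run lists. A minimal sketch
of that selection step follows; struct node and pop_earliest() are
hypothetical names, and the lists are plain singly linked lists rather than
the library's skiplists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; each list is already sorted by 'expire'. */
struct node {
	uint64_t expire;
	struct node *next;
};

/* Detach and return the node with the smallest 'expire' among the list
 * heads, or NULL when every list is empty. The strict '<' comparison
 * mirrors the patch, so a node whose expire equals UINT64_MAX is never
 * selected.
 */
static struct node *
pop_earliest(struct node **heads, int nb_lists)
{
	uint64_t min_expire = UINT64_MAX;
	struct node *n;
	int min_idx = -1;
	int i;

	for (i = 0; i < nb_lists; i++) {
		if (heads[i] != NULL && heads[i]->expire < min_expire) {
			min_expire = heads[i]->expire;
			min_idx = i;
		}
	}

	if (min_idx < 0)
		return NULL;

	n = heads[min_idx];
	heads[min_idx] = n->next;
	return n;
}
```

Calling pop_earliest() until it returns NULL yields the expired timers of
all lists in global expiry order, at a cost of one scan over the list heads
per timer.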

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..2bd49d0 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1905(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1905(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1905(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+	     poll_lcore = poll_lcores[++i]) {
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expired will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and return */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(privp, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1905(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..bee1676 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1905(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1905(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1905(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1905(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..c2e5836 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.05 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4




* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-20 13:52         ` Sanford, Robert
@ 2019-03-20 13:52           ` Sanford, Robert
  2019-03-21  1:01           ` Carrillo, Erik G
  2019-04-15 21:49           ` Carrillo, Erik G
  2 siblings, 0 replies; 77+ messages in thread
From: Sanford, Robert @ 2019-03-20 13:52 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, thomas, dev; +Cc: nhorman

Hi Erik,

I have a few questions and comments on this patch series.

1. Don't you think we need new tests (in test/test/) to verify the secondary-process APIs?
2. I suggest we define default_data_id as const, and explicitly set it to 0.
3. The outer for-loop in rte_timer_alt_manage() touches beyond the end of poll_lcores[]. I suggest a change like this:

-       for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
-            poll_lcore = poll_lcores[++i]) {
+       for (i = 0; i < nb_poll_lcores; i++) {
+            poll_lcore = poll_lcores[i];

4. Same problem (as #3) in the for-loop in rte_timer_stop_all(), in patch v4 2/2.
5. There seems to be no difference between "typedef void (*rte_timer_cb_t)(struct rte_timer *, void *)" and "typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg)", why add rte_timer_stop_all_cb_t?
6. Can you provide a use case or code snippet that shows how we will use rte_timer_alt_manage()?
7. Why not make the argument to rte_timer_alt_manage_cb_t a "struct rte_timer *", instead of a "void *", since we pass a pointer-to-timer when we invoke the function?
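
To make item 3 concrete, here is a minimal, DPDK-free sketch that counts which array slots each loop shape reads. The names read_at(), highest_index_patch_loop(), highest_index_suggested_loop() and demo_lcores are hypothetical helpers for illustration only; they are not part of the patch. The array carries one spare slot so that the over-read stays defined in this demo.

```c
#include <assert.h>

/* Three "real" lcore ids plus one spare slot, so that the over-read
 * demonstrated below stays inside the array. */
const unsigned int demo_lcores[4] = {1, 2, 3, 0};

static int max_index_read;

/* Hypothetical accessor: returns arr[idx] and records the highest
 * index that has been read so far. */
static unsigned int
read_at(const unsigned int *arr, int idx)
{
	if (idx > max_index_read)
		max_index_read = idx;
	return arr[idx];
}

/* Loop shape used in the patch: the increment clause reads slot ++i
 * before the condition is re-tested.
 * Returns the highest index read, or -1 if nothing was read. */
int
highest_index_patch_loop(const unsigned int *arr, int nb)
{
	unsigned int v;
	int i;

	max_index_read = -1;
	for (i = 0, v = read_at(arr, i); i < nb; v = read_at(arr, ++i))
		(void)v;	/* the real loop body would use v here */
	return max_index_read;
}

/* Loop shape suggested in the review: the element is read inside the
 * body, after the bound check has passed. */
int
highest_index_suggested_loop(const unsigned int *arr, int nb)
{
	unsigned int v;
	int i;

	max_index_read = -1;
	for (i = 0; i < nb; i++) {
		v = read_at(arr, i);
		(void)v;
	}
	return max_index_read;
}
```

With nb = 3 the first form reads index 3 (one past the last valid entry), while the second stops at index 2. With nb = 0 the first form still reads index 0; the second reads nothing.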

--
Regards,
Robert Sanford


On 3/6/19, 12:20 PM, "Erik Gabriel Carrillo" <erik.g.carrillo@intel.com> wrote:

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
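
The slot bookkeeping described above can be sketched roughly as follows. This is a simplified model, not the patch's code: the array is a plain static instead of a memzone, the per-lcore lists are omitted, and the model_* names and struct timer_data_model are made up for illustration. Only the FL_ALLOCATED flag, the limit of 64 entries, and the -ENOSPC / -EINVAL return values are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 64		/* mirrors RTE_MAX_DATA_ELS in the patch */
#define FL_ALLOCATED (1 << 0)

/* Stand-in for struct rte_timer_data; per-lcore lists are omitted. */
struct timer_data_model {
	uint8_t internal_flags;
};

/* The patch places this array in shared memory; a static is used here. */
static struct timer_data_model data_arr[MAX_DATA_ELS];

/* Reserve entry 0 for the original (non-"alt") APIs. */
void
model_subsystem_init(void)
{
	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
int
model_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Release an entry; out-of-range or unallocated ids are rejected. */
int
model_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;
	data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;
	return 0;
}
```

Because entry 0 is claimed at init time, the first explicit allocation in this model yields id 1, and freeing the same id twice fails with -EINVAL the second time.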

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
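
When several timer lists are processed in one invocation, the patch repeatedly picks the expired timer with the smallest expire value across all collected run lists. A stripped-down model of that selection step is sketched below. struct model_timer and pop_earliest() are hypothetical names; the real code works on struct rte_timer, follows sl_next[0], and also handles locking and state transitions, all of which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for struct rte_timer. */
struct model_timer {
	uint64_t expire;
	struct model_timer *next;	/* plays the role of sl_next[0] */
};

/* Remove and return the timer with the smallest expire value among the
 * heads of nb_lists run lists. Each list is assumed to be sorted by
 * expire already. Returns NULL once every list is empty. */
struct model_timer *
pop_earliest(struct model_timer **heads, int nb_lists)
{
	struct model_timer *tim;
	uint64_t min_expire = UINT64_MAX;
	int min_idx = -1;
	int i;

	/* find the list whose head expires first */
	for (i = 0; i < nb_lists; i++) {
		if (heads[i] != NULL && heads[i]->expire < min_expire) {
			min_expire = heads[i]->expire;
			min_idx = i;
		}
	}

	if (min_idx < 0)
		return NULL;

	/* advance the chosen list past the timer being handed out */
	tim = heads[min_idx];
	heads[min_idx] = tim->next;
	return tim;
}
```

Calling pop_earliest() in a loop until it returns NULL visits the timers of all lists in global expiry order, which is the order in which the patch invokes the caller-supplied callback.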

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..2bd49d0 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1905(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1905(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1905(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
+	     poll_lcore = poll_lcores[++i]) {
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expired will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and return */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(privp, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1905(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..bee1676 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1905(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1905(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1905(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1905(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(void *);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..c2e5836 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.05 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4




* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-20 13:52         ` Sanford, Robert
  2019-03-20 13:52           ` Sanford, Robert
@ 2019-03-21  1:01           ` Carrillo, Erik G
  2019-03-21  1:01             ` Carrillo, Erik G
  2019-03-27 14:03             ` Thomas Monjalon
  2019-04-15 21:49           ` Carrillo, Erik G
  2 siblings, 2 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-21  1:01 UTC (permalink / raw)
  To: Sanford, Robert; +Cc: thomas, dev, nhorman

Hi Robert,

Thanks for the review and suggestions.  I’m out of the office on bonding leave for the next few weeks, but I’ll update the patch to address your points below when I return.

Thanks,
Erik

> On Mar 20, 2019, at 8:53 AM, Sanford, Robert <rsanford@akamai.com> wrote:
> 
> Hi Erik,
> 
> I have a few questions and comments on this patch series.
> 
> 1. Don't you think we need new tests (in test/test/) to verify the secondary-process APIs?
> 2. I suggest we define default_data_id as const, and explicitly set it to 0.
> 3. The outer for-loop in rte_timer_alt_manage() touches beyond the end of poll_lcores[]. I suggest a change like this:
> 
> -       for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> -            poll_lcore = poll_lcores[++i]) {
> +       for (i = 0; i < nb_poll_lcores; i++) {
> +            poll_lcore = poll_lcores[i];
> 
> 4. Same problem (as #3) in the for-loop in rte_timer_stop_all(), in patch v4 2/2.
> 5. There seems to be no difference between "typedef void (*rte_timer_cb_t)(struct rte_timer *, void *)" and "typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg)". Why add rte_timer_stop_all_cb_t?
> 6. Can you provide a use case or code snippet that shows how we will use rte_timer_alt_manage()?
> 7. Why not make the argument to rte_timer_alt_manage_cb_t a "struct rte_timer *", instead of a "void *", since we pass a pointer-to-timer when we invoke the function?
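
To illustrate point 3, here is a minimal standalone sketch (not DPDK code) that
records the highest array index each loop shape reads. The names read_lcore,
walk_original and walk_fixed are hypothetical, and the test array is assumed to
be padded by one element so that the extra read made by the original shape
stays in bounds for the demonstration.

```c
#include <assert.h>

static int max_idx_read;	/* highest index passed to read_lcore() */

/* Hypothetical accessor: reads arr[idx] and remembers the highest idx seen. */
static unsigned int
read_lcore(const unsigned int *arr, int idx)
{
	if (idx > max_idx_read)
		max_idx_read = idx;
	return arr[idx];
}

/* Loop shape from the patch: the element is fetched in the init and
 * increment clauses, i.e. before the bound is tested. arr must hold at
 * least n + 1 elements for this to be safe to run.
 */
static int
walk_original(const unsigned int *arr, int n)
{
	unsigned int lcore;
	int i;

	assert(n >= 0);
	max_idx_read = -1;
	for (i = 0, lcore = read_lcore(arr, i); i < n;
	     lcore = read_lcore(arr, ++i))
		(void)lcore;

	return max_idx_read;
}

/* Suggested shape: the bound is tested first, then the element is fetched. */
static int
walk_fixed(const unsigned int *arr, int n)
{
	unsigned int lcore;
	int i;

	assert(n >= 0);
	max_idx_read = -1;
	for (i = 0; i < n; i++) {
		lcore = read_lcore(arr, i);
		(void)lcore;
	}

	return max_idx_read;
}
```

With three lcore ids, walk_original() reads index 3 while walk_fixed() stops at
index 2; with an empty array, the original shape still reads index 0.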
> 
> --
> Regards,
> Robert Sanford
> 
> 
> On 3/6/19, 12:20 PM, "Erik Gabriel Carrillo" <erik.g.carrillo@intel.com> wrote:
> 
> Currently, the timer library uses a per-process table of structures to
> manage skiplists of timers, presumably because timers contain arbitrary
> function pointers whose values may not resolve properly in other
> processes.
> 
> However, if the same callback is used to handle all timers, and that
> callback is only invoked in one process, then it would be safe to allow
> the data structures to be allocated in shared memory, and to allow
> secondary processes to modify the timer lists.  This would let timers be
> used in more multi-process scenarios.
> 
> The library's global variables are wrapped with a struct, and an array
> of these structures is created in shared memory.  The original APIs
> are updated to reference the zeroth entry in the array. This maintains
> the original behavior for both primary and secondary processes since
> the set intersection of their coremasks should be empty [1].  New APIs
> are introduced to enable the allocation/deallocation of other entries
> in the array.
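
The allocation scheme described above can be sketched in isolation as follows.
This is a simplified model under stated assumptions, not the patch itself: the
array is an ordinary static array rather than a memzone, the per-lcore timer
lists are omitted, MAX_DATA_ELS is shrunk to 4 (the patch defines
RTE_MAX_DATA_ELS as 64), and the names data_arr, data_alloc and data_dealloc
are hypothetical stand-ins for the rte_timer_data_* functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_ELS 4		/* assumption: small value for illustration */
#define FL_ALLOCATED (1 << 0)

struct timer_data {
	uint8_t internal_flags;	/* per-lcore timer lists omitted */
};

/* Stand-in for the shared array; entry 0 is reserved for the original APIs. */
static struct timer_data data_arr[MAX_DATA_ELS] = {
	[0] = { .internal_flags = FL_ALLOCATED },
};

/* Claim the first free entry and report its index through id_ptr. */
static int
data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject ids that are out of range or not allocated. */
static int
data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= (uint8_t)~FL_ALLOCATED;

	return 0;
}
```

Note that this sketch, like the flag test it models, takes no lock, so it
assumes that allocation and deallocation are not called concurrently.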
> 
> New variants of the APIs used to start and stop timers are introduced;
> they allow a caller to specify which array entry should be used to
> locate the timer list to insert into or delete from.
> 
> Finally, a new variant of rte_timer_manage() is introduced, which
> allows a caller to specify which array entry should be used to locate
> the timer lists to process; it can also process multiple timer lists per
> invocation.
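
When several timer lists are processed in one invocation, expired timers from
the different lists have to be run in some order. One way to do this, sketched
below, is to pick the entry with the smallest expiry among the list heads on
each step. This is a standalone model under stated assumptions: struct tim and
pop_oldest are hypothetical names, and the next pointer stands in for
sl_next[0] of a real timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tim {
	uint64_t expire;
	struct tim *next;	/* stands in for sl_next[0] */
};

/* Remove and return the timer with the smallest expiry among the heads of
 * the run lists, or NULL once no list has a selectable entry. The strict
 * comparison means an entry whose expiry equals UINT64_MAX is never picked.
 */
static struct tim *
pop_oldest(struct tim **run_first, int nb_runlists)
{
	uint64_t min_expire = UINT64_MAX;
	struct tim *t;
	int i, min_idx = -1;

	assert(nb_runlists >= 0);

	for (i = 0; i < nb_runlists; i++) {
		t = run_first[i];
		if (t != NULL && t->expire < min_expire) {
			min_expire = t->expire;
			min_idx = i;
		}
	}

	if (min_idx < 0)
		return NULL;

	/* advance the list the chosen timer came from */
	t = run_first[min_idx];
	run_first[min_idx] = t->next;

	return t;
}
```

Each call scans every list head, so the cost per timer grows linearly with the
number of lists polled; for a handful of lcores that is likely acceptable.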
> 
> [1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> ---
> lib/librte_timer/Makefile              |   1 +
> lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
> lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
> lib/librte_timer/rte_timer_version.map |  22 ++
> 4 files changed, 723 insertions(+), 45 deletions(-)
> 
> diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
> index 4ebd528..8ec63f4 100644
> --- a/lib/librte_timer/Makefile
> +++ b/lib/librte_timer/Makefile
> @@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> # library name
> LIB = librte_timer.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> LDLIBS += -lrte_eal
> 
> diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> index 30c7b0a..2bd49d0 100644
> --- a/lib/librte_timer/rte_timer.c
> +++ b/lib/librte_timer/rte_timer.c
> @@ -5,6 +5,7 @@
> #include <string.h>
> #include <stdio.h>
> #include <stdint.h>
> +#include <stdbool.h>
> #include <inttypes.h>
> #include <assert.h>
> #include <sys/queue.h>
> @@ -21,11 +22,15 @@
> #include <rte_spinlock.h>
> #include <rte_random.h>
> #include <rte_pause.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_compat.h>
> 
> #include "rte_timer.h"
> 
> -LIST_HEAD(rte_timer_list, rte_timer);
> -
> +/**
> + * Per-lcore info for timers.
> + */
> struct priv_timer {
>    struct rte_timer pending_head;  /**< dummy timer instance to head up list */
>    rte_spinlock_t list_lock;       /**< lock to protect list access */
> @@ -48,25 +53,84 @@ struct priv_timer {
> #endif
> } __rte_cache_aligned;
> 
> -/** per-lcore private info for timers */
> -static struct priv_timer priv_timer[RTE_MAX_LCORE];
> +#define FL_ALLOCATED    (1 << 0)
> +struct rte_timer_data {
> +    struct priv_timer priv_timer[RTE_MAX_LCORE];
> +    uint8_t internal_flags;
> +};
> +
> +#define RTE_MAX_DATA_ELS 64
> +static struct rte_timer_data *rte_timer_data_arr;
> +static uint32_t default_data_id;
> +static uint32_t rte_timer_subsystem_initialized;
> +
> +/* For maintaining older interfaces for a period */
> +static struct rte_timer_data default_timer_data;
> 
> /* when debug is enabled, store some statistics */
> #ifdef RTE_LIBRTE_TIMER_DEBUG
> -#define __TIMER_STAT_ADD(name, n) do {                    \
> +#define __TIMER_STAT_ADD(priv_timer, name, n) do {            \
>        unsigned __lcore_id = rte_lcore_id();            \
>        if (__lcore_id < RTE_MAX_LCORE)                \
>            priv_timer[__lcore_id].stats.name += (n);    \
>    } while(0)
> #else
> -#define __TIMER_STAT_ADD(name, n) do {} while(0)
> +#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
> #endif
> 
> -/* Init the timer library. */
> +static inline int
> +timer_data_valid(uint32_t id)
> +{
> +    return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
> +}
> +
> +/* validate ID and retrieve timer data pointer, or return error value */
> +#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {    \
> +    if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))        \
> +        return retval;                        \
> +    timer_data = &rte_timer_data_arr[id];                \
> +} while (0)
> +
> +int __rte_experimental
> +rte_timer_data_alloc(uint32_t *id_ptr)
> +{
> +    int i;
> +    struct rte_timer_data *data;
> +
> +    if (!rte_timer_subsystem_initialized)
> +        return -ENOMEM;
> +
> +    for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
> +        data = &rte_timer_data_arr[i];
> +        if (!(data->internal_flags & FL_ALLOCATED)) {
> +            data->internal_flags |= FL_ALLOCATED;
> +
> +            if (id_ptr)
> +                *id_ptr = i;
> +
> +            return 0;
> +        }
> +    }
> +
> +    return -ENOSPC;
> +}
> +
> +int __rte_experimental
> +rte_timer_data_dealloc(uint32_t id)
> +{
> +    struct rte_timer_data *timer_data;
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
> +
> +    timer_data->internal_flags &= ~(FL_ALLOCATED);
> +
> +    return 0;
> +}
> +
> void
> -rte_timer_subsystem_init(void)
> +rte_timer_subsystem_init_v20(void)
> {
>    unsigned lcore_id;
> +    struct priv_timer *priv_timer = default_timer_data.priv_timer;
> 
>    /* since priv_timer is static, it's zeroed by default, so only init some
>     * fields.
> @@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
>        priv_timer[lcore_id].prev_lcore = lcore_id;
>    }
> }
> +VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
> +
> +/* Init the timer library. Allocate an array of timer data structs in shared
> + * memory, and allocate the zeroth entry for use with original timer
> + * APIs. Since the intersection of the sets of lcore ids in primary and
> + * secondary processes should be empty, the zeroth entry can be shared by
> + * multiple processes.
> + */
> +int
> +rte_timer_subsystem_init_v1905(void)
> +{
> +    const struct rte_memzone *mz;
> +    struct rte_timer_data *data;
> +    int i, lcore_id;
> +    static const char *mz_name = "rte_timer_mz";
> +
> +    if (rte_timer_subsystem_initialized)
> +        return -EALREADY;
> +
> +    if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +        mz = rte_memzone_lookup(mz_name);
> +        if (mz == NULL)
> +            return -EEXIST;
> +
> +        rte_timer_data_arr = mz->addr;
> +
> +        rte_timer_data_arr[default_data_id].internal_flags |=
> +            FL_ALLOCATED;
> +
> +        rte_timer_subsystem_initialized = 1;
> +
> +        return 0;
> +    }
> +
> +    mz = rte_memzone_reserve_aligned(mz_name,
> +            RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
> +            SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
> +    if (mz == NULL)
> +        return -ENOMEM;
> +
> +    rte_timer_data_arr = mz->addr;
> +
> +    for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
> +        data = &rte_timer_data_arr[i];
> +
> +        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +            rte_spinlock_init(
> +                &data->priv_timer[lcore_id].list_lock);
> +            data->priv_timer[lcore_id].prev_lcore = lcore_id;
> +        }
> +    }
> +
> +    rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
> +
> +    rte_timer_subsystem_initialized = 1;
> +
> +    return 0;
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
> +          rte_timer_subsystem_init_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
> +
> +void __rte_experimental
> +rte_timer_subsystem_finalize(void)
> +{
> +    if (rte_timer_data_arr)
> +        rte_free(rte_timer_data_arr);
> +
> +    rte_timer_subsystem_initialized = 0;
> +}
> 
> /* Initialize the timer handle tim for use */
> void
> @@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
>  */
> static int
> timer_set_config_state(struct rte_timer *tim,
> -               union rte_timer_status *ret_prev_status)
> +               union rte_timer_status *ret_prev_status,
> +               struct priv_timer *priv_timer)
> {
>    union rte_timer_status prev_status, status;
>    int success = 0;
> @@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
>  */
> static void
> timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
> -        struct rte_timer **prev)
> +               struct rte_timer **prev, struct priv_timer *priv_timer)
> {
>    unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
>    prev[lvl] = &priv_timer[tim_lcore].pending_head;
> @@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
>  */
> static void
> timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
> -        struct rte_timer **prev)
> +                struct rte_timer **prev,
> +                struct priv_timer *priv_timer)
> {
>    int i;
> +
>    /* to get a specific entry in the list, look for just lower than the time
>     * values, and then increment on each level individually if necessary
>     */
> -    timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
> +    timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
>    for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
>        while (prev[i]->sl_next[i] != NULL &&
>                prev[i]->sl_next[i] != tim &&
> @@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
>  * timer must not be in a list
>  */
> static void
> -timer_add(struct rte_timer *tim, unsigned int tim_lcore)
> +timer_add(struct rte_timer *tim, unsigned int tim_lcore,
> +      struct priv_timer *priv_timer)
> {
>    unsigned lvl;
>    struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
> 
>    /* find where exactly this element goes in the list of elements
>     * for each depth. */
> -    timer_get_prev_entries(tim->expire, tim_lcore, prev);
> +    timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
> 
>    /* now assign it a new level and add at that level */
>    const unsigned tim_level = timer_get_skiplist_level(
> @@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
>  */
> static void
> timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
> -        int local_is_locked)
> +      int local_is_locked, struct priv_timer *priv_timer)
> {
>    unsigned lcore_id = rte_lcore_id();
>    unsigned prev_owner = prev_status.owner;
> @@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
>                ((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
> 
>    /* adjust pointers from previous entries to point past this */
> -    timer_get_prev_entries_for_node(tim, prev_owner, prev);
> +    timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
>    for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
>        if (prev[i]->sl_next[i] == tim)
>            prev[i]->sl_next[i] = tim->sl_next[i];
> @@ -326,11 +464,13 @@ static int
> __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
>          uint64_t period, unsigned tim_lcore,
>          rte_timer_cb_t fct, void *arg,
> -          int local_is_locked)
> +          int local_is_locked,
> +          struct rte_timer_data *timer_data)
> {
>    union rte_timer_status prev_status, status;
>    int ret;
>    unsigned lcore_id = rte_lcore_id();
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* round robin for tim_lcore */
>    if (tim_lcore == (unsigned)LCORE_ID_ANY) {
> @@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
>    /* wait that the timer is in correct status before update,
>     * and mark it as being configured */
> -    ret = timer_set_config_state(tim, &prev_status);
> +    ret = timer_set_config_state(tim, &prev_status, priv_timer);
>    if (ret < 0)
>        return -1;
> 
> -    __TIMER_STAT_ADD(reset, 1);
> +    __TIMER_STAT_ADD(priv_timer, reset, 1);
>    if (prev_status.state == RTE_TIMER_RUNNING &&
>        lcore_id < RTE_MAX_LCORE) {
>        priv_timer[lcore_id].updated = 1;
> @@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
>    /* remove it from list */
>    if (prev_status.state == RTE_TIMER_PENDING) {
> -        timer_del(tim, prev_status, local_is_locked);
> -        __TIMER_STAT_ADD(pending, -1);
> +        timer_del(tim, prev_status, local_is_locked, priv_timer);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>    }
> 
>    tim->period = period;
> @@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
>    if (tim_lcore != lcore_id || !local_is_locked)
>        rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
> 
> -    __TIMER_STAT_ADD(pending, 1);
> -    timer_add(tim, tim_lcore);
> +    __TIMER_STAT_ADD(priv_timer, pending, 1);
> +    timer_add(tim, tim_lcore, priv_timer);
> 
>    /* update state: as we are in CONFIG state, only us can modify
>     * the state so we don't need to use cmpset() here */
> @@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
> /* Reset and start the timer associated with the timer handle tim */
> int
> -rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
> -        enum rte_timer_type type, unsigned tim_lcore,
> -        rte_timer_cb_t fct, void *arg)
> +rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
> +            enum rte_timer_type type, unsigned int tim_lcore,
> +            rte_timer_cb_t fct, void *arg)
> {
>    uint64_t cur_time = rte_get_timer_cycles();
>    uint64_t period;
> @@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>        period = 0;
> 
>    return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
> -              fct, arg, 0);
> +              fct, arg, 0, &default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
> +
> +int
> +rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
> +              enum rte_timer_type type, unsigned int tim_lcore,
> +              rte_timer_cb_t fct, void *arg)
> +{
> +    return rte_timer_alt_reset(default_data_id, tim, ticks, type,
> +                   tim_lcore, fct, arg);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
> +                      enum rte_timer_type type,
> +                      unsigned int tim_lcore,
> +                      rte_timer_cb_t fct, void *arg),
> +          rte_timer_reset_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
> +            uint64_t ticks, enum rte_timer_type type,
> +            unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
> +{
> +    uint64_t cur_time = rte_get_timer_cycles();
> +    uint64_t period;
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
> +            !(rte_lcore_is_enabled(tim_lcore) ||
> +              rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
> +        return -1;
> +
> +    if (type == PERIODICAL)
> +        period = ticks;
> +    else
> +        period = 0;
> +
> +    return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
> +                 fct, arg, 0, timer_data);
> }
> 
> /* loop until rte_timer_reset() succeed */
> @@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
>        rte_pause();
> }
> 
> -/* Stop the timer associated with the timer handle tim */
> -int
> -rte_timer_stop(struct rte_timer *tim)
> +static int
> +__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
> +         struct rte_timer_data *timer_data)
> {
>    union rte_timer_status prev_status, status;
>    unsigned lcore_id = rte_lcore_id();
>    int ret;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* wait that the timer is in correct status before update,
>     * and mark it as being configured */
> -    ret = timer_set_config_state(tim, &prev_status);
> +    ret = timer_set_config_state(tim, &prev_status, priv_timer);
>    if (ret < 0)
>        return -1;
> 
> -    __TIMER_STAT_ADD(stop, 1);
> +    __TIMER_STAT_ADD(priv_timer, stop, 1);
>    if (prev_status.state == RTE_TIMER_RUNNING &&
>        lcore_id < RTE_MAX_LCORE) {
>        priv_timer[lcore_id].updated = 1;
> @@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
> 
>    /* remove it from list */
>    if (prev_status.state == RTE_TIMER_PENDING) {
> -        timer_del(tim, prev_status, 0);
> -        __TIMER_STAT_ADD(pending, -1);
> +        timer_del(tim, prev_status, local_is_locked, priv_timer);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>    }
> 
>    /* mark timer as stopped */
> @@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
>    return 0;
> }
> 
> +/* Stop the timer associated with the timer handle tim */
> +int
> +rte_timer_stop_v20(struct rte_timer *tim)
> +{
> +    return __rte_timer_stop(tim, 0, &default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
> +
> +int
> +rte_timer_stop_v1905(struct rte_timer *tim)
> +{
> +    return rte_timer_alt_stop(default_data_id, tim);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
> +          rte_timer_stop_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    return __rte_timer_stop(tim, 0, timer_data);
> +}
> +
> /* loop until rte_timer_stop() succeed */
> void
> rte_timer_stop_sync(struct rte_timer *tim)
> @@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
> }
> 
> /* must be called periodically, run all timer that expired */
> -void rte_timer_manage(void)
> +static void
> +__rte_timer_manage(struct rte_timer_data *timer_data)
> {
>    union rte_timer_status status;
>    struct rte_timer *tim, *next_tim;
> @@ -486,11 +696,12 @@ void rte_timer_manage(void)
>    struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
>    uint64_t cur_time;
>    int i, ret;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* timer manager only runs on EAL thread with valid lcore_id */
>    assert(lcore_id < RTE_MAX_LCORE);
> 
> -    __TIMER_STAT_ADD(manage, 1);
> +    __TIMER_STAT_ADD(priv_timer, manage, 1);
>    /* optimize for the case where per-cpu list is empty */
>    if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
>        return;
> @@ -518,7 +729,7 @@ void rte_timer_manage(void)
>    tim = priv_timer[lcore_id].pending_head.sl_next[0];
> 
>    /* break the existing list at current time point */
> -    timer_get_prev_entries(cur_time, lcore_id, prev);
> +    timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
>    for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
>        if (prev[i] == &priv_timer[lcore_id].pending_head)
>            continue;
> @@ -563,7 +774,7 @@ void rte_timer_manage(void)
>        /* execute callback function with list unlocked */
>        tim->f(tim, tim->arg);
> 
> -        __TIMER_STAT_ADD(pending, -1);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>        /* the timer was stopped or reloaded by the callback
>         * function, we have nothing to do here */
>        if (priv_timer[lcore_id].updated == 1)
> @@ -580,24 +791,222 @@ void rte_timer_manage(void)
>            /* keep it in list and mark timer as pending */
>            rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
>            status.state = RTE_TIMER_PENDING;
> -            __TIMER_STAT_ADD(pending, 1);
> +            __TIMER_STAT_ADD(priv_timer, pending, 1);
>            status.owner = (int16_t)lcore_id;
>            rte_wmb();
>            tim->status.u32 = status.u32;
>            __rte_timer_reset(tim, tim->expire + tim->period,
> -                tim->period, lcore_id, tim->f, tim->arg, 1);
> +                tim->period, lcore_id, tim->f, tim->arg, 1,
> +                timer_data);
>            rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
>        }
>    }
>    priv_timer[lcore_id].running_tim = NULL;
> }
> 
> +void
> +rte_timer_manage_v20(void)
> +{
> +    __rte_timer_manage(&default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
> +
> +int
> +rte_timer_manage_v1905(void)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
> +
> +    __rte_timer_manage(timer_data);
> +
> +    return 0;
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_manage(uint32_t timer_data_id,
> +             unsigned int *poll_lcores,
> +             int nb_poll_lcores,
> +             rte_timer_alt_manage_cb_t f)
> +{
> +    union rte_timer_status status;
> +    struct rte_timer *tim, *next_tim, **pprev;
> +    struct rte_timer *run_first_tims[RTE_MAX_LCORE];
> +    unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
> +    unsigned int this_lcore = rte_lcore_id();
> +    struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
> +    uint64_t cur_time;
> +    int i, j, ret;
> +    int nb_runlists = 0;
> +    struct rte_timer_data *data;
> +    struct priv_timer *privp;
> +    uint32_t poll_lcore;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
> +
> +    /* timer manager only runs on EAL thread with valid lcore_id */
> +    assert(this_lcore < RTE_MAX_LCORE);
> +
> +    __TIMER_STAT_ADD(data->priv_timer, manage, 1);
> +
> +    if (poll_lcores == NULL) {
> +        poll_lcores = (unsigned int []){rte_lcore_id()};
> +        nb_poll_lcores = 1;
> +    }
> +
> +    for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> +         poll_lcore = poll_lcores[++i]) {
> +        privp = &data->priv_timer[poll_lcore];
> +
> +        /* optimize for the case where per-cpu list is empty */
> +        if (privp->pending_head.sl_next[0] == NULL)
> +            continue;
> +        cur_time = rte_get_timer_cycles();
> +
> +#ifdef RTE_ARCH_64
> +        /* on 64-bit the value cached in the pending_head.expired will
> +         * be updated atomically, so we can consult that for a quick
> +         * check here outside the lock
> +         */
> +        if (likely(privp->pending_head.expire > cur_time))
> +            continue;
> +#endif
> +
> +        /* browse ordered list, add expired timers in 'expired' list */
> +        rte_spinlock_lock(&privp->list_lock);
> +
> +        /* if nothing to do just unlock and return */
> +        if (privp->pending_head.sl_next[0] == NULL ||
> +            privp->pending_head.sl_next[0]->expire > cur_time) {
> +            rte_spinlock_unlock(&privp->list_lock);
> +            continue;
> +        }
> +
> +        /* save start of list of expired timers */
> +        tim = privp->pending_head.sl_next[0];
> +
> +        /* break the existing list at current time point */
> +        timer_get_prev_entries(cur_time, poll_lcore, prev,
> +                       data->priv_timer);
> +        for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
> +            if (prev[j] == &privp->pending_head)
> +                continue;
> +            privp->pending_head.sl_next[j] =
> +                prev[j]->sl_next[j];
> +            if (prev[j]->sl_next[j] == NULL)
> +                privp->curr_skiplist_depth--;
> +
> +            prev[j]->sl_next[j] = NULL;
> +        }
> +
> +        /* transition run-list from PENDING to RUNNING */
> +        run_first_tims[nb_runlists] = tim;
> +        runlist_lcore_ids[nb_runlists] = poll_lcore;
> +        pprev = &run_first_tims[nb_runlists];
> +        nb_runlists++;
> +
> +        for ( ; tim != NULL; tim = next_tim) {
> +            next_tim = tim->sl_next[0];
> +
> +            ret = timer_set_running_state(tim);
> +            if (likely(ret == 0)) {
> +                pprev = &tim->sl_next[0];
> +            } else {
> +                /* another core is trying to re-config this one,
> +                 * remove it from local expired list
> +                 */
> +                *pprev = next_tim;
> +            }
> +        }
> +
> +        /* update the next to expire timer value */
> +        privp->pending_head.expire =
> +            (privp->pending_head.sl_next[0] == NULL) ? 0 :
> +            privp->pending_head.sl_next[0]->expire;
> +
> +        rte_spinlock_unlock(&privp->list_lock);
> +    }
> +
> +    /* Now process the run lists */
> +    while (1) {
> +        bool done = true;
> +        uint64_t min_expire = UINT64_MAX;
> +        int min_idx = 0;
> +
> +        /* Find the next oldest timer to process */
> +        for (i = 0; i < nb_runlists; i++) {
> +            tim = run_first_tims[i];
> +
> +            if (tim != NULL && tim->expire < min_expire) {
> +                min_expire = tim->expire;
> +                min_idx = i;
> +                done = false;
> +            }
> +        }
> +
> +        if (done)
> +            break;
> +
> +        tim = run_first_tims[min_idx];
> +        privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
> +
> +        /* Move down the runlist from which we picked a timer to
> +         * execute
> +         */
> +        run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
> +
> +        privp->updated = 0;
> +        privp->running_tim = tim;
> +
> +        /* Call the provided callback function */
> +        f(tim);
> +
> +        __TIMER_STAT_ADD(privp, pending, -1);
> +
> +        /* the timer was stopped or reloaded by the callback
> +         * function, we have nothing to do here
> +         */
> +        if (privp->updated == 1)
> +            continue;
> +
> +        if (tim->period == 0) {
> +            /* remove from done list and mark timer as stopped */
> +            status.state = RTE_TIMER_STOP;
> +            status.owner = RTE_TIMER_NO_OWNER;
> +            rte_wmb();
> +            tim->status.u32 = status.u32;
> +        } else {
> +            /* keep it in list and mark timer as pending */
> +            rte_spinlock_lock(
> +                &data->priv_timer[this_lcore].list_lock);
> +            status.state = RTE_TIMER_PENDING;
> +            __TIMER_STAT_ADD(data->priv_timer, pending, 1);
> +            status.owner = (int16_t)this_lcore;
> +            rte_wmb();
> +            tim->status.u32 = status.u32;
> +            __rte_timer_reset(tim, tim->expire + tim->period,
> +                tim->period, this_lcore, tim->f, tim->arg, 1,
> +                data);
> +            rte_spinlock_unlock(
> +                &data->priv_timer[this_lcore].list_lock);
> +        }
> +
> +        privp->running_tim = NULL;
> +    }
> +
> +    return 0;
> +}
> +
> /* dump statistics about timers */
> -void rte_timer_dump_stats(FILE *f)
> +static void
> +__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
> {
> #ifdef RTE_LIBRTE_TIMER_DEBUG
>    struct rte_timer_debug_stats sum;
>    unsigned lcore_id;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    memset(&sum, 0, sizeof(sum));
>    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> @@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
>    fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
> #endif
> }
> +
> +void
> +rte_timer_dump_stats_v20(FILE *f)
> +{
> +    __rte_timer_dump_stats(&default_timer_data, f);
> +}
> +VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
> +
> +int
> +rte_timer_dump_stats_v1905(FILE *f)
> +{
> +    return rte_timer_alt_dump_stats(default_data_id, f);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
> +          rte_timer_dump_stats_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    __rte_timer_dump_stats(timer_data, f);
> +
> +    return 0;
> +}
> diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
> index 9b95cd2..bee1676 100644
> --- a/lib/librte_timer/rte_timer.h
> +++ b/lib/librte_timer/rte_timer.h
> @@ -39,6 +39,7 @@
> #include <stddef.h>
> #include <rte_common.h>
> #include <rte_config.h>
> +#include <rte_spinlock.h>
> 
> #ifdef __cplusplus
> extern "C" {
> @@ -132,12 +133,68 @@ struct rte_timer
> #endif
> 
> /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Allocate a timer data instance in shared memory to track a set of pending
> + * timer lists.
> + *
> + * @param id_ptr
> + *   Pointer to variable into which to write the identifier of the allocated
> + *   timer data instance.
> + *
> + * @return
> + *   - 0: Success
> + *   - -ENOSPC: maximum number of timer data instances already allocated
> + */
> +int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Deallocate a timer data instance.
> + *
> + * @param id
> + *   Identifier of the timer data instance to deallocate.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: invalid timer data instance identifier
> + */
> +int __rte_experimental rte_timer_data_dealloc(uint32_t id);
> +
> +/**
>  * Initialize the timer library.
>  *
>  * Initializes internal variables (list, locks and so on) for the RTE
>  * timer library.
>  */
> -void rte_timer_subsystem_init(void);
> +void rte_timer_subsystem_init_v20(void);
> +
> +/**
> + * Initialize the timer library.
> + *
> + * Initializes internal variables (list, locks and so on) for the RTE
> + * timer library.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EEXIST: Returned in secondary process when primary process has not
> + *      yet initialized the timer subsystem
> + *   - -ENOMEM: Unable to allocate memory needed to initialize timer
> + *      subsystem
> + */
> +int rte_timer_subsystem_init_v1905(void);
> +int rte_timer_subsystem_init(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Free timer subsystem resources.
> + */
> +void __rte_experimental rte_timer_subsystem_finalize(void);
> 
> /**
>  * Initialize a timer handle.
> @@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
>  *   - 0: Success; the timer is scheduled.
>  *   - (-1): Timer is in the RUNNING or CONFIG state.
>  */
> +int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
> +            enum rte_timer_type type, unsigned int tim_lcore,
> +            rte_timer_cb_t fct, void *arg);
> +int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
> +              enum rte_timer_type type, unsigned int tim_lcore,
> +              rte_timer_cb_t fct, void *arg);
> int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>            enum rte_timer_type type, unsigned tim_lcore,
>            rte_timer_cb_t fct, void *arg);
> @@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
>  *   - 0: Success; the timer is stopped.
>  *   - (-1): The timer is in the RUNNING or CONFIG state.
>  */
> +int rte_timer_stop_v20(struct rte_timer *tim);
> +int rte_timer_stop_v1905(struct rte_timer *tim);
> int rte_timer_stop(struct rte_timer *tim);
> 
> -
> /**
>  * Loop until rte_timer_stop() succeeds.
>  *
> @@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
>  * function. However, the more often the function is called, the more
>  * CPU resources it will use.
>  */
> -void rte_timer_manage(void);
> +void rte_timer_manage_v20(void);
> +
> +/**
> + * Manage the timer list and execute callback functions.
> + *
> + * This function must be called periodically from EAL lcores
> + * main_loop(). It browses the list of pending timers and runs all
> + * timers that are expired.
> + *
> + * The precision of the timer depends on the call frequency of this
> + * function. However, the more often the function is called, the more
> + * CPU resources it will use.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: timer subsystem not yet initialized
> + */
> +int rte_timer_manage_v1905(void);
> +int rte_timer_manage(void);
> 
> /**
>  * Dump statistics about timers.
> @@ -300,7 +382,143 @@ void rte_timer_manage(void);
>  * @param f
>  *   A pointer to a file for output
>  */
> -void rte_timer_dump_stats(FILE *f);
> +void rte_timer_dump_stats_v20(FILE *f);
> +
> +/**
> + * Dump statistics about timers.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: timer subsystem not yet initialized
> + */
> +int rte_timer_dump_stats_v1905(FILE *f);
> +int rte_timer_dump_stats(FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_reset(), except that it allows a
> + * caller to specify the rte_timer_data instance containing the list to which
> + * the timer should be added.
> + *
> + * @see rte_timer_reset()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param tim
> + *   The timer handle.
> + * @param ticks
> + *   The number of cycles (see rte_get_hpet_hz()) before the callback
> + *   function is called.
> + * @param type
> + *   The type can be either:
> + *   - PERIODICAL: The timer is automatically reloaded after execution
> + *     (returns to the PENDING state)
> + *   - SINGLE: The timer is one-shot, that is, the timer goes to a
> + *     STOPPED state after execution.
> + * @param tim_lcore
> + *   The ID of the lcore where the timer callback function has to be
> + *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
> + *   launch it on a different core for each call (round-robin).
> + * @param fct
> + *   The callback function of the timer. This parameter can be NULL if (and
> + *   only if) rte_timer_alt_manage() will be used to manage this timer.
> + * @param arg
> + *   The user argument of the callback function.
> + * @return
> + *   - 0: Success; the timer is scheduled.
> + *   - (-1): Timer is in the RUNNING or CONFIG state.
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
> +            uint64_t ticks, enum rte_timer_type type,
> +            unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_stop(), except that it allows a
> + * caller to specify the rte_timer_data instance containing the list from which
> + * this timer should be removed.
> + *
> + * @see rte_timer_stop()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param tim
> + *   The timer handle.
> + * @return
> + *   - 0: Success; the timer is stopped.
> + *   - (-1): The timer is in the RUNNING or CONFIG state.
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
> +
> +/**
> + * Callback function type for rte_timer_alt_manage().
> + */
> +typedef void (*rte_timer_alt_manage_cb_t)(void *);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Manage a set of timer lists and execute the specified callback function for
> + * all expired timers. This function is similar to rte_timer_manage(), except
> + * that it allows a caller to specify the timer_data instance that should
> + * be operated on, as well as a set of lcore IDs identifying which timer lists
> + * should be processed.  Callback functions of individual timers are ignored.
> + *
> + * @see rte_timer_manage()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param poll_lcores
> + *   An array of lcore ids identifying the timer lists that should be processed.
> + *   NULL is allowed - if NULL, the timer list corresponding to the lcore
> + *   calling this routine is processed (same as rte_timer_manage()).
> + * @param n_poll_lcores
> + *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
> + *   is ignored.
> + * @param f
> + *   The callback function which should be called for all expired timers.
> + * @return
> + *   - 0: success
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
> +             int n_poll_lcores, rte_timer_alt_manage_cb_t f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_dump_stats(), except that it allows
> + * the caller to specify the rte_timer_data instance that should be used.
> + *
> + * @see rte_timer_dump_stats()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param f
> + *   A pointer to a file for output
> + * @return
> + *   - 0: success
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
> 
> #ifdef __cplusplus
> }
> diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
> index 9b2e4b8..c2e5836 100644
> --- a/lib/librte_timer/rte_timer_version.map
> +++ b/lib/librte_timer/rte_timer_version.map
> @@ -13,3 +13,25 @@ DPDK_2.0 {
> 
>    local: *;
> };
> +
> +DPDK_19.05 {
> +    global:
> +
> +    rte_timer_dump_stats;
> +    rte_timer_manage;
> +    rte_timer_reset;
> +    rte_timer_stop;
> +    rte_timer_subsystem_init;
> +} DPDK_2.0;
> +
> +EXPERIMENTAL {
> +    global:
> +
> +    rte_timer_alt_dump_stats;
> +    rte_timer_alt_manage;
> +    rte_timer_alt_reset;
> +    rte_timer_alt_stop;
> +    rte_timer_data_alloc;
> +    rte_timer_data_dealloc;
> +    rte_timer_subsystem_finalize;
> +};
> -- 
> 2.6.4
> 
> 

* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-21  1:01           ` Carrillo, Erik G
@ 2019-03-21  1:01             ` Carrillo, Erik G
  2019-03-27 14:03             ` Thomas Monjalon
  1 sibling, 0 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-21  1:01 UTC (permalink / raw)
  To: Sanford, Robert; +Cc: thomas, dev, nhorman

Hi Robert,

Thanks for the review and suggestions.  I’m out of the office on bonding leave for the next few weeks, but I’ll update the patch to address your points below when I return.

Thanks,
Erik

> On Mar 20, 2019, at 8:53 AM, Sanford, Robert <rsanford@akamai.com> wrote:
> 
> Hi Erik,
> 
> I have a few questions and comments on this patch series.
> 
> 1. Don't you think we need new tests (in test/test/) to verify the secondary-process APIs?
> 2. I suggest we define default_data_id as const, and explicitly set it to 0.
> 3. The outer for-loop in rte_timer_alt_manage() touches beyond the end of poll_lcores[]. I suggest a change like this:
> 
> -       for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> -            poll_lcore = poll_lcores[++i]) {
> +       for (i = 0; i < nb_poll_lcores; i++) {
> +            poll_lcore = poll_lcores[i];
> 
> 4. Same problem (as #3) in the for-loop in rte_timer_stop_all(), in patch v4 2/2.
> 5. There seems to be no difference between "typedef void (*rte_timer_cb_t)(struct rte_timer *, void *)" and "typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg)". Why add rte_timer_stop_all_cb_t?
> 6. Can you provide a use case or code snippet that shows how we will use rte_timer_alt_manage()?
> 7. Why not make the argument to rte_timer_alt_manage_cb_t a "struct rte_timer *", instead of a "void *", since we pass a pointer-to-timer when we invoke the function?
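To make point 3 concrete, here is a small standalone sketch (not DPDK code) comparing the two loop shapes. The names `checked_get`, `visit_patch_style`, `visit_suggested_style`, and `oob_reads` are hypothetical and exist only for this illustration; `checked_get` records an out-of-range index instead of actually reading past the array.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter: number of attempted reads at index >= n. */
static int oob_reads;

/* Hypothetical accessor: returns arr[i] when i is in range; otherwise it
 * only records the attempt, so the sketch never reads past the array. */
static unsigned int
checked_get(const unsigned int *arr, int n, int i)
{
	if (i >= n) {
		oob_reads++;
		return 0;
	}
	return arr[i];
}

/* Loop shape from the patch: the element is fetched in the init and
 * increment clauses, so index n is fetched before "i < n" rejects it. */
static int
visit_patch_style(const unsigned int *arr, int n)
{
	int i, visited = 0;
	unsigned int v;

	for (i = 0, v = checked_get(arr, n, i); i < n;
	     v = checked_get(arr, n, ++i)) {
		(void)v;
		visited++;
	}
	return visited;
}

/* Suggested shape: the element is fetched inside the body, after the
 * bounds check has already passed. */
static int
visit_suggested_style(const unsigned int *arr, int n)
{
	int i, visited = 0;
	unsigned int v;

	for (i = 0; i < n; i++) {
		v = checked_get(arr, n, i);
		(void)v;
		visited++;
	}
	return visited;
}
```

Both shapes visit the same elements; the difference is only the extra fetch at index n in the first form, which also happens when the array length is zero.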
> 
> --
> Regards,
> Robert Sanford
> 
> 
> On 3/6/19, 12:20 PM, "Erik Gabriel Carrillo" <erik.g.carrillo@intel.com> wrote:
> 
> Currently, the timer library uses a per-process table of structures to
> manage skiplists of timers presumably because timers contain arbitrary
> function pointers whose value may not resolve properly in other
> processes.
> 
> However, if the same callback is used to handle all timers, and that
> callback is only invoked in one process, then it would be safe to allow
> the data structures to be allocated in shared memory, and to allow
> secondary processes to modify the timer lists.  This would let timers be
> used in more multi-process scenarios.
> 
> The library's global variables are wrapped with a struct, and an array
> of these structures is created in shared memory.  The original APIs
> are updated to reference the zeroth entry in the array. This maintains
> the original behavior for both primary and secondary processes since
> the set intersection of their coremasks should be empty [1].  New APIs
> are introduced to enable the allocation/deallocation of other entries
> in the array.
> 
> New variants of the APIs used to start and stop timers are introduced;
> they allow a caller to specify which array entry should be used to
> locate the timer list to insert into or delete from.
> 
> Finally, a new variant of rte_timer_manage() is introduced, which
> allows a caller to specify which array entry should be used to locate
> the timer lists to process; it can also process multiple timer lists per
> invocation.
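As a rough model of "multiple timer lists per invocation", the sketch below polls a caller-chosen set of lists and hands every expired entry to one shared callback, which is how the header comment for rte_timer_alt_manage() earlier in this thread describes it (per-timer callbacks are ignored). Everything here is hypothetical and simplified: plain sorted linked lists stand in for the skiplists, there is no locking or timer state machine, and none of the `sketch_` names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NB_LISTS 4  /* stand-in for the per-lcore list count */

struct sketch_timer {
	uint64_t expire;            /* absolute expiry time */
	struct sketch_timer *next;  /* list is kept sorted by expire */
};

typedef void (*sketch_cb_t)(struct sketch_timer *tim);

static struct sketch_timer *sketch_lists[SKETCH_NB_LISTS];
static int sketch_fired;

/* Example callback: counts how many timers were handed to it. */
static void
sketch_count_cb(struct sketch_timer *tim)
{
	(void)tim;
	sketch_fired++;
}

/* Insert a timer into one list, keeping ascending expiry order. */
static void
sketch_add(unsigned int list_id, struct sketch_timer *tim)
{
	struct sketch_timer **pp = &sketch_lists[list_id];

	while (*pp != NULL && (*pp)->expire <= tim->expire)
		pp = &(*pp)->next;
	tim->next = *pp;
	*pp = tim;
}

/* Poll the selected lists; unlink each expired timer and pass it to cb.
 * Returns the number of timers that expired. */
static int
sketch_manage(const unsigned int *poll_ids, int nb_poll_ids, uint64_t now,
	      sketch_cb_t cb)
{
	int i, nb_expired = 0;

	for (i = 0; i < nb_poll_ids; i++) {
		struct sketch_timer **head;

		if (poll_ids[i] >= SKETCH_NB_LISTS)
			continue;  /* ignore identifiers outside the table */

		head = &sketch_lists[poll_ids[i]];
		while (*head != NULL && (*head)->expire <= now) {
			struct sketch_timer *tim = *head;

			*head = tim->next;
			tim->next = NULL;
			cb(tim);
			nb_expired++;
		}
	}
	return nb_expired;
}
```

A service-style caller would pass the IDs of the lists it is responsible for, for example `sketch_manage(ids, 2, now, sketch_count_cb)`, so that one thread drains lists that other threads populate.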
> 
> [1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> ---
> lib/librte_timer/Makefile              |   1 +
> lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
> lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
> lib/librte_timer/rte_timer_version.map |  22 ++
> 4 files changed, 723 insertions(+), 45 deletions(-)
> 
> diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
> index 4ebd528..8ec63f4 100644
> --- a/lib/librte_timer/Makefile
> +++ b/lib/librte_timer/Makefile
> @@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
> # library name
> LIB = librte_timer.a
> 
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
> LDLIBS += -lrte_eal
> 
> diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
> index 30c7b0a..2bd49d0 100644
> --- a/lib/librte_timer/rte_timer.c
> +++ b/lib/librte_timer/rte_timer.c
> @@ -5,6 +5,7 @@
> #include <string.h>
> #include <stdio.h>
> #include <stdint.h>
> +#include <stdbool.h>
> #include <inttypes.h>
> #include <assert.h>
> #include <sys/queue.h>
> @@ -21,11 +22,15 @@
> #include <rte_spinlock.h>
> #include <rte_random.h>
> #include <rte_pause.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_compat.h>
> 
> #include "rte_timer.h"
> 
> -LIST_HEAD(rte_timer_list, rte_timer);
> -
> +/**
> + * Per-lcore info for timers.
> + */
> struct priv_timer {
>    struct rte_timer pending_head;  /**< dummy timer instance to head up list */
>    rte_spinlock_t list_lock;       /**< lock to protect list access */
> @@ -48,25 +53,84 @@ struct priv_timer {
> #endif
> } __rte_cache_aligned;
> 
> -/** per-lcore private info for timers */
> -static struct priv_timer priv_timer[RTE_MAX_LCORE];
> +#define FL_ALLOCATED    (1 << 0)
> +struct rte_timer_data {
> +    struct priv_timer priv_timer[RTE_MAX_LCORE];
> +    uint8_t internal_flags;
> +};
> +
> +#define RTE_MAX_DATA_ELS 64
> +static struct rte_timer_data *rte_timer_data_arr;
> +static uint32_t default_data_id;
> +static uint32_t rte_timer_subsystem_initialized;
> +
> +/* For maintaining older interfaces for a period */
> +static struct rte_timer_data default_timer_data;
> 
> /* when debug is enabled, store some statistics */
> #ifdef RTE_LIBRTE_TIMER_DEBUG
> -#define __TIMER_STAT_ADD(name, n) do {                    \
> +#define __TIMER_STAT_ADD(priv_timer, name, n) do {            \
>        unsigned __lcore_id = rte_lcore_id();            \
>        if (__lcore_id < RTE_MAX_LCORE)                \
>            priv_timer[__lcore_id].stats.name += (n);    \
>    } while(0)
> #else
> -#define __TIMER_STAT_ADD(name, n) do {} while(0)
> +#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
> #endif
> 
> -/* Init the timer library. */
> +static inline int
> +timer_data_valid(uint32_t id)
> +{
> +    return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
> +}
> +
> +/* validate ID and retrieve timer data pointer, or return error value */
> +#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {    \
> +    if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))        \
> +        return retval;                        \
> +    timer_data = &rte_timer_data_arr[id];                \
> +} while (0)
> +
> +int __rte_experimental
> +rte_timer_data_alloc(uint32_t *id_ptr)
> +{
> +    int i;
> +    struct rte_timer_data *data;
> +
> +    if (!rte_timer_subsystem_initialized)
> +        return -ENOMEM;
> +
> +    for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
> +        data = &rte_timer_data_arr[i];
> +        if (!(data->internal_flags & FL_ALLOCATED)) {
> +            data->internal_flags |= FL_ALLOCATED;
> +
> +            if (id_ptr)
> +                *id_ptr = i;
> +
> +            return 0;
> +        }
> +    }
> +
> +    return -ENOSPC;
> +}
> +
> +int __rte_experimental
> +rte_timer_data_dealloc(uint32_t id)
> +{
> +    struct rte_timer_data *timer_data;
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
> +
> +    timer_data->internal_flags &= ~(FL_ALLOCATED);
> +
> +    return 0;
> +}
> +
> void
> -rte_timer_subsystem_init(void)
> +rte_timer_subsystem_init_v20(void)
> {
>    unsigned lcore_id;
> +    struct priv_timer *priv_timer = default_timer_data.priv_timer;
> 
>    /* since priv_timer is static, it's zeroed by default, so only init some
>     * fields.
> @@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
>        priv_timer[lcore_id].prev_lcore = lcore_id;
>    }
> }
> +VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
> +
> +/* Init the timer library. Allocate an array of timer data structs in shared
> + * memory, and allocate the zeroth entry for use with original timer
> + * APIs. Since the intersection of the sets of lcore ids in primary and
> + * secondary processes should be empty, the zeroth entry can be shared by
> + * multiple processes.
> + */
> +int
> +rte_timer_subsystem_init_v1905(void)
> +{
> +    const struct rte_memzone *mz;
> +    struct rte_timer_data *data;
> +    int i, lcore_id;
> +    static const char *mz_name = "rte_timer_mz";
> +
> +    if (rte_timer_subsystem_initialized)
> +        return -EALREADY;
> +
> +    if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
> +        mz = rte_memzone_lookup(mz_name);
> +        if (mz == NULL)
> +            return -EEXIST;
> +
> +        rte_timer_data_arr = mz->addr;
> +
> +        rte_timer_data_arr[default_data_id].internal_flags |=
> +            FL_ALLOCATED;
> +
> +        rte_timer_subsystem_initialized = 1;
> +
> +        return 0;
> +    }
> +
> +    mz = rte_memzone_reserve_aligned(mz_name,
> +            RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
> +            SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
> +    if (mz == NULL)
> +        return -ENOMEM;
> +
> +    rte_timer_data_arr = mz->addr;
> +
> +    for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
> +        data = &rte_timer_data_arr[i];
> +
> +        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +            rte_spinlock_init(
> +                &data->priv_timer[lcore_id].list_lock);
> +            data->priv_timer[lcore_id].prev_lcore = lcore_id;
> +        }
> +    }
> +
> +    rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
> +
> +    rte_timer_subsystem_initialized = 1;
> +
> +    return 0;
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
> +          rte_timer_subsystem_init_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
> +
> +void __rte_experimental
> +rte_timer_subsystem_finalize(void)
> +{
> +    if (rte_timer_data_arr)
> +        rte_free(rte_timer_data_arr);
> +
> +    rte_timer_subsystem_initialized = 0;
> +}
> 
> /* Initialize the timer handle tim for use */
> void
> @@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
>  */
> static int
> timer_set_config_state(struct rte_timer *tim,
> -               union rte_timer_status *ret_prev_status)
> +               union rte_timer_status *ret_prev_status,
> +               struct priv_timer *priv_timer)
> {
>    union rte_timer_status prev_status, status;
>    int success = 0;
> @@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
>  */
> static void
> timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
> -        struct rte_timer **prev)
> +               struct rte_timer **prev, struct priv_timer *priv_timer)
> {
>    unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
>    prev[lvl] = &priv_timer[tim_lcore].pending_head;
> @@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
>  */
> static void
> timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
> -        struct rte_timer **prev)
> +                struct rte_timer **prev,
> +                struct priv_timer *priv_timer)
> {
>    int i;
> +
>    /* to get a specific entry in the list, look for just lower than the time
>     * values, and then increment on each level individually if necessary
>     */
> -    timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
> +    timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
>    for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
>        while (prev[i]->sl_next[i] != NULL &&
>                prev[i]->sl_next[i] != tim &&
> @@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
>  * timer must not be in a list
>  */
> static void
> -timer_add(struct rte_timer *tim, unsigned int tim_lcore)
> +timer_add(struct rte_timer *tim, unsigned int tim_lcore,
> +      struct priv_timer *priv_timer)
> {
>    unsigned lvl;
>    struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
> 
>    /* find where exactly this element goes in the list of elements
>     * for each depth. */
> -    timer_get_prev_entries(tim->expire, tim_lcore, prev);
> +    timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
> 
>    /* now assign it a new level and add at that level */
>    const unsigned tim_level = timer_get_skiplist_level(
> @@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
>  */
> static void
> timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
> -        int local_is_locked)
> +      int local_is_locked, struct priv_timer *priv_timer)
> {
>    unsigned lcore_id = rte_lcore_id();
>    unsigned prev_owner = prev_status.owner;
> @@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
>                ((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
> 
>    /* adjust pointers from previous entries to point past this */
> -    timer_get_prev_entries_for_node(tim, prev_owner, prev);
> +    timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
>    for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
>        if (prev[i]->sl_next[i] == tim)
>            prev[i]->sl_next[i] = tim->sl_next[i];
> @@ -326,11 +464,13 @@ static int
> __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
>          uint64_t period, unsigned tim_lcore,
>          rte_timer_cb_t fct, void *arg,
> -          int local_is_locked)
> +          int local_is_locked,
> +          struct rte_timer_data *timer_data)
> {
>    union rte_timer_status prev_status, status;
>    int ret;
>    unsigned lcore_id = rte_lcore_id();
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* round robin for tim_lcore */
>    if (tim_lcore == (unsigned)LCORE_ID_ANY) {
> @@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
>    /* wait that the timer is in correct status before update,
>     * and mark it as being configured */
> -    ret = timer_set_config_state(tim, &prev_status);
> +    ret = timer_set_config_state(tim, &prev_status, priv_timer);
>    if (ret < 0)
>        return -1;
> 
> -    __TIMER_STAT_ADD(reset, 1);
> +    __TIMER_STAT_ADD(priv_timer, reset, 1);
>    if (prev_status.state == RTE_TIMER_RUNNING &&
>        lcore_id < RTE_MAX_LCORE) {
>        priv_timer[lcore_id].updated = 1;
> @@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
>    /* remove it from list */
>    if (prev_status.state == RTE_TIMER_PENDING) {
> -        timer_del(tim, prev_status, local_is_locked);
> -        __TIMER_STAT_ADD(pending, -1);
> +        timer_del(tim, prev_status, local_is_locked, priv_timer);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>    }
> 
>    tim->period = period;
> @@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
>    if (tim_lcore != lcore_id || !local_is_locked)
>        rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
> 
> -    __TIMER_STAT_ADD(pending, 1);
> -    timer_add(tim, tim_lcore);
> +    __TIMER_STAT_ADD(priv_timer, pending, 1);
> +    timer_add(tim, tim_lcore, priv_timer);
> 
>    /* update state: as we are in CONFIG state, only us can modify
>     * the state so we don't need to use cmpset() here */
> @@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
> 
> /* Reset and start the timer associated with the timer handle tim */
> int
> -rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
> -        enum rte_timer_type type, unsigned tim_lcore,
> -        rte_timer_cb_t fct, void *arg)
> +rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
> +            enum rte_timer_type type, unsigned int tim_lcore,
> +            rte_timer_cb_t fct, void *arg)
> {
>    uint64_t cur_time = rte_get_timer_cycles();
>    uint64_t period;
> @@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>        period = 0;
> 
>    return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
> -              fct, arg, 0);
> +              fct, arg, 0, &default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
> +
> +int
> +rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
> +              enum rte_timer_type type, unsigned int tim_lcore,
> +              rte_timer_cb_t fct, void *arg)
> +{
> +    return rte_timer_alt_reset(default_data_id, tim, ticks, type,
> +                   tim_lcore, fct, arg);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
> +                      enum rte_timer_type type,
> +                      unsigned int tim_lcore,
> +                      rte_timer_cb_t fct, void *arg),
> +          rte_timer_reset_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
> +            uint64_t ticks, enum rte_timer_type type,
> +            unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
> +{
> +    uint64_t cur_time = rte_get_timer_cycles();
> +    uint64_t period;
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
> +            !(rte_lcore_is_enabled(tim_lcore) ||
> +              rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
> +        return -1;
> +
> +    if (type == PERIODICAL)
> +        period = ticks;
> +    else
> +        period = 0;
> +
> +    return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
> +                 fct, arg, 0, timer_data);
> }
> 
> /* loop until rte_timer_reset() succeed */
> @@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
>        rte_pause();
> }
> 
> -/* Stop the timer associated with the timer handle tim */
> -int
> -rte_timer_stop(struct rte_timer *tim)
> +static int
> +__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
> +         struct rte_timer_data *timer_data)
> {
>    union rte_timer_status prev_status, status;
>    unsigned lcore_id = rte_lcore_id();
>    int ret;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* wait that the timer is in correct status before update,
>     * and mark it as being configured */
> -    ret = timer_set_config_state(tim, &prev_status);
> +    ret = timer_set_config_state(tim, &prev_status, priv_timer);
>    if (ret < 0)
>        return -1;
> 
> -    __TIMER_STAT_ADD(stop, 1);
> +    __TIMER_STAT_ADD(priv_timer, stop, 1);
>    if (prev_status.state == RTE_TIMER_RUNNING &&
>        lcore_id < RTE_MAX_LCORE) {
>        priv_timer[lcore_id].updated = 1;
> @@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
> 
>    /* remove it from list */
>    if (prev_status.state == RTE_TIMER_PENDING) {
> -        timer_del(tim, prev_status, 0);
> -        __TIMER_STAT_ADD(pending, -1);
> +        timer_del(tim, prev_status, local_is_locked, priv_timer);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>    }
> 
>    /* mark timer as stopped */
> @@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
>    return 0;
> }
> 
> +/* Stop the timer associated with the timer handle tim */
> +int
> +rte_timer_stop_v20(struct rte_timer *tim)
> +{
> +    return __rte_timer_stop(tim, 0, &default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
> +
> +int
> +rte_timer_stop_v1905(struct rte_timer *tim)
> +{
> +    return rte_timer_alt_stop(default_data_id, tim);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
> +          rte_timer_stop_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    return __rte_timer_stop(tim, 0, timer_data);
> +}
> +
> /* loop until rte_timer_stop() succeed */
> void
> rte_timer_stop_sync(struct rte_timer *tim)
> @@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
> }
> 
> /* must be called periodically, run all timer that expired */
> -void rte_timer_manage(void)
> +static void
> +__rte_timer_manage(struct rte_timer_data *timer_data)
> {
>    union rte_timer_status status;
>    struct rte_timer *tim, *next_tim;
> @@ -486,11 +696,12 @@ void rte_timer_manage(void)
>    struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
>    uint64_t cur_time;
>    int i, ret;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    /* timer manager only runs on EAL thread with valid lcore_id */
>    assert(lcore_id < RTE_MAX_LCORE);
> 
> -    __TIMER_STAT_ADD(manage, 1);
> +    __TIMER_STAT_ADD(priv_timer, manage, 1);
>    /* optimize for the case where per-cpu list is empty */
>    if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
>        return;
> @@ -518,7 +729,7 @@ void rte_timer_manage(void)
>    tim = priv_timer[lcore_id].pending_head.sl_next[0];
> 
>    /* break the existing list at current time point */
> -    timer_get_prev_entries(cur_time, lcore_id, prev);
> +    timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
>    for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
>        if (prev[i] == &priv_timer[lcore_id].pending_head)
>            continue;
> @@ -563,7 +774,7 @@ void rte_timer_manage(void)
>        /* execute callback function with list unlocked */
>        tim->f(tim, tim->arg);
> 
> -        __TIMER_STAT_ADD(pending, -1);
> +        __TIMER_STAT_ADD(priv_timer, pending, -1);
>        /* the timer was stopped or reloaded by the callback
>         * function, we have nothing to do here */
>        if (priv_timer[lcore_id].updated == 1)
> @@ -580,24 +791,222 @@ void rte_timer_manage(void)
>            /* keep it in list and mark timer as pending */
>            rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
>            status.state = RTE_TIMER_PENDING;
> -            __TIMER_STAT_ADD(pending, 1);
> +            __TIMER_STAT_ADD(priv_timer, pending, 1);
>            status.owner = (int16_t)lcore_id;
>            rte_wmb();
>            tim->status.u32 = status.u32;
>            __rte_timer_reset(tim, tim->expire + tim->period,
> -                tim->period, lcore_id, tim->f, tim->arg, 1);
> +                tim->period, lcore_id, tim->f, tim->arg, 1,
> +                timer_data);
>            rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
>        }
>    }
>    priv_timer[lcore_id].running_tim = NULL;
> }
> 
> +void
> +rte_timer_manage_v20(void)
> +{
> +    __rte_timer_manage(&default_timer_data);
> +}
> +VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
> +
> +int
> +rte_timer_manage_v1905(void)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
> +
> +    __rte_timer_manage(timer_data);
> +
> +    return 0;
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_manage(uint32_t timer_data_id,
> +             unsigned int *poll_lcores,
> +             int nb_poll_lcores,
> +             rte_timer_alt_manage_cb_t f)
> +{
> +    union rte_timer_status status;
> +    struct rte_timer *tim, *next_tim, **pprev;
> +    struct rte_timer *run_first_tims[RTE_MAX_LCORE];
> +    unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
> +    unsigned int this_lcore = rte_lcore_id();
> +    struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
> +    uint64_t cur_time;
> +    int i, j, ret;
> +    int nb_runlists = 0;
> +    struct rte_timer_data *data;
> +    struct priv_timer *privp;
> +    uint32_t poll_lcore;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
> +
> +    /* timer manager only runs on EAL thread with valid lcore_id */
> +    assert(this_lcore < RTE_MAX_LCORE);
> +
> +    __TIMER_STAT_ADD(data->priv_timer, manage, 1);
> +
> +    if (poll_lcores == NULL) {
> +        poll_lcores = (unsigned int []){rte_lcore_id()};
> +        nb_poll_lcores = 1;
> +    }
> +
> +    for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> +         poll_lcore = poll_lcores[++i]) {
> +        privp = &data->priv_timer[poll_lcore];
> +
> +        /* optimize for the case where per-cpu list is empty */
> +        if (privp->pending_head.sl_next[0] == NULL)
> +            continue;
> +        cur_time = rte_get_timer_cycles();
> +
> +#ifdef RTE_ARCH_64
> +        /* on 64-bit the value cached in the pending_head.expired will
> +         * be updated atomically, so we can consult that for a quick
> +         * check here outside the lock
> +         */
> +        if (likely(privp->pending_head.expire > cur_time))
> +            continue;
> +#endif
> +
> +        /* browse ordered list, add expired timers in 'expired' list */
> +        rte_spinlock_lock(&privp->list_lock);
> +
> +        /* if nothing to do just unlock and return */
> +        if (privp->pending_head.sl_next[0] == NULL ||
> +            privp->pending_head.sl_next[0]->expire > cur_time) {
> +            rte_spinlock_unlock(&privp->list_lock);
> +            continue;
> +        }
> +
> +        /* save start of list of expired timers */
> +        tim = privp->pending_head.sl_next[0];
> +
> +        /* break the existing list at current time point */
> +        timer_get_prev_entries(cur_time, poll_lcore, prev,
> +                       data->priv_timer);
> +        for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
> +            if (prev[j] == &privp->pending_head)
> +                continue;
> +            privp->pending_head.sl_next[j] =
> +                prev[j]->sl_next[j];
> +            if (prev[j]->sl_next[j] == NULL)
> +                privp->curr_skiplist_depth--;
> +
> +            prev[j]->sl_next[j] = NULL;
> +        }
> +
> +        /* transition run-list from PENDING to RUNNING */
> +        run_first_tims[nb_runlists] = tim;
> +        runlist_lcore_ids[nb_runlists] = poll_lcore;
> +        pprev = &run_first_tims[nb_runlists];
> +        nb_runlists++;
> +
> +        for ( ; tim != NULL; tim = next_tim) {
> +            next_tim = tim->sl_next[0];
> +
> +            ret = timer_set_running_state(tim);
> +            if (likely(ret == 0)) {
> +                pprev = &tim->sl_next[0];
> +            } else {
> +                /* another core is trying to re-config this one,
> +                 * remove it from local expired list
> +                 */
> +                *pprev = next_tim;
> +            }
> +        }
> +
> +        /* update the next to expire timer value */
> +        privp->pending_head.expire =
> +            (privp->pending_head.sl_next[0] == NULL) ? 0 :
> +            privp->pending_head.sl_next[0]->expire;
> +
> +        rte_spinlock_unlock(&privp->list_lock);
> +    }
> +
> +    /* Now process the run lists */
> +    while (1) {
> +        bool done = true;
> +        uint64_t min_expire = UINT64_MAX;
> +        int min_idx = 0;
> +
> +        /* Find the next oldest timer to process */
> +        for (i = 0; i < nb_runlists; i++) {
> +            tim = run_first_tims[i];
> +
> +            if (tim != NULL && tim->expire < min_expire) {
> +                min_expire = tim->expire;
> +                min_idx = i;
> +                done = false;
> +            }
> +        }
> +
> +        if (done)
> +            break;
> +
> +        tim = run_first_tims[min_idx];
> +        privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
> +
> +        /* Move down the runlist from which we picked a timer to
> +         * execute
> +         */
> +        run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
> +
> +        privp->updated = 0;
> +        privp->running_tim = tim;
> +
> +        /* Call the provided callback function */
> +        f(tim);
> +
> +        __TIMER_STAT_ADD(privp, pending, -1);
> +
> +        /* the timer was stopped or reloaded by the callback
> +         * function, we have nothing to do here
> +         */
> +        if (privp->updated == 1)
> +            continue;
> +
> +        if (tim->period == 0) {
> +            /* remove from done list and mark timer as stopped */
> +            status.state = RTE_TIMER_STOP;
> +            status.owner = RTE_TIMER_NO_OWNER;
> +            rte_wmb();
> +            tim->status.u32 = status.u32;
> +        } else {
> +            /* keep it in list and mark timer as pending */
> +            rte_spinlock_lock(
> +                &data->priv_timer[this_lcore].list_lock);
> +            status.state = RTE_TIMER_PENDING;
> +            __TIMER_STAT_ADD(data->priv_timer, pending, 1);
> +            status.owner = (int16_t)this_lcore;
> +            rte_wmb();
> +            tim->status.u32 = status.u32;
> +            __rte_timer_reset(tim, tim->expire + tim->period,
> +                tim->period, this_lcore, tim->f, tim->arg, 1,
> +                data);
> +            rte_spinlock_unlock(
> +                &data->priv_timer[this_lcore].list_lock);
> +        }
> +
> +        privp->running_tim = NULL;
> +    }
> +
> +    return 0;
> +}
> +
> /* dump statistics about timers */
> -void rte_timer_dump_stats(FILE *f)
> +static void
> +__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
> {
> #ifdef RTE_LIBRTE_TIMER_DEBUG
>    struct rte_timer_debug_stats sum;
>    unsigned lcore_id;
> +    struct priv_timer *priv_timer = timer_data->priv_timer;
> 
>    memset(&sum, 0, sizeof(sum));
>    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> @@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
>    fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
> #endif
> }
> +
> +void
> +rte_timer_dump_stats_v20(FILE *f)
> +{
> +    __rte_timer_dump_stats(&default_timer_data, f);
> +}
> +VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
> +
> +int
> +rte_timer_dump_stats_v1905(FILE *f)
> +{
> +    return rte_timer_alt_dump_stats(default_data_id, f);
> +}
> +MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
> +          rte_timer_dump_stats_v1905);
> +BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
> +
> +int __rte_experimental
> +rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
> +{
> +    struct rte_timer_data *timer_data;
> +
> +    TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
> +
> +    __rte_timer_dump_stats(timer_data, f);
> +
> +    return 0;
> +}
> diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
> index 9b95cd2..bee1676 100644
> --- a/lib/librte_timer/rte_timer.h
> +++ b/lib/librte_timer/rte_timer.h
> @@ -39,6 +39,7 @@
> #include <stddef.h>
> #include <rte_common.h>
> #include <rte_config.h>
> +#include <rte_spinlock.h>
> 
> #ifdef __cplusplus
> extern "C" {
> @@ -132,12 +133,68 @@ struct rte_timer
> #endif
> 
> /**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Allocate a timer data instance in shared memory to track a set of pending
> + * timer lists.
> + *
> + * @param id_ptr
> + *   Pointer to variable into which to write the identifier of the allocated
> + *   timer data instance.
> + *
> + * @return
> + *   - 0: Success
> + *   - -ENOSPC: maximum number of timer data instances already allocated
> + */
> +int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Deallocate a timer data instance.
> + *
> + * @param id
> + *   Identifier of the timer data instance to deallocate.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: invalid timer data instance identifier
> + */
> +int __rte_experimental rte_timer_data_dealloc(uint32_t id);
> +
> +/**
>  * Initialize the timer library.
>  *
>  * Initializes internal variables (list, locks and so on) for the RTE
>  * timer library.
>  */
> -void rte_timer_subsystem_init(void);
> +void rte_timer_subsystem_init_v20(void);
> +
> +/**
> + * Initialize the timer library.
> + *
> + * Initializes internal variables (list, locks and so on) for the RTE
> + * timer library.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EEXIST: Returned in secondary process when primary process has not
> + *      yet initialized the timer subsystem
> + *   - -ENOMEM: Unable to allocate memory needed to initialize timer
> + *      subsystem
> + */
> +int rte_timer_subsystem_init_v1905(void);
> +int rte_timer_subsystem_init(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Free timer subsystem resources.
> + */
> +void __rte_experimental rte_timer_subsystem_finalize(void);
> 
> /**
>  * Initialize a timer handle.
> @@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
>  *   - 0: Success; the timer is scheduled.
>  *   - (-1): Timer is in the RUNNING or CONFIG state.
>  */
> +int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
> +            enum rte_timer_type type, unsigned int tim_lcore,
> +            rte_timer_cb_t fct, void *arg);
> +int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
> +              enum rte_timer_type type, unsigned int tim_lcore,
> +              rte_timer_cb_t fct, void *arg);
> int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
>            enum rte_timer_type type, unsigned tim_lcore,
>            rte_timer_cb_t fct, void *arg);
> @@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
>  *   - 0: Success; the timer is stopped.
>  *   - (-1): The timer is in the RUNNING or CONFIG state.
>  */
> +int rte_timer_stop_v20(struct rte_timer *tim);
> +int rte_timer_stop_v1905(struct rte_timer *tim);
> int rte_timer_stop(struct rte_timer *tim);
> 
> -
> /**
>  * Loop until rte_timer_stop() succeeds.
>  *
> @@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
>  * function. However, the more often the function is called, the more
>  * CPU resources it will use.
>  */
> -void rte_timer_manage(void);
> +void rte_timer_manage_v20(void);
> +
> +/**
> + * Manage the timer list and execute callback functions.
> + *
> + * This function must be called periodically from EAL lcores
> + * main_loop(). It browses the list of pending timers and runs all
> + * timers that are expired.
> + *
> + * The precision of the timer depends on the call frequency of this
> + * function. However, the more often the function is called, the more
> + * CPU resources it will use.
> + *
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: timer subsystem not yet initialized
> + */
> +int rte_timer_manage_v1905(void);
> +int rte_timer_manage(void);
> 
> /**
>  * Dump statistics about timers.
> @@ -300,7 +382,143 @@ void rte_timer_manage(void);
>  * @param f
>  *   A pointer to a file for output
>  */
> -void rte_timer_dump_stats(FILE *f);
> +void rte_timer_dump_stats_v20(FILE *f);
> +
> +/**
> + * Dump statistics about timers.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @return
> + *   - 0: Success
> + *   - -EINVAL: timer subsystem not yet initialized
> + */
> +int rte_timer_dump_stats_v1905(FILE *f);
> +int rte_timer_dump_stats(FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_reset(), except that it allows a
> + * caller to specify the rte_timer_data instance containing the list to which
> + * the timer should be added.
> + *
> + * @see rte_timer_reset()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param tim
> + *   The timer handle.
> + * @param ticks
> + *   The number of cycles (see rte_get_hpet_hz()) before the callback
> + *   function is called.
> + * @param type
> + *   The type can be either:
> + *   - PERIODICAL: The timer is automatically reloaded after execution
> + *     (returns to the PENDING state)
> + *   - SINGLE: The timer is one-shot, that is, the timer goes to a
> + *     STOPPED state after execution.
> + * @param tim_lcore
> + *   The ID of the lcore where the timer callback function has to be
> + *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
> + *   launch it on a different core for each call (round-robin).
> + * @param fct
> + *   The callback function of the timer. This parameter can be NULL if (and
> + *   only if) rte_timer_alt_manage() will be used to manage this timer.
> + * @param arg
> + *   The user argument of the callback function.
> + * @return
> + *   - 0: Success; the timer is scheduled.
> + *   - (-1): Timer is in the RUNNING or CONFIG state.
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
> +            uint64_t ticks, enum rte_timer_type type,
> +            unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_stop(), except that it allows a
> + * caller to specify the rte_timer_data instance containing the list from which
> + * this timer should be removed.
> + *
> + * @see rte_timer_stop()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param tim
> + *   The timer handle.
> + * @return
> + *   - 0: Success; the timer is stopped.
> + *   - (-1): The timer is in the RUNNING or CONFIG state.
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
> +
> +/**
> + * Callback function type for rte_timer_alt_manage().
> + */
> +typedef void (*rte_timer_alt_manage_cb_t)(void *);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Manage a set of timer lists and execute the specified callback function for
> + * all expired timers. This function is similar to rte_timer_manage(), except
> + * that it allows a caller to specify the timer_data instance that should
> + * be operated on, as well as a set of lcore IDs identifying which timer lists
> + * should be processed.  Callback functions of individual timers are ignored.
> + *
> + * @see rte_timer_manage()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param poll_lcores
> + *   An array of lcore ids identifying the timer lists that should be processed.
> + *   NULL is allowed - if NULL, the timer list corresponding to the lcore
> + *   calling this routine is processed (same as rte_timer_manage()).
> + * @param n_poll_lcores
> + *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
> + *   is ignored.
> + * @param f
> + *   The callback function which should be called for all expired timers.
> + * @return
> + *   - 0: success
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
> +             int n_poll_lcores, rte_timer_alt_manage_cb_t f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * This function is the same as rte_timer_dump_stats(), except that it allows
> + * the caller to specify the rte_timer_data instance that should be used.
> + *
> + * @see rte_timer_dump_stats()
> + *
> + * @param timer_data_id
> + *   An identifier indicating which instance of timer data should be used for
> + *   this operation.
> + * @param f
> + *   A pointer to a file for output
> + * @return
> + *   - 0: success
> + *   - -EINVAL: invalid timer_data_id
> + */
> +int __rte_experimental
> +rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
> 
> #ifdef __cplusplus
> }
> diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
> index 9b2e4b8..c2e5836 100644
> --- a/lib/librte_timer/rte_timer_version.map
> +++ b/lib/librte_timer/rte_timer_version.map
> @@ -13,3 +13,25 @@ DPDK_2.0 {
> 
>    local: *;
> };
> +
> +DPDK_19.05 {
> +    global:
> +
> +    rte_timer_dump_stats;
> +    rte_timer_manage;
> +    rte_timer_reset;
> +    rte_timer_stop;
> +    rte_timer_subsystem_init;
> +} DPDK_2.0;
> +
> +EXPERIMENTAL {
> +    global:
> +
> +    rte_timer_alt_dump_stats;
> +    rte_timer_alt_manage;
> +    rte_timer_alt_reset;
> +    rte_timer_alt_stop;
> +    rte_timer_data_alloc;
> +    rte_timer_data_dealloc;
> +    rte_timer_subsystem_finalize;
> +};
> -- 
> 2.6.4
> 
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-21  1:01           ` Carrillo, Erik G
  2019-03-21  1:01             ` Carrillo, Erik G
@ 2019-03-27 14:03             ` Thomas Monjalon
  2019-03-27 14:03               ` Thomas Monjalon
  2019-03-28 12:42               ` Carrillo, Erik G
  1 sibling, 2 replies; 77+ messages in thread
From: Thomas Monjalon @ 2019-03-27 14:03 UTC (permalink / raw)
  To: Carrillo, Erik G; +Cc: dev, Sanford, Robert, nhorman

21/03/2019 02:01, Carrillo, Erik G:
> Hi Robert,
> 
> Thanks for the review and suggestions.  I’m out of the office on bonding leave for the next few weeks, but I’ll update the patch to address your points below when I return.

This is unfortunate.
This patch was waiting for reviews for months so we should merge it
in 19.05 if possible. We can consider it for the -rc2 as an exception.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-27 14:03             ` Thomas Monjalon
  2019-03-27 14:03               ` Thomas Monjalon
@ 2019-03-28 12:42               ` Carrillo, Erik G
  2019-03-28 12:42                 ` Carrillo, Erik G
  1 sibling, 1 reply; 77+ messages in thread
From: Carrillo, Erik G @ 2019-03-28 12:42 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Sanford, Robert, nhorman


> On Mar 27, 2019, at 9:03 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> 21/03/2019 02:01, Carrillo, Erik G:
>> Hi Robert,
>> 
>> Thanks for the review and suggestions.  I’m out of the office on bonding leave for the next few weeks, but I’ll update the patch to address your points below when I return.
> 
> This is unfortunate.
> This patch was waiting for reviews for months so we should merge it
> in 19.05 if possible. We can consider it for the -rc2 as an exception.
> 
> 

That sounds good - thank you, Thomas.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v5 0/2] Timer library changes
  2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
  2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
@ 2019-04-15 21:41       ` Erik Gabriel Carrillo
  2019-04-15 21:41         ` Erik Gabriel Carrillo
                           ` (3 more replies)
  2 siblings, 4 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-15 21:41 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev

This patch series modifies the timer library in such a way that
structures that used to be statically allocated in a process's data
segment are now allocated in shared memory.  As these structures contain
lists of timers, new APIs are introduced that allow a caller to specify
the particular structure instance into which a timer should be inserted
or from which a timer should be removed.  This enables primary and
secondary processes to modify the same timer list, which enables some
multi-process use cases that were not previously possible; e.g. a
secondary process can start a timer whose expiration is detected in a
primary process running a new flavor of timer_manage().

The original library API is mostly unchanged, though implementations are
updated to call into newly added functions with a default structure
instance ID that provides the original behavior.  New functions are
introduced to enable applications to allocate structure instances to
house timer lists, and to reference them with an identifier when
starting and stopping timers, and finally, to manage the timer lists
referenced with an identifier.
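
The delegation described above can be pictured with a small standalone
model.  This is only a sketch and does not use the DPDK API: every
sketch_* name, the instance limit, and the per-instance counter that
stands in for a set of timer lists are assumptions made for
illustration.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_INSTANCES 4   /* hypothetical limit, not from the patch */
#define SKETCH_DEFAULT_ID 0u     /* instance used by the original-style API */

/* One pending-timer counter per instance stands in for a set of timer lists. */
static unsigned int sketch_pending[SKETCH_MAX_INSTANCES];

/* New-style entry point: the caller names the instance to operate on. */
static int
sketch_timer_start_alt(uint32_t instance_id)
{
	if (instance_id >= SKETCH_MAX_INSTANCES)
		return -EINVAL;
	sketch_pending[instance_id]++;
	return 0;
}

/* Original-style entry point: unchanged signature, default instance. */
static int
sketch_timer_start(void)
{
	return sketch_timer_start_alt(SKETCH_DEFAULT_ID);
}
```

Callers of sketch_timer_start() see the same behavior as before, while
callers that hold an identifier for another instance go through
sketch_timer_start_alt() directly.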

My initial performance testing with the "timer_perf_autotest" test shows
no measurable change in performance, and inspection of the optimized
generated code shows that the extra function call introduced in the
updated functions gets inlined.

Changes in v5:
 - define default_data_id as const (Robert)
 - modify for-loop control in rte_timer_alt_manage and
   rte_timer_stop_all (Robert)
 - change parameter type in rte_timer_alt_manage_cb_t from "void *" to
   "struct rte_timer *" (Robert)

Changes in v4:
 - Updated versioned symbols so that they correspond to the next
   release. Checked ABI compatibility again with validate-abi.sh.

Changes in v3:
 - remove C++ style comment in first patch in series (Stephen)

Changes in v2:
 - split these changes out into their own series
 - version the symbols where the existing ABI was updated, and
   provide alternate implementation with behavior equivalent to original
   behavior. Validated ABI compatibility with validate-abi.sh
 - refactor changes to simplify patches

Erik Gabriel Carrillo (2):
  timer: allow timer management in shared memory
  timer: add function to stop all timers in a list

 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 557 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 258 ++++++++++++++-
 lib/librte_timer/rte_timer_version.map |  23 ++
 4 files changed, 794 insertions(+), 45 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory
  2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
  2019-04-15 21:41         ` Erik Gabriel Carrillo
@ 2019-04-15 21:41         ` Erik Gabriel Carrillo
  2019-04-15 21:41           ` Erik Gabriel Carrillo
  2019-04-17 17:09           ` Thomas Monjalon
  2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
  2019-04-17 19:54         ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Thomas Monjalon
  3 siblings, 2 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-15 21:41 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose value may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
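
As a rough illustration of the array-of-instances scheme, the sketch
below models allocation and deallocation of entries with an "allocated"
flag.  It is a simplified standalone model, not the code in this patch:
the model_* names and the array size of 4 are hypothetical, the array is
an ordinary static array rather than shared memory, and marking entry 0
as allocated up front is an assumption about how the default instance
is reserved.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_DATA_ELS 4        /* hypothetical; kept small for the sketch */
#define MODEL_FL_ALLOCATED (1 << 0)

/* Stand-in for the structure that wraps the library's former globals. */
struct model_timer_data {
	uint8_t internal_flags;
};

/* Plain static array here; entry 0 is assumed reserved for the default. */
static struct model_timer_data model_data_arr[MODEL_MAX_DATA_ELS] = {
	[0] = { .internal_flags = MODEL_FL_ALLOCATED },
};

/* Claim the first free entry and report its index through id_ptr. */
static int
model_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MODEL_MAX_DATA_ELS; i++) {
		if (!(model_data_arr[i].internal_flags & MODEL_FL_ALLOCATED)) {
			model_data_arr[i].internal_flags |= MODEL_FL_ALLOCATED;
			*id_ptr = i;
			return 0;
		}
	}
	return -ENOSPC; /* every entry is already in use */
}

/* Release an entry; reject out-of-range or unallocated identifiers. */
static int
model_data_dealloc(uint32_t id)
{
	if (id >= MODEL_MAX_DATA_ELS ||
	    !(model_data_arr[id].internal_flags & MODEL_FL_ALLOCATED))
		return -EINVAL;
	model_data_arr[id].internal_flags &= (uint8_t)~MODEL_FL_ALLOCATED;
	return 0;
}
```

The error codes mirror the ones documented for the new allocation and
deallocation functions (-ENOSPC and -EINVAL); the range check on the
identifier is an addition in this model.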

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
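
Processing several timer lists in one invocation amounts to repeatedly
picking the earliest-expiring timer among the heads of the per-lcore
run lists.  The standalone sketch below shows only that selection step,
under simplifying assumptions: the model_* names are hypothetical,
timers carry just an expiry and a single "next" link, and locking,
state transitions and callbacks are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified timer node: only the expiry and one forward link are modelled. */
struct model_timer {
	uint64_t expire;
	struct model_timer *next;
};

/*
 * Pop the earliest-expiring timer among the heads of n run lists, each of
 * which is already sorted by expiry.  Returns NULL when nothing is left.
 */
static struct model_timer *
model_pop_earliest(struct model_timer **heads, int n)
{
	struct model_timer *tim;
	uint64_t min_expire = UINT64_MAX;
	int min_idx = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (heads[i] != NULL && heads[i]->expire < min_expire) {
			min_expire = heads[i]->expire;
			min_idx = i;
		}
	}
	if (min_idx < 0)
		return NULL;

	/* Advance the list the timer was taken from. */
	tim = heads[min_idx];
	heads[min_idx] = tim->next;
	return tim;
}

/* Drain the run lists in expiry order, recording each expiry in out[]. */
static int
model_drain(struct model_timer **heads, int n, uint64_t *out, int max_out)
{
	struct model_timer *tim;
	int count = 0;

	while (count < max_out &&
	       (tim = model_pop_earliest(heads, n)) != NULL)
		out[count++] = tim->expire;
	return count;
}
```

Because each run list is already ordered, a linear scan over the list
heads per step is enough to visit the timers in global expiry order.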

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..511d902 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static const uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1905(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1905(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1905(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0; i < nb_poll_lcores; i++) {
+		poll_lcore = poll_lcores[i];
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expired will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and return */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(data->priv_timer, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1905(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..6a9c499 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1905(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1905(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1905(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1905(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(struct rte_timer *tim);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..c2e5836 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.05 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4


* [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory
  2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2019-04-15 21:41           ` Erik Gabriel Carrillo
  2019-04-17 17:09           ` Thomas Monjalon
  1 sibling, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-15 21:41 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev

Currently, the timer library uses a per-process table of structures to
manage skiplists of timers, presumably because timers contain arbitrary
function pointers whose values may not resolve properly in other
processes.

However, if the same callback is used to handle all timers, and that
callback is only invoked in one process, then it would be safe to allow
the data structures to be allocated in shared memory, and to allow
secondary processes to modify the timer lists.  This would let timers be
used in more multi-process scenarios.

The library's global variables are wrapped with a struct, and an array
of these structures is created in shared memory.  The original APIs
are updated to reference the zeroth entry in the array. This maintains
the original behavior for both primary and secondary processes since
the set intersection of their coremasks should be empty [1].  New APIs
are introduced to enable the allocation/deallocation of other entries
in the array.
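
As a rough illustration of the allocation scheme described above, the
following standalone sketch models a fixed array of entries, each
carrying an "allocated" flag, with entry zero reserved for the original
APIs.  It is not the library code: the names (MAX_DATA_ELS, data_arr,
timer_data_init() and so on) are hypothetical stand-ins, the per-lcore
timer lists are omitted, and the array is an ordinary static rather than
shared memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's array size and flag. */
#define MAX_DATA_ELS 64
#define FL_ALLOCATED (1 << 0)

/* Simplified model: the real struct also holds the per-lcore lists. */
struct timer_data {
	uint8_t internal_flags;
};

static struct timer_data data_arr[MAX_DATA_ELS];

/* Reserve entry zero, which the original (non-alt) APIs would use. */
static void
timer_data_init(void)
{
	data_arr[0].internal_flags |= FL_ALLOCATED;
}

/* Claim the first free entry and report its index through id_ptr. */
static int
timer_data_alloc(uint32_t *id_ptr)
{
	uint32_t i;

	for (i = 0; i < MAX_DATA_ELS; i++) {
		if (!(data_arr[i].internal_flags & FL_ALLOCATED)) {
			data_arr[i].internal_flags |= FL_ALLOCATED;
			if (id_ptr != NULL)
				*id_ptr = i;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Release an entry; reject out-of-range or unallocated identifiers. */
static int
timer_data_dealloc(uint32_t id)
{
	if (id >= MAX_DATA_ELS ||
	    !(data_arr[id].internal_flags & FL_ALLOCATED))
		return -EINVAL;

	data_arr[id].internal_flags &= ~FL_ALLOCATED;

	return 0;
}
```

In this model, the first allocation after timer_data_init() yields
identifier 1, and releasing the same identifier twice fails with -EINVAL
on the second attempt.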

New variants of the APIs used to start and stop timers are introduced;
they allow a caller to specify which array entry should be used to
locate the timer list to insert into or delete from.

Finally, a new variant of rte_timer_manage() is introduced, which
allows a caller to specify which array entry should be used to locate
the timer lists to process; it can also process multiple timer lists per
invocation.
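
To illustrate how several timer lists might be processed in one
invocation, here is a minimal standalone sketch, assuming each polled
list has already been cut down to a singly linked "run list" of expired
timers.  It repeatedly picks the earliest-expiring head across the run
lists and hands it to a single caller-supplied callback.  The types and
names (struct timer, pop_earliest(), run_expired(), record()) are
hypothetical simplifications; locking, timer state transitions and
periodic reloading are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified timer: only the fields needed for ordering. */
struct timer {
	uint64_t expire;
	struct timer *next;
};

/*
 * Detach and return the timer with the smallest expiry among the heads
 * of n run lists.  Returns NULL once every list is empty.
 */
static struct timer *
pop_earliest(struct timer **heads, int n)
{
	struct timer *tim;
	int min_idx = -1;
	int i;

	for (i = 0; i < n; i++) {
		if (heads[i] == NULL)
			continue;
		if (min_idx < 0 || heads[i]->expire < heads[min_idx]->expire)
			min_idx = i;
	}

	if (min_idx < 0)
		return NULL;

	tim = heads[min_idx];
	heads[min_idx] = tim->next;

	return tim;
}

/* Invoke cb on every expired timer, earliest first; return the count. */
static int
run_expired(struct timer **heads, int n, void (*cb)(struct timer *))
{
	struct timer *tim;
	int count = 0;

	while ((tim = pop_earliest(heads, n)) != NULL) {
		cb(tim);
		count++;
	}

	return count;
}

/* Example callback: record the order in which timers were handled. */
static uint64_t seen[8];
static int n_seen;

static void
record(struct timer *tim)
{
	if (n_seen < 8)
		seen[n_seen++] = tim->expire;
}
```

Calling run_expired() with record() as the callback on two run lists
visits the timers in ascending expiry order regardless of which list
each one came from, and leaves both list heads empty.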

[1] https://doc.dpdk.org/guides/prog_guide/multi_proc_support.html#multi-process-limitations

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/Makefile              |   1 +
 lib/librte_timer/rte_timer.c           | 519 ++++++++++++++++++++++++++++++---
 lib/librte_timer/rte_timer.h           | 226 +++++++++++++-
 lib/librte_timer/rte_timer_version.map |  22 ++
 4 files changed, 723 insertions(+), 45 deletions(-)

diff --git a/lib/librte_timer/Makefile b/lib/librte_timer/Makefile
index 4ebd528..8ec63f4 100644
--- a/lib/librte_timer/Makefile
+++ b/lib/librte_timer/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_timer.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 LDLIBS += -lrte_eal
 
diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 30c7b0a..511d902 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -5,6 +5,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <stdint.h>
+#include <stdbool.h>
 #include <inttypes.h>
 #include <assert.h>
 #include <sys/queue.h>
@@ -21,11 +22,15 @@
 #include <rte_spinlock.h>
 #include <rte_random.h>
 #include <rte_pause.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_compat.h>
 
 #include "rte_timer.h"
 
-LIST_HEAD(rte_timer_list, rte_timer);
-
+/**
+ * Per-lcore info for timers.
+ */
 struct priv_timer {
 	struct rte_timer pending_head;  /**< dummy timer instance to head up list */
 	rte_spinlock_t list_lock;       /**< lock to protect list access */
@@ -48,25 +53,84 @@ struct priv_timer {
 #endif
 } __rte_cache_aligned;
 
-/** per-lcore private info for timers */
-static struct priv_timer priv_timer[RTE_MAX_LCORE];
+#define FL_ALLOCATED	(1 << 0)
+struct rte_timer_data {
+	struct priv_timer priv_timer[RTE_MAX_LCORE];
+	uint8_t internal_flags;
+};
+
+#define RTE_MAX_DATA_ELS 64
+static struct rte_timer_data *rte_timer_data_arr;
+static const uint32_t default_data_id;
+static uint32_t rte_timer_subsystem_initialized;
+
+/* For maintaining older interfaces for a period */
+static struct rte_timer_data default_timer_data;
 
 /* when debug is enabled, store some statistics */
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-#define __TIMER_STAT_ADD(name, n) do {					\
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {			\
 		unsigned __lcore_id = rte_lcore_id();			\
 		if (__lcore_id < RTE_MAX_LCORE)				\
 			priv_timer[__lcore_id].stats.name += (n);	\
 	} while(0)
 #else
-#define __TIMER_STAT_ADD(name, n) do {} while(0)
+#define __TIMER_STAT_ADD(priv_timer, name, n) do {} while (0)
 #endif
 
-/* Init the timer library. */
+static inline int
+timer_data_valid(uint32_t id)
+{
+	return !!(rte_timer_data_arr[id].internal_flags & FL_ALLOCATED);
+}
+
+/* validate ID and retrieve timer data pointer, or return error value */
+#define TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, retval) do {	\
+	if (id >= RTE_MAX_DATA_ELS || !timer_data_valid(id))		\
+		return retval;						\
+	timer_data = &rte_timer_data_arr[id];				\
+} while (0)
+
+int __rte_experimental
+rte_timer_data_alloc(uint32_t *id_ptr)
+{
+	int i;
+	struct rte_timer_data *data;
+
+	if (!rte_timer_subsystem_initialized)
+		return -ENOMEM;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+		if (!(data->internal_flags & FL_ALLOCATED)) {
+			data->internal_flags |= FL_ALLOCATED;
+
+			if (id_ptr)
+				*id_ptr = i;
+
+			return 0;
+		}
+	}
+
+	return -ENOSPC;
+}
+
+int __rte_experimental
+rte_timer_data_dealloc(uint32_t id)
+{
+	struct rte_timer_data *timer_data;
+	TIMER_DATA_VALID_GET_OR_ERR_RET(id, timer_data, -EINVAL);
+
+	timer_data->internal_flags &= ~(FL_ALLOCATED);
+
+	return 0;
+}
+
 void
-rte_timer_subsystem_init(void)
+rte_timer_subsystem_init_v20(void)
 {
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = default_timer_data.priv_timer;
 
 	/* since priv_timer is static, it's zeroed by default, so only init some
 	 * fields.
@@ -76,6 +140,76 @@ rte_timer_subsystem_init(void)
 		priv_timer[lcore_id].prev_lcore = lcore_id;
 	}
 }
+VERSION_SYMBOL(rte_timer_subsystem_init, _v20, 2.0);
+
+/* Init the timer library. Allocate an array of timer data structs in shared
+ * memory, and allocate the zeroth entry for use with original timer
+ * APIs. Since the intersection of the sets of lcore ids in primary and
+ * secondary processes should be empty, the zeroth entry can be shared by
+ * multiple processes.
+ */
+int
+rte_timer_subsystem_init_v1905(void)
+{
+	const struct rte_memzone *mz;
+	struct rte_timer_data *data;
+	int i, lcore_id;
+	static const char *mz_name = "rte_timer_mz";
+
+	if (rte_timer_subsystem_initialized)
+		return -EALREADY;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		mz = rte_memzone_lookup(mz_name);
+		if (mz == NULL)
+			return -EEXIST;
+
+		rte_timer_data_arr = mz->addr;
+
+		rte_timer_data_arr[default_data_id].internal_flags |=
+			FL_ALLOCATED;
+
+		rte_timer_subsystem_initialized = 1;
+
+		return 0;
+	}
+
+	mz = rte_memzone_reserve_aligned(mz_name,
+			RTE_MAX_DATA_ELS * sizeof(*rte_timer_data_arr),
+			SOCKET_ID_ANY, 0, RTE_CACHE_LINE_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+
+	rte_timer_data_arr = mz->addr;
+
+	for (i = 0; i < RTE_MAX_DATA_ELS; i++) {
+		data = &rte_timer_data_arr[i];
+
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			rte_spinlock_init(
+				&data->priv_timer[lcore_id].list_lock);
+			data->priv_timer[lcore_id].prev_lcore = lcore_id;
+		}
+	}
+
+	rte_timer_data_arr[default_data_id].internal_flags |= FL_ALLOCATED;
+
+	rte_timer_subsystem_initialized = 1;
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_subsystem_init(void),
+		  rte_timer_subsystem_init_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_subsystem_init, _v1905, 19.05);
+
+void __rte_experimental
+rte_timer_subsystem_finalize(void)
+{
+	if (rte_timer_data_arr)
+		rte_free(rte_timer_data_arr);
+
+	rte_timer_subsystem_initialized = 0;
+}
 
 /* Initialize the timer handle tim for use */
 void
@@ -95,7 +229,8 @@ rte_timer_init(struct rte_timer *tim)
  */
 static int
 timer_set_config_state(struct rte_timer *tim,
-		       union rte_timer_status *ret_prev_status)
+		       union rte_timer_status *ret_prev_status,
+		       struct priv_timer *priv_timer)
 {
 	union rte_timer_status prev_status, status;
 	int success = 0;
@@ -207,7 +342,7 @@ timer_get_skiplist_level(unsigned curr_depth)
  */
 static void
 timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
-		struct rte_timer **prev)
+		       struct rte_timer **prev, struct priv_timer *priv_timer)
 {
 	unsigned lvl = priv_timer[tim_lcore].curr_skiplist_depth;
 	prev[lvl] = &priv_timer[tim_lcore].pending_head;
@@ -226,13 +361,15 @@ timer_get_prev_entries(uint64_t time_val, unsigned tim_lcore,
  */
 static void
 timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
-		struct rte_timer **prev)
+				struct rte_timer **prev,
+				struct priv_timer *priv_timer)
 {
 	int i;
+
 	/* to get a specific entry in the list, look for just lower than the time
 	 * values, and then increment on each level individually if necessary
 	 */
-	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire - 1, tim_lcore, prev, priv_timer);
 	for (i = priv_timer[tim_lcore].curr_skiplist_depth - 1; i >= 0; i--) {
 		while (prev[i]->sl_next[i] != NULL &&
 				prev[i]->sl_next[i] != tim &&
@@ -247,14 +384,15 @@ timer_get_prev_entries_for_node(struct rte_timer *tim, unsigned tim_lcore,
  * timer must not be in a list
  */
 static void
-timer_add(struct rte_timer *tim, unsigned int tim_lcore)
+timer_add(struct rte_timer *tim, unsigned int tim_lcore,
+	  struct priv_timer *priv_timer)
 {
 	unsigned lvl;
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH+1];
 
 	/* find where exactly this element goes in the list of elements
 	 * for each depth. */
-	timer_get_prev_entries(tim->expire, tim_lcore, prev);
+	timer_get_prev_entries(tim->expire, tim_lcore, prev, priv_timer);
 
 	/* now assign it a new level and add at that level */
 	const unsigned tim_level = timer_get_skiplist_level(
@@ -284,7 +422,7 @@ timer_add(struct rte_timer *tim, unsigned int tim_lcore)
  */
 static void
 timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
-		int local_is_locked)
+	  int local_is_locked, struct priv_timer *priv_timer)
 {
 	unsigned lcore_id = rte_lcore_id();
 	unsigned prev_owner = prev_status.owner;
@@ -304,7 +442,7 @@ timer_del(struct rte_timer *tim, union rte_timer_status prev_status,
 				((tim->sl_next[0] == NULL) ? 0 : tim->sl_next[0]->expire);
 
 	/* adjust pointers from previous entries to point past this */
-	timer_get_prev_entries_for_node(tim, prev_owner, prev);
+	timer_get_prev_entries_for_node(tim, prev_owner, prev, priv_timer);
 	for (i = priv_timer[prev_owner].curr_skiplist_depth - 1; i >= 0; i--) {
 		if (prev[i]->sl_next[i] == tim)
 			prev[i]->sl_next[i] = tim->sl_next[i];
@@ -326,11 +464,13 @@ static int
 __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 		  uint64_t period, unsigned tim_lcore,
 		  rte_timer_cb_t fct, void *arg,
-		  int local_is_locked)
+		  int local_is_locked,
+		  struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	int ret;
 	unsigned lcore_id = rte_lcore_id();
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* round robin for tim_lcore */
 	if (tim_lcore == (unsigned)LCORE_ID_ANY) {
@@ -348,11 +488,11 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(reset, 1);
+	__TIMER_STAT_ADD(priv_timer, reset, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -360,8 +500,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, local_is_locked);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	tim->period = period;
@@ -376,8 +516,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_lock(&priv_timer[tim_lcore].list_lock);
 
-	__TIMER_STAT_ADD(pending, 1);
-	timer_add(tim, tim_lcore);
+	__TIMER_STAT_ADD(priv_timer, pending, 1);
+	timer_add(tim, tim_lcore, priv_timer);
 
 	/* update state: as we are in CONFIG state, only us can modify
 	 * the state so we don't need to use cmpset() here */
@@ -394,9 +534,9 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire,
 
 /* Reset and start the timer associated with the timer handle tim */
 int
-rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
-		enum rte_timer_type type, unsigned tim_lcore,
-		rte_timer_cb_t fct, void *arg)
+rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+		    enum rte_timer_type type, unsigned int tim_lcore,
+		    rte_timer_cb_t fct, void *arg)
 {
 	uint64_t cur_time = rte_get_timer_cycles();
 	uint64_t period;
@@ -412,7 +552,48 @@ rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		period = 0;
 
 	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
-			  fct, arg, 0);
+			  fct, arg, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_reset, _v20, 2.0);
+
+int
+rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+		      enum rte_timer_type type, unsigned int tim_lcore,
+		      rte_timer_cb_t fct, void *arg)
+{
+	return rte_timer_alt_reset(default_data_id, tim, ticks, type,
+				   tim_lcore, fct, arg);
+}
+MAP_STATIC_SYMBOL(int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
+				      enum rte_timer_type type,
+				      unsigned int tim_lcore,
+				      rte_timer_cb_t fct, void *arg),
+		  rte_timer_reset_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_reset, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg)
+{
+	uint64_t cur_time = rte_get_timer_cycles();
+	uint64_t period;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	if (unlikely((tim_lcore != (unsigned int)LCORE_ID_ANY) &&
+			!(rte_lcore_is_enabled(tim_lcore) ||
+			  rte_lcore_has_role(tim_lcore, ROLE_SERVICE))))
+		return -1;
+
+	if (type == PERIODICAL)
+		period = ticks;
+	else
+		period = 0;
+
+	return __rte_timer_reset(tim,  cur_time + ticks, period, tim_lcore,
+				 fct, arg, 0, timer_data);
 }
 
 /* loop until rte_timer_reset() succeed */
@@ -426,21 +607,22 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
 		rte_pause();
 }
 
-/* Stop the timer associated with the timer handle tim */
-int
-rte_timer_stop(struct rte_timer *tim)
+static int
+__rte_timer_stop(struct rte_timer *tim, int local_is_locked,
+		 struct rte_timer_data *timer_data)
 {
 	union rte_timer_status prev_status, status;
 	unsigned lcore_id = rte_lcore_id();
 	int ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	ret = timer_set_config_state(tim, &prev_status);
+	ret = timer_set_config_state(tim, &prev_status, priv_timer);
 	if (ret < 0)
 		return -1;
 
-	__TIMER_STAT_ADD(stop, 1);
+	__TIMER_STAT_ADD(priv_timer, stop, 1);
 	if (prev_status.state == RTE_TIMER_RUNNING &&
 	    lcore_id < RTE_MAX_LCORE) {
 		priv_timer[lcore_id].updated = 1;
@@ -448,8 +630,8 @@ rte_timer_stop(struct rte_timer *tim)
 
 	/* remove it from list */
 	if (prev_status.state == RTE_TIMER_PENDING) {
-		timer_del(tim, prev_status, 0);
-		__TIMER_STAT_ADD(pending, -1);
+		timer_del(tim, prev_status, local_is_locked, priv_timer);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 	}
 
 	/* mark timer as stopped */
@@ -461,6 +643,33 @@ rte_timer_stop(struct rte_timer *tim)
 	return 0;
 }
 
+/* Stop the timer associated with the timer handle tim */
+int
+rte_timer_stop_v20(struct rte_timer *tim)
+{
+	return __rte_timer_stop(tim, 0, &default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_stop, _v20, 2.0);
+
+int
+rte_timer_stop_v1905(struct rte_timer *tim)
+{
+	return rte_timer_alt_stop(default_data_id, tim);
+}
+MAP_STATIC_SYMBOL(int rte_timer_stop(struct rte_timer *tim),
+		  rte_timer_stop_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_stop, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	return __rte_timer_stop(tim, 0, timer_data);
+}
+
 /* loop until rte_timer_stop() succeed */
 void
 rte_timer_stop_sync(struct rte_timer *tim)
@@ -477,7 +686,8 @@ rte_timer_pending(struct rte_timer *tim)
 }
 
 /* must be called periodically, run all timer that expired */
-void rte_timer_manage(void)
+static void
+__rte_timer_manage(struct rte_timer_data *timer_data)
 {
 	union rte_timer_status status;
 	struct rte_timer *tim, *next_tim;
@@ -486,11 +696,12 @@ void rte_timer_manage(void)
 	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
 	uint64_t cur_time;
 	int i, ret;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	/* timer manager only runs on EAL thread with valid lcore_id */
 	assert(lcore_id < RTE_MAX_LCORE);
 
-	__TIMER_STAT_ADD(manage, 1);
+	__TIMER_STAT_ADD(priv_timer, manage, 1);
 	/* optimize for the case where per-cpu list is empty */
 	if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL)
 		return;
@@ -518,7 +729,7 @@ void rte_timer_manage(void)
 	tim = priv_timer[lcore_id].pending_head.sl_next[0];
 
 	/* break the existing list at current time point */
-	timer_get_prev_entries(cur_time, lcore_id, prev);
+	timer_get_prev_entries(cur_time, lcore_id, prev, priv_timer);
 	for (i = priv_timer[lcore_id].curr_skiplist_depth -1; i >= 0; i--) {
 		if (prev[i] == &priv_timer[lcore_id].pending_head)
 			continue;
@@ -563,7 +774,7 @@ void rte_timer_manage(void)
 		/* execute callback function with list unlocked */
 		tim->f(tim, tim->arg);
 
-		__TIMER_STAT_ADD(pending, -1);
+		__TIMER_STAT_ADD(priv_timer, pending, -1);
 		/* the timer was stopped or reloaded by the callback
 		 * function, we have nothing to do here */
 		if (priv_timer[lcore_id].updated == 1)
@@ -580,24 +791,222 @@ void rte_timer_manage(void)
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(&priv_timer[lcore_id].list_lock);
 			status.state = RTE_TIMER_PENDING;
-			__TIMER_STAT_ADD(pending, 1);
+			__TIMER_STAT_ADD(priv_timer, pending, 1);
 			status.owner = (int16_t)lcore_id;
 			rte_wmb();
 			tim->status.u32 = status.u32;
 			__rte_timer_reset(tim, tim->expire + tim->period,
-				tim->period, lcore_id, tim->f, tim->arg, 1);
+				tim->period, lcore_id, tim->f, tim->arg, 1,
+				timer_data);
 			rte_spinlock_unlock(&priv_timer[lcore_id].list_lock);
 		}
 	}
 	priv_timer[lcore_id].running_tim = NULL;
 }
 
+void
+rte_timer_manage_v20(void)
+{
+	__rte_timer_manage(&default_timer_data);
+}
+VERSION_SYMBOL(rte_timer_manage, _v20, 2.0);
+
+int
+rte_timer_manage_v1905(void)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(default_data_id, timer_data, -EINVAL);
+
+	__rte_timer_manage(timer_data);
+
+	return 0;
+}
+MAP_STATIC_SYMBOL(int rte_timer_manage(void), rte_timer_manage_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_manage, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id,
+		     unsigned int *poll_lcores,
+		     int nb_poll_lcores,
+		     rte_timer_alt_manage_cb_t f)
+{
+	union rte_timer_status status;
+	struct rte_timer *tim, *next_tim, **pprev;
+	struct rte_timer *run_first_tims[RTE_MAX_LCORE];
+	unsigned int runlist_lcore_ids[RTE_MAX_LCORE];
+	unsigned int this_lcore = rte_lcore_id();
+	struct rte_timer *prev[MAX_SKIPLIST_DEPTH + 1];
+	uint64_t cur_time;
+	int i, j, ret;
+	int nb_runlists = 0;
+	struct rte_timer_data *data;
+	struct priv_timer *privp;
+	uint32_t poll_lcore;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, data, -EINVAL);
+
+	/* timer manager only runs on EAL thread with valid lcore_id */
+	assert(this_lcore < RTE_MAX_LCORE);
+
+	__TIMER_STAT_ADD(data->priv_timer, manage, 1);
+
+	if (poll_lcores == NULL) {
+		poll_lcores = (unsigned int []){rte_lcore_id()};
+		nb_poll_lcores = 1;
+	}
+
+	for (i = 0; i < nb_poll_lcores; i++) {
+		poll_lcore = poll_lcores[i];
+		privp = &data->priv_timer[poll_lcore];
+
+		/* optimize for the case where per-cpu list is empty */
+		if (privp->pending_head.sl_next[0] == NULL)
+			continue;
+		cur_time = rte_get_timer_cycles();
+
+#ifdef RTE_ARCH_64
+		/* on 64-bit the value cached in the pending_head.expire will
+		 * be updated atomically, so we can consult that for a quick
+		 * check here outside the lock
+		 */
+		if (likely(privp->pending_head.expire > cur_time))
+			continue;
+#endif
+
+		/* browse ordered list, add expired timers in 'expired' list */
+		rte_spinlock_lock(&privp->list_lock);
+
+		/* if nothing to do just unlock and move on to the next list */
+		if (privp->pending_head.sl_next[0] == NULL ||
+		    privp->pending_head.sl_next[0]->expire > cur_time) {
+			rte_spinlock_unlock(&privp->list_lock);
+			continue;
+		}
+
+		/* save start of list of expired timers */
+		tim = privp->pending_head.sl_next[0];
+
+		/* break the existing list at current time point */
+		timer_get_prev_entries(cur_time, poll_lcore, prev,
+				       data->priv_timer);
+		for (j = privp->curr_skiplist_depth - 1; j >= 0; j--) {
+			if (prev[j] == &privp->pending_head)
+				continue;
+			privp->pending_head.sl_next[j] =
+				prev[j]->sl_next[j];
+			if (prev[j]->sl_next[j] == NULL)
+				privp->curr_skiplist_depth--;
+
+			prev[j]->sl_next[j] = NULL;
+		}
+
+		/* transition run-list from PENDING to RUNNING */
+		run_first_tims[nb_runlists] = tim;
+		runlist_lcore_ids[nb_runlists] = poll_lcore;
+		pprev = &run_first_tims[nb_runlists];
+		nb_runlists++;
+
+		for ( ; tim != NULL; tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			ret = timer_set_running_state(tim);
+			if (likely(ret == 0)) {
+				pprev = &tim->sl_next[0];
+			} else {
+				/* another core is trying to re-config this one,
+				 * remove it from local expired list
+				 */
+				*pprev = next_tim;
+			}
+		}
+
+		/* update the next to expire timer value */
+		privp->pending_head.expire =
+		    (privp->pending_head.sl_next[0] == NULL) ? 0 :
+			privp->pending_head.sl_next[0]->expire;
+
+		rte_spinlock_unlock(&privp->list_lock);
+	}
+
+	/* Now process the run lists */
+	while (1) {
+		bool done = true;
+		uint64_t min_expire = UINT64_MAX;
+		int min_idx = 0;
+
+		/* Find the next oldest timer to process */
+		for (i = 0; i < nb_runlists; i++) {
+			tim = run_first_tims[i];
+
+			if (tim != NULL && tim->expire < min_expire) {
+				min_expire = tim->expire;
+				min_idx = i;
+				done = false;
+			}
+		}
+
+		if (done)
+			break;
+
+		tim = run_first_tims[min_idx];
+		privp = &data->priv_timer[runlist_lcore_ids[min_idx]];
+
+		/* Move down the runlist from which we picked a timer to
+		 * execute
+		 */
+		run_first_tims[min_idx] = run_first_tims[min_idx]->sl_next[0];
+
+		privp->updated = 0;
+		privp->running_tim = tim;
+
+		/* Call the provided callback function */
+		f(tim);
+
+		__TIMER_STAT_ADD(privp, pending, -1);
+
+		/* the timer was stopped or reloaded by the callback
+		 * function, we have nothing to do here
+		 */
+		if (privp->updated == 1)
+			continue;
+
+		if (tim->period == 0) {
+			/* remove from done list and mark timer as stopped */
+			status.state = RTE_TIMER_STOP;
+			status.owner = RTE_TIMER_NO_OWNER;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+		} else {
+			/* keep it in list and mark timer as pending */
+			rte_spinlock_lock(
+				&data->priv_timer[this_lcore].list_lock);
+			status.state = RTE_TIMER_PENDING;
+			__TIMER_STAT_ADD(data->priv_timer, pending, 1);
+			status.owner = (int16_t)this_lcore;
+			rte_wmb();
+			tim->status.u32 = status.u32;
+			__rte_timer_reset(tim, tim->expire + tim->period,
+				tim->period, this_lcore, tim->f, tim->arg, 1,
+				data);
+			rte_spinlock_unlock(
+				&data->priv_timer[this_lcore].list_lock);
+		}
+
+		privp->running_tim = NULL;
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
-void rte_timer_dump_stats(FILE *f)
+static void
+__rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
 {
 #ifdef RTE_LIBRTE_TIMER_DEBUG
 	struct rte_timer_debug_stats sum;
 	unsigned lcore_id;
+	struct priv_timer *priv_timer = timer_data->priv_timer;
 
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
@@ -615,3 +1024,31 @@ void rte_timer_dump_stats(FILE *f)
 	fprintf(f, "No timer statistics, RTE_LIBRTE_TIMER_DEBUG is disabled\n");
 #endif
 }
+
+void
+rte_timer_dump_stats_v20(FILE *f)
+{
+	__rte_timer_dump_stats(&default_timer_data, f);
+}
+VERSION_SYMBOL(rte_timer_dump_stats, _v20, 2.0);
+
+int
+rte_timer_dump_stats_v1905(FILE *f)
+{
+	return rte_timer_alt_dump_stats(default_data_id, f);
+}
+MAP_STATIC_SYMBOL(int rte_timer_dump_stats(FILE *f),
+		  rte_timer_dump_stats_v1905);
+BIND_DEFAULT_SYMBOL(rte_timer_dump_stats, _v1905, 19.05);
+
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id __rte_unused, FILE *f)
+{
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	__rte_timer_dump_stats(timer_data, f);
+
+	return 0;
+}
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 9b95cd2..6a9c499 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -39,6 +39,7 @@
 #include <stddef.h>
 #include <rte_common.h>
 #include <rte_config.h>
+#include <rte_spinlock.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -132,12 +133,68 @@ struct rte_timer
 #endif
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a timer data instance in shared memory to track a set of pending
+ * timer lists.
+ *
+ * @param id_ptr
+ *   Pointer to variable into which to write the identifier of the allocated
+ *   timer data instance.
+ *
+ * @return
+ *   - 0: Success
+ *   - -ENOSPC: maximum number of timer data instances already allocated
+ */
+int __rte_experimental rte_timer_data_alloc(uint32_t *id_ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Deallocate a timer data instance.
+ *
+ * @param id
+ *   Identifier of the timer data instance to deallocate.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: invalid timer data instance identifier
+ */
+int __rte_experimental rte_timer_data_dealloc(uint32_t id);
+
+/**
  * Initialize the timer library.
  *
  * Initializes internal variables (list, locks and so on) for the RTE
  * timer library.
  */
-void rte_timer_subsystem_init(void);
+void rte_timer_subsystem_init_v20(void);
+
+/**
+ * Initialize the timer library.
+ *
+ * Initializes internal variables (list, locks and so on) for the RTE
+ * timer library.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EEXIST: Returned in secondary process when primary process has not
+ *      yet initialized the timer subsystem
+ *   - -ENOMEM: Unable to allocate memory needed to initialize timer
+ *      subsystem
+ */
+int rte_timer_subsystem_init_v1905(void);
+int rte_timer_subsystem_init(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Free timer subsystem resources.
+ */
+void __rte_experimental rte_timer_subsystem_finalize(void);
 
 /**
  * Initialize a timer handle.
@@ -193,6 +250,12 @@ void rte_timer_init(struct rte_timer *tim);
  *   - 0: Success; the timer is scheduled.
  *   - (-1): Timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_reset_v20(struct rte_timer *tim, uint64_t ticks,
+			enum rte_timer_type type, unsigned int tim_lcore,
+			rte_timer_cb_t fct, void *arg);
+int rte_timer_reset_v1905(struct rte_timer *tim, uint64_t ticks,
+			  enum rte_timer_type type, unsigned int tim_lcore,
+			  rte_timer_cb_t fct, void *arg);
 int rte_timer_reset(struct rte_timer *tim, uint64_t ticks,
 		    enum rte_timer_type type, unsigned tim_lcore,
 		    rte_timer_cb_t fct, void *arg);
@@ -252,9 +315,10 @@ rte_timer_reset_sync(struct rte_timer *tim, uint64_t ticks,
  *   - 0: Success; the timer is stopped.
  *   - (-1): The timer is in the RUNNING or CONFIG state.
  */
+int rte_timer_stop_v20(struct rte_timer *tim);
+int rte_timer_stop_v1905(struct rte_timer *tim);
 int rte_timer_stop(struct rte_timer *tim);
 
-
 /**
  * Loop until rte_timer_stop() succeeds.
  *
@@ -292,7 +356,25 @@ int rte_timer_pending(struct rte_timer *tim);
  * function. However, the more often the function is called, the more
  * CPU resources it will use.
  */
-void rte_timer_manage(void);
+void rte_timer_manage_v20(void);
+
+/**
+ * Manage the timer list and execute callback functions.
+ *
+ * This function must be called periodically from EAL lcores
+ * main_loop(). It browses the list of pending timers and runs all
+ * timers that are expired.
+ *
+ * The precision of the timer depends on the call frequency of this
+ * function. However, the more often the function is called, the more
+ * CPU resources it will use.
+ *
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_manage_v1905(void);
+int rte_timer_manage(void);
 
 /**
  * Dump statistics about timers.
@@ -300,7 +382,143 @@ void rte_timer_manage(void);
  * @param f
  *   A pointer to a file for output
  */
-void rte_timer_dump_stats(FILE *f);
+void rte_timer_dump_stats_v20(FILE *f);
+
+/**
+ * Dump statistics about timers.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: Success
+ *   - -EINVAL: timer subsystem not yet initialized
+ */
+int rte_timer_dump_stats_v1905(FILE *f);
+int rte_timer_dump_stats(FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_reset(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list to which
+ * the timer should be added.
+ *
+ * @see rte_timer_reset()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @param ticks
+ *   The number of cycles (see rte_get_hpet_hz()) before the callback
+ *   function is called.
+ * @param type
+ *   The type can be either:
+ *   - PERIODICAL: The timer is automatically reloaded after execution
+ *     (returns to the PENDING state)
+ *   - SINGLE: The timer is one-shot, that is, the timer goes to a
+ *     STOPPED state after execution.
+ * @param tim_lcore
+ *   The ID of the lcore where the timer callback function has to be
+ *   executed. If tim_lcore is LCORE_ID_ANY, the timer library will
+ *   launch it on a different core for each call (round-robin).
+ * @param fct
+ *   The callback function of the timer. This parameter can be NULL if (and
+ *   only if) rte_timer_alt_manage() will be used to manage this timer.
+ * @param arg
+ *   The user argument of the callback function.
+ * @return
+ *   - 0: Success; the timer is scheduled.
+ *   - (-1): Timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_reset(uint32_t timer_data_id, struct rte_timer *tim,
+		    uint64_t ticks, enum rte_timer_type type,
+		    unsigned int tim_lcore, rte_timer_cb_t fct, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_stop(), except that it allows a
+ * caller to specify the rte_timer_data instance containing the list from which
+ * this timer should be removed.
+ *
+ * @see rte_timer_stop()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param tim
+ *   The timer handle.
+ * @return
+ *   - 0: Success; the timer is stopped.
+ *   - (-1): The timer is in the RUNNING or CONFIG state.
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_stop(uint32_t timer_data_id, struct rte_timer *tim);
+
+/**
+ * Callback function type for rte_timer_alt_manage().
+ */
+typedef void (*rte_timer_alt_manage_cb_t)(struct rte_timer *tim);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Manage a set of timer lists and execute the specified callback function for
+ * all expired timers. This function is similar to rte_timer_manage(), except
+ * that it allows a caller to specify the timer_data instance that should
+ * be operated on, as well as a set of lcore IDs identifying which timer lists
+ * should be processed.  Callback functions of individual timers are ignored.
+ *
+ * @see rte_timer_manage()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param poll_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ *   NULL is allowed - if NULL, the timer list corresponding to the lcore
+ *   calling this routine is processed (same as rte_timer_manage()).
+ * @param n_poll_lcores
+ *   The size of the poll_lcores array. If 'poll_lcores' is NULL, this parameter
+ *   is ignored.
+ * @param f
+ *   The callback function which should be called for all expired timers.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
+		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * This function is the same as rte_timer_dump_stats(), except that it allows
+ * the caller to specify the rte_timer_data instance that should be used.
+ *
+ * @see rte_timer_dump_stats()
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param f
+ *   A pointer to a file for output
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_alt_dump_stats(uint32_t timer_data_id, FILE *f);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index 9b2e4b8..c2e5836 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -13,3 +13,25 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_19.05 {
+	global:
+
+	rte_timer_dump_stats;
+	rte_timer_manage;
+	rte_timer_reset;
+	rte_timer_stop;
+	rte_timer_subsystem_init;
+} DPDK_2.0;
+
+EXPERIMENTAL {
+	global:
+
+	rte_timer_alt_dump_stats;
+	rte_timer_alt_manage;
+	rte_timer_alt_reset;
+	rte_timer_alt_stop;
+	rte_timer_data_alloc;
+	rte_timer_data_dealloc;
+	rte_timer_subsystem_finalize;
+};
-- 
2.6.4


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v5 2/2] timer: add function to stop all timers in a list
  2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
  2019-04-15 21:41         ` Erik Gabriel Carrillo
  2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
@ 2019-04-15 21:41         ` Erik Gabriel Carrillo
  2019-04-15 21:41           ` Erik Gabriel Carrillo
  2019-04-17 19:54         ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Thomas Monjalon
  3 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-15 21:41 UTC (permalink / raw)
  To: rsanford, thomas; +Cc: dev

Add a function to the timer API that allows a caller to traverse a
specified set of timer lists, stopping each timer in each list,
and invoking a callback function.
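
The walk described above has to tolerate each timer being unlinked while the
list is still being traversed. A minimal standalone sketch of that pattern
follows. It assumes a plain singly linked list rather than the timer library's
skiplist, and struct node, stop_all() and count_cb() are hypothetical names
used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer sitting on a pending list. */
struct node {
	struct node *next;
	int stopped;
};

typedef void (*stop_all_cb_t)(struct node *n, void *arg);

/*
 * Unlink every node from the list, mark it stopped, and hand it to the
 * optional callback. The successor is saved before the node is unlinked,
 * so the walk never follows a pointer that the removal has cleared.
 * Returns the number of nodes stopped.
 */
static int
stop_all(struct node **head, stop_all_cb_t f, void *f_arg)
{
	struct node *n, *next;
	int count = 0;

	assert(head != NULL);

	for (n = *head; n != NULL; n = next) {
		next = n->next;	/* save before unlinking */

		*head = next;	/* unlink n from the front of the list */
		n->next = NULL;
		n->stopped = 1;
		count++;

		if (f != NULL)
			f(n, f_arg);
	}

	return count;
}

/* Example callback: count how many nodes were handed over. */
static void
count_cb(struct node *n, void *arg)
{
	(void)n;
	(*(int *)arg)++;
}
```

Because the successor is captured first, the callback is free to recycle the
node it receives, for example by returning it to a pool.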

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_timer/rte_timer.c           | 38 ++++++++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer.h           | 32 ++++++++++++++++++++++++++++
 lib/librte_timer/rte_timer_version.map |  1 +
 3 files changed, 71 insertions(+)

diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c
index 511d902..ae5d236 100644
--- a/lib/librte_timer/rte_timer.c
+++ b/lib/librte_timer/rte_timer.c
@@ -999,6 +999,44 @@ rte_timer_alt_manage(uint32_t timer_data_id,
 	return 0;
 }
 
+/* Walk pending lists, stopping timers and calling user-specified function */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores,
+		   rte_timer_stop_all_cb_t f, void *f_arg)
+{
+	int i;
+	struct priv_timer *priv_timer;
+	uint32_t walk_lcore;
+	struct rte_timer *tim, *next_tim;
+	struct rte_timer_data *timer_data;
+
+	TIMER_DATA_VALID_GET_OR_ERR_RET(timer_data_id, timer_data, -EINVAL);
+
+	for (i = 0; i < nb_walk_lcores; i++) {
+		walk_lcore = walk_lcores[i];
+		priv_timer = &timer_data->priv_timer[walk_lcore];
+
+		rte_spinlock_lock(&priv_timer->list_lock);
+
+		for (tim = priv_timer->pending_head.sl_next[0];
+		     tim != NULL;
+		     tim = next_tim) {
+			next_tim = tim->sl_next[0];
+
+			/* Call timer_stop with lock held */
+			__rte_timer_stop(tim, 1, timer_data);
+
+			if (f)
+				f(tim, f_arg);
+		}
+
+		rte_spinlock_unlock(&priv_timer->list_lock);
+	}
+
+	return 0;
+}
+
 /* dump statistics about timers */
 static void
 __rte_timer_dump_stats(struct rte_timer_data *timer_data __rte_unused, FILE *f)
diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h
index 6a9c499..b502f8c 100644
--- a/lib/librte_timer/rte_timer.h
+++ b/lib/librte_timer/rte_timer.h
@@ -500,6 +500,38 @@ rte_timer_alt_manage(uint32_t timer_data_id, unsigned int *poll_lcores,
 		     int n_poll_lcores, rte_timer_alt_manage_cb_t f);
 
 /**
+ * Callback function type for rte_timer_stop_all().
+ */
+typedef void (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Walk the pending timer lists for the specified lcore IDs, and for each timer
+ * that is encountered, stop it and call the specified callback function to
+ * process it further.
+ *
+ * @param timer_data_id
+ *   An identifier indicating which instance of timer data should be used for
+ *   this operation.
+ * @param walk_lcores
+ *   An array of lcore ids identifying the timer lists that should be processed.
+ * @param nb_walk_lcores
+ *   The size of the walk_lcores array.
+ * @param f
+ *   The callback function which should be called for each timer. Can be NULL.
+ * @param f_arg
+ *   An arbitrary argument that will be passed to f, if it is called.
+ * @return
+ *   - 0: success
+ *   - -EINVAL: invalid timer_data_id
+ */
+int __rte_experimental
+rte_timer_stop_all(uint32_t timer_data_id, unsigned int *walk_lcores,
+		   int nb_walk_lcores, rte_timer_stop_all_cb_t f, void *f_arg);
+
+/**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
diff --git a/lib/librte_timer/rte_timer_version.map b/lib/librte_timer/rte_timer_version.map
index c2e5836..72f75c8 100644
--- a/lib/librte_timer/rte_timer_version.map
+++ b/lib/librte_timer/rte_timer_version.map
@@ -33,5 +33,6 @@ EXPERIMENTAL {
 	rte_timer_alt_stop;
 	rte_timer_data_alloc;
 	rte_timer_data_dealloc;
+	rte_timer_stop_all;
 	rte_timer_subsystem_finalize;
 };
-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory
  2019-03-20 13:52         ` Sanford, Robert
  2019-03-20 13:52           ` Sanford, Robert
  2019-03-21  1:01           ` Carrillo, Erik G
@ 2019-04-15 21:49           ` Carrillo, Erik G
  2019-04-15 21:49             ` Carrillo, Erik G
  2 siblings, 1 reply; 77+ messages in thread
From: Carrillo, Erik G @ 2019-04-15 21:49 UTC (permalink / raw)
  To: Sanford, Robert, thomas, dev

Hi Robert,

I'm back in the office now; I just submitted an updated patch series to address some of the points you made below. I'll add responses inline:

> -----Original Message-----
> From: Sanford, Robert [mailto:rsanford@akamai.com]
> Sent: Wednesday, March 20, 2019 8:53 AM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>; thomas@monjalon.net;
> dev@dpdk.org
> Cc: nhorman@tuxdriver.com
> Subject: Re: [PATCH v4 1/2] timer: allow timer management in shared
> memory
> 
> Hi Erik,
> 
> I have a few questions and comments on this patch series.
> 
> 1. Don't you think we need new tests (in test/test/) to verify the secondary-
> process APIs?

Yes, good idea.  I'll work on a separate patch to add this.

> 2. I suggest we define default_data_id as const, and explicitly set it to 0.

I did change this to const, but omitted the explicit initialization because checkpatch
complains with the following: "ERROR:INITIALISED_STATIC: do not initialise statics to 0".

> 3. The outer for-loop in rte_timer_alt_manage() touches beyond the end of
> poll_lcores[]. I suggest a change like this:
> 
> -       for (i = 0, poll_lcore = poll_lcores[i]; i < nb_poll_lcores;
> -            poll_lcore = poll_lcores[++i]) {
> +       for (i = 0; i < nb_poll_lcores; i++) {
> +            poll_lcore = poll_lcores[i];
> 

Change made.
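
To illustrate point 3: in the original loop shape, the update expression
fetches poll_lcores[++i] before the condition is re-checked, so the last pass
reads one element past the end. The standalone sketch below records the
highest index each loop shape touches. read_at() and the two
highest_index_*() functions are hypothetical helpers, not part of the timer
library, and the caller is assumed to back the array with one spare slot so
that the demonstration itself stays well defined:

```c
#include <assert.h>

static int max_idx_read;

/* Hypothetical helper: read arr[idx] and remember the highest index used. */
static unsigned int
read_at(const unsigned int *arr, int idx)
{
	assert(idx >= 0);
	if (idx > max_idx_read)
		max_idx_read = idx;
	return arr[idx];
}

/* Original loop shape: the element is fetched in the update expression. */
static int
highest_index_original(const unsigned int *arr, int n)
{
	unsigned int v;
	int i;

	max_idx_read = -1;
	for (i = 0, v = read_at(arr, i); i < n; v = read_at(arr, ++i))
		(void)v;
	return max_idx_read;
}

/* Suggested loop shape: the element is fetched inside the loop body. */
static int
highest_index_fixed(const unsigned int *arr, int n)
{
	unsigned int v;
	int i;

	max_idx_read = -1;
	for (i = 0; i < n; i++) {
		v = read_at(arr, i);
		(void)v;
	}
	return max_idx_read;
}
```

For a logical length of 3, the first function reports index 3 and the second
reports index 2. The original shape also reads element 0 when the length is 0.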

> 4. Same problem (as #3) in the for-loop in rte_timer_stop_all(), in patch v4
> 2/2.

Change made.

> 5. There seems to be no difference between "typedef void
> (*rte_timer_cb_t)(struct rte_timer *, void *)" and "typedef void
> (*rte_timer_stop_all_cb_t)(struct rte_timer *tim, void *arg)", why add
> rte_timer_stop_all_cb_t?

Though they have the same signature, it seemed clearer to me to have a new callback
type, since one represents a function that gets called per timer and the other represents
a function that gets called for all timers.
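
As a side note on point 5: in C, two function-pointer typedefs with the same
signature name the same type, so the separate name documents intent rather
than adding any compile-time checking. A small sketch, using a hypothetical
struct timer in place of struct rte_timer and made-up typedef names:

```c
#include <assert.h>
#include <stddef.h>

struct timer { int id; };	/* hypothetical stand-in for struct rte_timer */

/* Two callback types with the same signature, as in the discussion. */
typedef void (*timer_cb_t)(struct timer *tim, void *arg);
typedef void (*stop_all_cb_t)(struct timer *tim, void *arg);

/* One function that satisfies both types. */
static void
record_id(struct timer *tim, void *arg)
{
	*(int *)arg = tim->id;
}

/* This function takes the "per timer" type... */
static void
fire(struct timer *tim, timer_cb_t cb, void *arg)
{
	assert(cb != NULL);
	cb(tim, arg);
}

/* ...and this one the "stop all" type; the compiler sees no difference. */
static void
stop_each(struct timer *tims, size_t n, stop_all_cb_t cb, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++)
		cb(&tims[i], arg);
}
```

A value of either typedef can be assigned to the other without a cast, so a
single handler can be passed to both functions.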

> 6. Can you provide a use case or code snippet that shows how we will use
> rte_timer_alt_manage()?

Currently this function is used by an updated version of the software event timer 
adapter (http://patchwork.dpdk.org/patch/48944/); rte_timer_alt_manage() is called in 
the service function for an instance of the adapter.  Since this function allows timer_data_ids 
to be specified, different instances of the adapter can manage their own separate timer lists 
independently.
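
The idea of selecting a timer list by identifier can be sketched with a toy
model. This is not the timer library's implementation: every toy_* name below
is hypothetical, the lists are plain arrays rather than per-lcore skiplists,
and time is passed in by the caller instead of being read from a cycle
counter. It only shows that managing one instance leaves the timers of another
instance untouched:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_INSTANCES 4
#define TOY_MAX_TIMERS 8

struct toy_timer {
	uint64_t expire;	/* tick at which the timer expires */
	int pending;
};

/* One independent timer list per instance, selected by a numeric id. */
struct toy_timer_data {
	struct toy_timer *timers[TOY_MAX_TIMERS];
	size_t n;
};

static struct toy_timer_data toy_instances[TOY_MAX_INSTANCES];

typedef void (*toy_manage_cb_t)(struct toy_timer *tim);

/* Arm a timer on the list owned by instance 'id'. */
static int
toy_reset(uint32_t id, struct toy_timer *tim, uint64_t expire)
{
	struct toy_timer_data *d;

	if (id >= TOY_MAX_INSTANCES)
		return -EINVAL;
	d = &toy_instances[id];
	if (d->n == TOY_MAX_TIMERS)
		return -ENOSPC;

	tim->expire = expire;
	tim->pending = 1;
	d->timers[d->n++] = tim;
	return 0;
}

/* Call 'f' for every expired timer of instance 'id' only. */
static int
toy_manage(uint32_t id, uint64_t now, toy_manage_cb_t f)
{
	struct toy_timer_data *d;
	size_t i, kept = 0;

	if (id >= TOY_MAX_INSTANCES)
		return -EINVAL;
	assert(f != NULL);
	d = &toy_instances[id];

	for (i = 0; i < d->n; i++) {
		struct toy_timer *tim = d->timers[i];

		if (tim->expire <= now) {
			tim->pending = 0;
			f(tim);
		} else {
			d->timers[kept++] = tim;	/* still pending */
		}
	}
	d->n = kept;
	return 0;
}

static int toy_fired;

/* Example callback: count expirations. */
static void
toy_count_cb(struct toy_timer *tim)
{
	(void)tim;
	toy_fired++;
}
```

In this model, two adapter instances would each hold their own id and call
toy_manage() with it from their service function.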

> 7. Why not make the argument to rte_timer_alt_manage_cb_t a "struct
> rte_timer *", instead of a "void *", since we pass a pointer-to-timer when we
> invoke the function?
> 

Change made.

> --
> Regards,
> Robert Sanford
> 

Thanks,
Erik

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory
  2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
  2019-04-15 21:41           ` Erik Gabriel Carrillo
@ 2019-04-17 17:09           ` Thomas Monjalon
  2019-04-17 17:09             ` Thomas Monjalon
  1 sibling, 1 reply; 77+ messages in thread
From: Thomas Monjalon @ 2019-04-17 17:09 UTC (permalink / raw)
  To: Erik Gabriel Carrillo; +Cc: dev, rsanford

15/04/2019 23:41, Erik Gabriel Carrillo:
> --- a/lib/librte_timer/Makefile
> +++ b/lib/librte_timer/Makefile
> +CFLAGS += -DALLOW_EXPERIMENTAL_API

You forgot the same for meson:
allow_experimental_apis = true

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/2] Timer library changes
  2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
                           ` (2 preceding siblings ...)
  2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
@ 2019-04-17 19:54         ` Thomas Monjalon
  2019-04-17 19:54           ` Thomas Monjalon
  3 siblings, 1 reply; 77+ messages in thread
From: Thomas Monjalon @ 2019-04-17 19:54 UTC (permalink / raw)
  To: Erik Gabriel Carrillo; +Cc: dev, rsanford

15/04/2019 23:41, Erik Gabriel Carrillo:
> This patch series modifies the timer library in such a way that
> structures that used to be statically allocated in a process's data
> segment are now allocated in shared memory.  As these structures contain
> lists of timers, new APIs are introduced that allow a caller to specify
> the particular structure instance into which a timer should be inserted
> or from which a timer should be removed.  This enables primary and
> secondary processes to modify the same timer list, which enables some
> multi-process use cases that were not previously possible; e.g. a
> secondary process can start a timer whose expiration is detected in a
> primary process running a new flavor of timer_manage().
> 
> The original library API is mostly unchanged, though implementations are
> updated to call into newly added functions with a default structure
> instance ID that provides the original behavior.  New functions are
> introduced to enable applications to allocate structure instances to
> house timer lists, and to reference them with an identifier when
> starting and stopping timers, and finally, to manage the timer lists
> referenced with an identifier.
> 
> My initial performance testing with the "timer_perf_autotest" test shows
> no performance regression or improvement, and inspection of the
> generated optimized code shows that the extra function call gets inlined
> in the functions that now have an extra function call. 
> 
> Erik Gabriel Carrillo (2):
>   timer: allow timer management in shared memory
>   timer: add function to stop all timers in a list

Applied with meson fix, thanks

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v5 0/1] New software event timer adapter
  2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
  2018-12-14 23:15       ` [dpdk-dev] [PATCH v4 1/1] eventdev: add new " Erik Gabriel Carrillo
  2018-12-18 20:11       ` [dpdk-dev] [EXT] [PATCH v4 0/1] New " Jerin Jacob Kollanukkaran
@ 2019-04-22 14:57       ` Erik Gabriel Carrillo
  2019-04-22 14:57         ` Erik Gabriel Carrillo
                           ` (2 more replies)
  2 siblings, 3 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-22 14:57 UTC (permalink / raw)
  To: jerin.jacob; +Cc: pbhagavatula, mattias.ronnblom, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired.

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v5:
 - Rebase patch to apply with latest timer library
 - Fix event buffering bug where full buffer was treated as empty
 - Return rte_timer objects back to mempool after service function has
   returned from timer_manage() call instead of in callback
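
The "full buffer treated as empty" case can be sketched in isolation. The
sketch assumes a ring whose head and tail are free-running counters that are
masked to obtain slot indexes, with a power-of-two size; that layout, and the
names struct ring, ring_full() and ring_contig(), are assumptions made for
illustration rather than details taken from the patch. With such a layout,
equal masked indexes mean either "empty" or "full", so the counters have to
break the tie:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_SZ 4u			/* must be a power of two */
#define BUF_MASK (BUF_SZ - 1u)

/* head and tail are free-running counters; masking yields slot indexes. */
struct ring {
	size_t head;	/* total items ever added */
	size_t tail;	/* total items ever removed */
};

static bool
ring_full(const struct ring *r)
{
	return r->head - r->tail == BUF_SZ;
}

/*
 * Number of items that can be flushed in one contiguous run starting at
 * the tail slot. When the masked indexes are equal, the ring is either
 * empty or full, and only the unmasked counters can tell which.
 */
static size_t
ring_contig(const struct ring *r)
{
	size_t head_idx = r->head & BUF_MASK;
	size_t tail_idx = r->tail & BUF_MASK;

	if (head_idx > tail_idx)
		return head_idx - tail_idx;
	if (head_idx < tail_idx)
		return BUF_SZ - tail_idx;
	if (ring_full(r))
		return BUF_SZ - tail_idx;

	assert(r->head - r->tail == 0);	/* genuinely empty */
	return 0;
}
```

Without the ring_full() branch, a ring holding BUF_SZ items would fall through
to the empty case and nothing would ever be flushed from it.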

Changes in v4:
 - Addressed the following comments from Mattias Ronnblom:
   - remove unnecessary header include
   - add missing read barrier in timer cancel function

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 703 +++++++++++---------------
 1 file changed, 291 insertions(+), 412 deletions(-)

-- 
2.6.4

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v5 1/1] eventdev: add new software event timer adapter
  2019-04-22 14:57       ` [dpdk-dev] [PATCH v5 " Erik Gabriel Carrillo
  2019-04-22 14:57         ` Erik Gabriel Carrillo
@ 2019-04-22 14:57         ` Erik Gabriel Carrillo
  2019-04-22 14:57           ` Erik Gabriel Carrillo
  2019-04-26 15:14         ` [dpdk-dev] [PATCH v6 0/1] New " Erik Gabriel Carrillo
  2 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-22 14:57 UTC (permalink / raw)
  To: jerin.jacob; +Cc: pbhagavatula, mattias.ronnblom, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 703 +++++++++++---------------
 1 file changed, 291 insertions(+), 412 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 575da04..841c904 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -19,6 +19,7 @@
 #include <rte_timer.h>
 #include <rte_service_component.h>
 #include <rte_cycles.h>
+#include <rte_debug.h>
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_pmd.h"
@@ -34,7 +35,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +212,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +341,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,12 +492,17 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
+		/* Buffer empty */
+		RTE_ASSERT(bufp->head - bufp->tail == 0);
 		*nb_events_flushed = 0;
 		return;
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -504,137 +510,129 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
+	struct rte_timer *expired_timers[EVENT_BUFFER_SZ];
+	int expired_timers_idx;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		sw->expired_timers[sw->expired_timers_idx++] = tim;
+		RTE_ASSERT(sw->expired_timers_idx <= EVENT_BUFFER_SZ);
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		sw->stats.evtim_exp_count++;
+
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
 
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
-
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +644,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +658,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +691,39 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
-
-		event_buffer_flush(&sw_data->buffer,
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->expired_timers_idx);
+		sw->expired_timers_idx = 0;
+
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -834,168 +757,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = ret;
+			goto free_mempool;
+		}
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = ret;
+		goto free_mempool;
+	}
+
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = arg;
 
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +916,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +998,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1105,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1115,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4
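
The event_buffer_flush() hunk in the patch above adds a branch for the
case where the masked head and tail indices are equal, which can mean
either a full or an empty buffer. Below is a minimal sketch of that
disambiguation, assuming free-running head/tail counters and a
power-of-two buffer size. The names ring_idx, ring_full, ring_contig,
BUF_SZ and BUF_MASK are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SZ   8u            /* hypothetical size; must be a power of two */
#define BUF_MASK (BUF_SZ - 1)

struct ring_idx {
	size_t head; /* free-running count of items added */
	size_t tail; /* free-running count of items removed */
};

/* Full when the counters differ by exactly the buffer size. */
static int ring_full(const struct ring_idx *r)
{
	return r->head - r->tail == BUF_SZ;
}

/* Number of contiguous items available starting at the masked tail index. */
static size_t ring_contig(const struct ring_idx *r)
{
	size_t head_idx = r->head & BUF_MASK;
	size_t tail_idx = r->tail & BUF_MASK;

	if (head_idx > tail_idx)
		return head_idx - tail_idx;
	if (head_idx < tail_idx)
		return BUF_SZ - tail_idx;
	/* Masked indices are equal: the buffer is either full or empty. */
	if (ring_full(r))
		return BUF_SZ - tail_idx;
	return 0; /* empty */
}
```

Without the ring_full() check, a completely full buffer would be
treated as empty and nothing would be flushed until another item was
removed by some other path.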

* [dpdk-dev] [PATCH v5 1/1] eventdev: add new software event timer adapter
  2019-04-22 14:57         ` [dpdk-dev] [PATCH v5 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-04-22 14:57           ` Erik Gabriel Carrillo
  0 siblings, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-22 14:57 UTC (permalink / raw)
  To: jerin.jacob; +Cc: pbhagavatula, mattias.ronnblom, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued and processed them. To
improve performance, this version does away with the ring and lets
lcores in both primary and secondary processes insert timers directly
into the timer skiplist data structures; the service core also accesses
the lists directly when looking for expired timers.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
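
Note for readers of the diff: the arm path records each lcore the first
time it arms a timer, so that the service function only polls timer
lists that may hold timers. What follows is a minimal standalone sketch
of that bookkeeping, assuming C11 atomics in place of
rte_atomic16_test_and_set() and rte_spinlock_t. The names poll_set,
poll_set_init, poll_set_mark and MAX_LCORE are hypothetical and do not
appear in the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_LCORE 8 /* hypothetical; stands in for RTE_MAX_LCORE */

struct poll_set {
	atomic_flag in_use[MAX_LCORE];       /* set once an lcore has armed */
	unsigned int poll_lcores[MAX_LCORE]; /* lcores whose lists are polled */
	int n_poll_lcores;                   /* valid entries in poll_lcores */
	atomic_flag lock;                    /* guards the two fields above */
};

static void
poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	atomic_flag_clear(&ps->lock);
	ps->n_poll_lcores = 0;
}

/* Returns true only on the first call for a given lcore_id. */
static bool
poll_set_mark(struct poll_set *ps, unsigned int lcore_id)
{
	/* atomic_flag_test_and_set() returns the previous value, so a
	 * true result means some earlier call already registered us.
	 */
	if (atomic_flag_test_and_set(&ps->in_use[lcore_id]))
		return false;

	/* Slow path, taken once per lcore: append under the lock. */
	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		;
	ps->poll_lcores[ps->n_poll_lcores++] = lcore_id;
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);

	return true;
}
```

The sense of the test is inverted relative to the patch:
rte_atomic16_test_and_set() returns non-zero when it performs the set,
whereas atomic_flag_test_and_set() returns the value seen before the
set. In both cases the lock is taken at most once per lcore.
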
 lib/librte_eventdev/rte_event_timer_adapter.c | 703 +++++++++++---------------
 1 file changed, 291 insertions(+), 412 deletions(-)
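
For reference, the timeout validation and tick-to-cycle conversion used
on the arm path (check_timeout() and get_timeout_cycles() in the diff)
reduce to the arithmetic sketched here. This is a hedged standalone
illustration, not the adapter code: timer_hz is passed as a parameter
where the patch calls rte_get_timer_hz(), and the names tmo_check,
ticks_to_cycles and NSEC_PER_SEC are illustrative only.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL /* stands in for NSECPERSEC */

/* Mirrors check_timeout(): -1 if too late, -2 if too early, 0 if OK. */
static int
tmo_check(uint64_t timeout_ticks, uint64_t tick_ns, uint64_t max_tmo_ns)
{
	uint64_t tmo_ns = timeout_ticks * tick_ns;

	if (tmo_ns > max_tmo_ns)
		return -1;
	if (tmo_ns < tick_ns)
		return -2;
	return 0;
}

/* Mirrors get_timeout_cycles(): adapter ticks -> timer cycles. */
static uint64_t
ticks_to_cycles(uint64_t timeout_ticks, uint64_t tick_ns, uint64_t timer_hz)
{
	uint64_t timeout_ns = timeout_ticks * tick_ns;

	/* Multiplying before dividing keeps precision when timer_hz is
	 * not a multiple of 1e9, but the 64-bit product can wrap for
	 * long timeouts at high timer frequencies.
	 */
	return timeout_ns * timer_hz / NSEC_PER_SEC;
}
```

With a 1 ms tick and a 2.5 GHz timer, a timeout of five ticks maps to
12,500,000 cycles; a timeout of zero ticks is rejected as too early.
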

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 575da04..841c904 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -19,6 +19,7 @@
 #include <rte_timer.h>
 #include <rte_service_component.h>
 #include <rte_cycles.h>
+#include <rte_debug.h>
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_pmd.h"
@@ -34,7 +35,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +212,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +341,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -491,12 +492,17 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
+		/* Buffer empty */
+		RTE_ASSERT(bufp->head - bufp->tail == 0);
 		*nb_events_flushed = 0;
 		return;
 	}
 
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
 	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
@@ -504,137 +510,129 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		(*nb_events_inv)++;
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
+	struct rte_timer *expired_timers[EVENT_BUFFER_SZ];
+	int expired_timers_idx;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+				    rte_lcore_id(), NULL, evtim);
+
+		sw->stats.evtim_retry_count++;
 
-		sw_data->stats.evtim_retry_count++;
 		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
 			      "immediate expiry value");
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		sw->expired_timers[sw->expired_timers_idx++] = tim;
+		RTE_ASSERT(sw->expired_timers_idx <= EVENT_BUFFER_SZ);
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		sw->stats.evtim_exp_count++;
+
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
 
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
-
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +644,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +658,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +691,39 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
-
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
-
-		event_buffer_flush(&sw_data->buffer,
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->expired_timers_idx);
+		sw->expired_timers_idx = 0;
+
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -834,168 +757,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = ret;
+			goto free_mempool;
+		}
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = ret;
+		goto free_mempool;
+	}
+
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = arg;
 
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
-
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +916,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +998,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1105,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1115,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4


* [dpdk-dev] [PATCH v6 0/1] New software event timer adapter
  2019-04-22 14:57       ` [dpdk-dev] [PATCH v5 " Erik Gabriel Carrillo
  2019-04-22 14:57         ` Erik Gabriel Carrillo
  2019-04-22 14:57         ` [dpdk-dev] [PATCH v5 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-04-26 15:14         ` Erik Gabriel Carrillo
  2019-04-26 15:14           ` Erik Gabriel Carrillo
                             ` (2 more replies)
  2 siblings, 3 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-26 15:14 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired.
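
As a rough illustration of how the service core might learn which
per-lcore lists to scan, here is a minimal sketch in plain C11. It is
not the code in the patch: it only models the idea suggested by the
in_use[] and poll_lcores[] fields of struct swtim, where an lcore adds
itself to the poll set the first time it arms a timer, so the lock is
taken once per lcore rather than once per arm. The names poll_set,
poll_set_init, poll_set_note_arm and MAX_LCORE are hypothetical, and
C11 atomics stand in for the DPDK atomic and spinlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_LCORE 8 /* hypothetical; stands in for RTE_MAX_LCORE */

/* Hypothetical stand-in for the adapter's list-tracking state. */
struct poll_set {
	atomic_flag in_use[MAX_LCORE];  /* has this lcore armed a timer yet? */
	unsigned int lcores[MAX_LCORE]; /* lcores whose lists must be scanned */
	int n_lcores;                   /* number of valid entries in lcores[] */
	atomic_flag lock;               /* simple spinlock guarding the two above */
};

static void poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	ps->n_lcores = 0;
	atomic_flag_clear(&ps->lock);
}

/* Called by a producer lcore when it arms a timer. Only the first call
 * for a given lcore takes the lock. Returns true if the lcore was
 * registered by this call, false if it was already in the poll set.
 */
static bool poll_set_note_arm(struct poll_set *ps, unsigned int lcore)
{
	assert(lcore < MAX_LCORE);

	/* test_and_set returns the previous value: false means first use. */
	if (atomic_flag_test_and_set(&ps->in_use[lcore]))
		return false;

	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	ps->lcores[ps->n_lcores++] = lcore;
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);

	return true;
}
```

In this model the scanning side would take the same lock around its
walk over lcores[0..n_lcores-1], which matches the comment in the patch
that the lock is seldom acquired on the arm side.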

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v6:
 - Fix implicit type conversion bug that caused a full event buffer to
   sometimes go undetected, resulting in lost events
 - Check return value of rte_timer_alt_reset() when resetting timer in
   event buffer full condition
 - Add timer list corresponding to service core to set of lists to scan
   when timers are reset by service core in event buffer full condition

Changes in v5:
 - Rebase patch to apply with latest timer library
 - Fix event buffering bug where full buffer was treated as empty
 - Return rte_timer objects to the mempool after the service function
   has returned from the rte_timer_alt_manage() call instead of in the
   callback

Changes in v4:
 - Addressed the following comments from Mattias Ronnblom:
   - remove unnecessary header include
   - add missing read barrier in timer cancel function

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series
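
The two event buffer fixes listed above (v5 and v6) can be illustrated
with a small standalone model. This is a sketch under assumptions, not
the patch itself: it assumes the buffer keeps free-running head and tail
counters, derives the slot index by masking, and tests for "full" by
comparing head - tail against the capacity. BUF_SZ, full_u16, full_size
and contiguous_run are hypothetical names, and 4096 is an assumed
capacity. With 16-bit counters both operands are promoted to int, so the
difference goes negative once head has wrapped and tail has not; with
size_t counters the subtraction wraps as unsigned and stays correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SZ   4096           /* assumed power-of-two capacity */
#define BUF_MASK (BUF_SZ - 1)

/* Narrow counters: head and tail are promoted to int before the
 * subtraction, so the result can be negative after head wraps.
 */
static bool full_u16(uint16_t head, uint16_t tail)
{
	return (head - tail) == BUF_SZ;
}

/* Wide counters: the subtraction is done in unsigned size_t arithmetic
 * and yields the true element count even across a wrap.
 */
static bool full_size(size_t head, size_t tail)
{
	return (head - tail) == BUF_SZ;
}

/* Length of the contiguous run of buffered events starting at the tail
 * slot. Equal indices are ambiguous (empty or full), so the full case
 * is checked explicitly instead of being treated as empty.
 */
static size_t contiguous_run(size_t head, size_t tail)
{
	size_t head_idx = head & BUF_MASK;
	size_t tail_idx = tail & BUF_MASK;

	if (head_idx > tail_idx)
		return head_idx - tail_idx;
	if (head_idx < tail_idx || full_size(head, tail))
		return BUF_SZ - tail_idx; /* run to the end of the array */
	return 0;                         /* equal indices, not full: empty */
}
```

For example, with tail at 63000 and 4096 events buffered, a 16-bit head
holds 1560, and 1560 - 63000 is not 4096, so the full buffer is missed.
The patch additionally clamps the run to a batch size before enqueuing,
which this model leaves out.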

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 733 +++++++++++---------------
 1 file changed, 312 insertions(+), 421 deletions(-)

-- 
2.6.4

* [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer adapter
  2019-04-26 15:14         ` [dpdk-dev] [PATCH v6 0/1] New " Erik Gabriel Carrillo
  2019-04-26 15:14           ` Erik Gabriel Carrillo
@ 2019-04-26 15:14           ` Erik Gabriel Carrillo
  2019-04-26 15:14             ` Erik Gabriel Carrillo
  2019-04-26 18:51             ` Honnappa Nagarahalli
  2019-06-19 15:14           ` [dpdk-dev] [PATCH v7 0/1] New " Erik Gabriel Carrillo
  2 siblings, 2 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-26 15:14 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core directly
accesses the lists as well, when looking for timers that have expired.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 733 +++++++++++---------------
 1 file changed, 312 insertions(+), 421 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 2f7a760..e9104f8 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -34,7 +34,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +211,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +340,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -428,8 +428,8 @@ rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
 #define EVENT_BUFFER_MASK (EVENT_BUFFER_SZ - 1)
 
 struct event_buffer {
-	uint16_t head;
-	uint16_t tail;
+	size_t head;
+	size_t tail;
 	struct rte_event events[EVENT_BUFFER_SZ];
 } __rte_cache_aligned;
 
@@ -455,7 +455,7 @@ event_buffer_init(struct event_buffer *bufp)
 static int
 event_buffer_add(struct event_buffer *bufp, struct rte_event *eventp)
 {
-	uint16_t head_idx;
+	size_t head_idx;
 	struct rte_event *buf_eventp;
 
 	if (event_buffer_full(bufp))
@@ -477,13 +477,16 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		   uint16_t *nb_events_flushed,
 		   uint16_t *nb_events_inv)
 {
-	uint16_t head_idx, tail_idx, n = 0;
 	struct rte_event *events = bufp->events;
+	size_t head_idx, tail_idx;
+	uint16_t n = 0;
 
 	/* Instead of modulus, bitwise AND with mask to get index. */
 	head_idx = bufp->head & EVENT_BUFFER_MASK;
 	tail_idx = bufp->tail & EVENT_BUFFER_MASK;
 
+	RTE_ASSERT(head_idx < EVENT_BUFFER_SZ && tail_idx < EVENT_BUFFER_SZ);
+
 	/* Determine the largest contigous run we can attempt to enqueue to the
 	 * event device.
 	 */
@@ -491,150 +494,155 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
 		*nb_events_flushed = 0;
 		return;
 	}
 
+	n = RTE_MIN(EVENT_BUFFER_BATCHSZ, n);
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
-	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
-		EVTIM_LOG_ERR("failed to enqueue invalid event - dropping it");
-		(*nb_events_inv)++;
+	if (*nb_events_flushed != n) {
+		if (rte_errno == -EINVAL) {
+			EVTIM_LOG_ERR("failed to enqueue invalid event - "
+				      "dropping it");
+			(*nb_events_inv)++;
+		} else if (rte_errno == -ENOSPC)
+			rte_pause();
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
+
+	struct rte_timer *expired_timers[EVENT_BUFFER_SZ];
+	size_t expired_timers_idx;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	unsigned int lcore = rte_lcore_id();
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+					  lcore, NULL, evtim);
+		if (ret < 0) {
+			EVTIM_LOG_DBG("event buffer full, failed to reset "
+				      "timer with immediate expiry value");
+		} else {
+			sw->stats.evtim_retry_count++;
+			EVTIM_LOG_DBG("event buffer full, resetting rte_timer "
+				      "with immediate expiry value");
+		}
 
-		sw_data->stats.evtim_retry_count++;
-		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
-			      "immediate expiry value");
+		if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore].v)))
+			sw->poll_lcores[sw->n_poll_lcores++] = lcore;
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
-
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		sw->expired_timers[sw->expired_timers_idx++] = tim;
+		sw->stats.evtim_exp_count++;
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
 
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
-
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +654,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +668,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +701,41 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
 
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->expired_timers_idx);
+		sw->expired_timers_idx = 0;
+
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -820,7 +755,7 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	int size;
 	int cache_size = 0;
 
-	for (i = 0; ; i++) {
+	for (i = 0;; i++) {
 		size = 1 << i;
 
 		if (RTE_MAX_LCORE * size < (int)(nb_actual - nb_requested) &&
@@ -834,168 +769,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
-
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = ret;
+		goto free_mempool;
+	}
+
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
-
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
+	struct swtim *sw = arg;
 
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +928,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +1010,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1117,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1127,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4

* [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer adapter
  2019-04-26 15:14           ` [dpdk-dev] [PATCH v6 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-04-26 15:14             ` Erik Gabriel Carrillo
  2019-04-26 18:51             ` Honnappa Nagarahalli
  1 sibling, 0 replies; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-04-26 15:14 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores in both primary and secondary processes insert timers
directly into timer skiplist data structures; the service core also
accesses the lists directly when looking for expired timers.
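
To illustrate the bookkeeping this design relies on, here is a minimal,
DPDK-free sketch in plain C11. It is not the adapter's code: MAX_LCORE,
struct poll_set, poll_set_mark() and poll_set_snapshot() are hypothetical
stand-ins, and a C11 atomic_flag replaces the DPDK atomics and spinlock.
It shows one way a producer can flag its per-lcore timer list the first
time it arms a timer, so that the service only scans lists that may hold
timers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LCORE 8 /* hypothetical; stands in for RTE_MAX_LCORE */

/* Hypothetical stand-in for the adapter's per-instance tracking state. */
struct poll_set {
	atomic_flag in_use[MAX_LCORE];       /* set once an lcore has armed */
	atomic_flag lock;                    /* guards the two fields below */
	unsigned int poll_lcores[MAX_LCORE]; /* lcores whose lists to scan */
	size_t n_poll_lcores;
};

static void poll_set_init(struct poll_set *ps)
{
	for (size_t i = 0; i < MAX_LCORE; i++)
		atomic_flag_clear(&ps->in_use[i]);
	atomic_flag_clear(&ps->lock);
	ps->n_poll_lcores = 0;
}

static void poll_set_lock(struct poll_set *ps)
{
	while (atomic_flag_test_and_set_explicit(&ps->lock,
						 memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void poll_set_unlock(struct poll_set *ps)
{
	atomic_flag_clear_explicit(&ps->lock, memory_order_release);
}

/* Arm side: returns true only for the first call made for lcore_id. */
static bool poll_set_mark(struct poll_set *ps, unsigned int lcore_id)
{
	assert(lcore_id < MAX_LCORE);

	/* atomic_flag_test_and_set() returns the previous value, so a
	 * true result means some earlier call already listed this lcore.
	 */
	if (atomic_flag_test_and_set(&ps->in_use[lcore_id]))
		return false;

	poll_set_lock(ps);
	ps->poll_lcores[ps->n_poll_lcores++] = lcore_id;
	poll_set_unlock(ps);
	return true;
}

/* Service side: copy out the lcores whose timer lists should be scanned. */
static size_t poll_set_snapshot(struct poll_set *ps, unsigned int *out)
{
	size_t n;

	poll_set_lock(ps);
	n = ps->n_poll_lcores;
	for (size_t i = 0; i < n; i++)
		out[i] = ps->poll_lcores[i];
	poll_set_unlock(ps);
	return n;
}
```

In this sketch the lock is taken at most once per lcore on the arm side,
since later calls return early on the in_use flag; the service side takes
it once per scan.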

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 733 +++++++++++---------------
 1 file changed, 312 insertions(+), 421 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 2f7a760..e9104f8 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -34,7 +34,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +211,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +340,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -428,8 +428,8 @@ rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
 #define EVENT_BUFFER_MASK (EVENT_BUFFER_SZ - 1)
 
 struct event_buffer {
-	uint16_t head;
-	uint16_t tail;
+	size_t head;
+	size_t tail;
 	struct rte_event events[EVENT_BUFFER_SZ];
 } __rte_cache_aligned;
 
@@ -455,7 +455,7 @@ event_buffer_init(struct event_buffer *bufp)
 static int
 event_buffer_add(struct event_buffer *bufp, struct rte_event *eventp)
 {
-	uint16_t head_idx;
+	size_t head_idx;
 	struct rte_event *buf_eventp;
 
 	if (event_buffer_full(bufp))
@@ -477,13 +477,16 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		   uint16_t *nb_events_flushed,
 		   uint16_t *nb_events_inv)
 {
-	uint16_t head_idx, tail_idx, n = 0;
 	struct rte_event *events = bufp->events;
+	size_t head_idx, tail_idx;
+	uint16_t n = 0;
 
 	/* Instead of modulus, bitwise AND with mask to get index. */
 	head_idx = bufp->head & EVENT_BUFFER_MASK;
 	tail_idx = bufp->tail & EVENT_BUFFER_MASK;
 
+	RTE_ASSERT(head_idx < EVENT_BUFFER_SZ && tail_idx < EVENT_BUFFER_SZ);
+
 	/* Determine the largest contigous run we can attempt to enqueue to the
 	 * event device.
 	 */
@@ -491,150 +494,155 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
 		*nb_events_flushed = 0;
 		return;
 	}
 
+	n = RTE_MIN(EVENT_BUFFER_BATCHSZ, n);
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
-	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
-		EVTIM_LOG_ERR("failed to enqueue invalid event - dropping it");
-		(*nb_events_inv)++;
+	if (*nb_events_flushed != n) {
+		if (rte_errno == -EINVAL) {
+			EVTIM_LOG_ERR("failed to enqueue invalid event - "
+				      "dropping it");
+			(*nb_events_inv)++;
+		} else if (rte_errno == -ENOSPC)
+			rte_pause();
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
+
+	struct rte_timer *expired_timers[EVENT_BUFFER_SZ];
+	size_t expired_timers_idx;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	unsigned int lcore = rte_lcore_id();
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+					  lcore, NULL, evtim);
+		if (ret < 0) {
+			EVTIM_LOG_DBG("event buffer full, failed to reset "
+				      "timer with immediate expiry value");
+		} else {
+			sw->stats.evtim_retry_count++;
+			EVTIM_LOG_DBG("event buffer full, resetting rte_timer "
+				      "with immediate expiry value");
+		}
 
-		sw_data->stats.evtim_retry_count++;
-		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
-			      "immediate expiry value");
+		if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore].v)))
+			sw->poll_lcores[sw->n_poll_lcores++] = lcore;
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
-
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
-		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		sw->expired_timers[sw->expired_timers_idx++] = tim;
+		sw->stats.evtim_exp_count++;
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
 
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
-
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +654,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +668,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +701,41 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
 
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->expired_timers_idx);
+		sw->expired_timers_idx = 0;
+
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -820,7 +755,7 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	int size;
 	int cache_size = 0;
 
-	for (i = 0; ; i++) {
+	for (i = 0;; i++) {
 		size = 1 << i;
 
 		if (RTE_MAX_LCORE * size < (int)(nb_actual - nb_requested) &&
@@ -834,168 +769,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
-
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = -ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = -ret;
+		goto free_mempool;
+	}
+
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
-
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
+	struct swtim *sw = arg;
 
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +928,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +1010,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1117,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1127,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer adapter
  2019-04-26 15:14           ` [dpdk-dev] [PATCH v6 1/1] eventdev: add new " Erik Gabriel Carrillo
  2019-04-26 15:14             ` Erik Gabriel Carrillo
@ 2019-04-26 18:51             ` Honnappa Nagarahalli
  2019-04-26 18:51               ` Honnappa Nagarahalli
  2019-04-26 18:58               ` Carrillo, Erik G
  1 sibling, 2 replies; 77+ messages in thread
From: Honnappa Nagarahalli @ 2019-04-26 18:51 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, jerin.jacob
  Cc: mattias.ronnblom, pbhagavatula, dev, Honnappa Nagarahalli, nd, nd

Hi Erik,
	A quick question.

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Erik Gabriel Carrillo
> Sent: Friday, April 26, 2019 10:14 AM
> To: jerin.jacob@caviumnetworks.com
> Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer
> adapter
> 
> This patch introduces a new version of the event timer adapter software PMD.
> In the original design, timer event producer lcores in the primary and
> secondary processes enqueued event timers into a ring, and a service core in
> the primary process dequeued them and processed them further.  To improve
> performance, this version does away with the ring and lets lcores in both
> primary and secondary processes insert timers directly into timer skiplist data
> structures; the service core directly accesses the lists as well, when looking for
> timers that have expired.
How do you ensure safe concurrent access to the timer skiplist? Are you using any locks, or is it a lock-free data structure?

<snip>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer adapter
  2019-04-26 18:51             ` Honnappa Nagarahalli
  2019-04-26 18:51               ` Honnappa Nagarahalli
@ 2019-04-26 18:58               ` Carrillo, Erik G
  2019-04-26 18:58                 ` Carrillo, Erik G
  2019-06-05 13:34                 ` Jerin Jacob Kollanukkaran
  1 sibling, 2 replies; 77+ messages in thread
From: Carrillo, Erik G @ 2019-04-26 18:58 UTC (permalink / raw)
  To: Honnappa Nagarahalli, jerin.jacob
  Cc: mattias.ronnblom, pbhagavatula, dev, nd, nd



> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, April 26, 2019 1:51 PM
> To: Carrillo, Erik G <erik.g.carrillo@intel.com>;
> jerin.jacob@caviumnetworks.com
> Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event
> timer adapter
> 
> Hi Erik,
> 	A quick question.
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Erik Gabriel Carrillo
> > Sent: Friday, April 26, 2019 10:14 AM
> > To: jerin.jacob@caviumnetworks.com
> > Cc: mattias.ronnblom@ericsson.com;
> pbhagavatula@caviumnetworks.com;
> > dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event
> > timer adapter
> >
> > This patch introduces a new version of the event timer adapter software
> PMD.
> > In the original design, timer event producer lcores in the primary and
> > secondary processes enqueued event timers into a ring, and a service
> > core in the primary process dequeued them and processed them further.
> > To improve performance, this version does away with the ring and lets
> > lcores in both primary and secondary processes insert timers directly
> > into timer skiplist data structures; the service core directly
> > accesses the lists as well, when looking for timers that have expired.
> How do you ensure concurrent access to the timer skiplist? Are you using any
> locks or is it a lock-free data structure?
> 
> <snip>

There are multiple timer skiplists, one for each lcore, and each has its own lock, which is acquired as necessary when adding timers to or removing timers from that skiplist.  This locking occurs in the underlying timer library, in the timer reset and stop functions.
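
The per-lcore arrangement described above can be sketched roughly as follows. This is only an illustration, not the timer library's code: it uses a plain sorted linked list instead of a skiplist, a C11 atomic_flag spinlock instead of rte_spinlock, and every demo_* name and DEMO_MAX_LCORE are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LCORE 4 /* hypothetical; stands in for RTE_MAX_LCORE */

struct demo_timer {
	uint64_t expire;         /* absolute expiry time, in ticks */
	struct demo_timer *next;
};

struct demo_timer_list {
	atomic_flag lock;        /* one lock per list, so lists do not contend */
	struct demo_timer *head; /* kept sorted by ascending expiry */
};

static struct demo_timer_list demo_lists[DEMO_MAX_LCORE];

static void demo_lists_init(void)
{
	for (int i = 0; i < DEMO_MAX_LCORE; i++) {
		atomic_flag_clear(&demo_lists[i].lock);
		demo_lists[i].head = NULL;
	}
}

static void demo_lock(struct demo_timer_list *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock,
						 memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void demo_unlock(struct demo_timer_list *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Insert a timer into the list owned by the given lcore. */
static void demo_timer_add(unsigned int lcore, struct demo_timer *t)
{
	struct demo_timer_list *l = &demo_lists[lcore];
	struct demo_timer **pp;

	demo_lock(l);
	for (pp = &l->head; *pp != NULL && (*pp)->expire <= t->expire;
	     pp = &(*pp)->next)
		;
	t->next = *pp;
	*pp = t;
	demo_unlock(l);
}

/* Remove a timer; returns 0 on success, -1 if it was not in the list. */
static int demo_timer_del(unsigned int lcore, struct demo_timer *t)
{
	struct demo_timer_list *l = &demo_lists[lcore];
	struct demo_timer **pp;
	int found = 0;

	demo_lock(l);
	for (pp = &l->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			found = 1;
			break;
		}
	}
	demo_unlock(l);
	return found ? 0 : -1;
}
```

Because each list has its own lock, a producer adding to one lcore's list never waits on a thread that is working on a different lcore's list; contention only arises when two threads touch the same list.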

Regards,
Erik

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer adapter
  2019-04-26 18:58               ` Carrillo, Erik G
  2019-04-26 18:58                 ` Carrillo, Erik G
@ 2019-06-05 13:34                 ` Jerin Jacob Kollanukkaran
  1 sibling, 0 replies; 77+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-05 13:34 UTC (permalink / raw)
  To: Carrillo, Erik G, Honnappa Nagarahalli, Jerin Jacob Kollanukkaran
  Cc: mattias.ronnblom, pbhagavatula, dev, nd, nd

> -----Original Message-----
> From: Carrillo, Erik G <erik.g.carrillo@intel.com>
> Sent: Saturday, April 27, 2019 12:29 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> jerin.jacob@caviumnetworks.com
> Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> dev@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event timer
> adapter
> 
> 
> 
> > -----Original Message-----
> > From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > Sent: Friday, April 26, 2019 1:51 PM
> > To: Carrillo, Erik G <erik.g.carrillo@intel.com>;
> > jerin.jacob@caviumnetworks.com
> > Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> > dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd
> > <nd@arm.com>; nd <nd@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software
> > event timer adapter
> >
> > Hi Erik,
> > 	A quick question.
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Erik Gabriel Carrillo
> > > Sent: Friday, April 26, 2019 10:14 AM
> > > To: jerin.jacob@caviumnetworks.com
> > > Cc: mattias.ronnblom@ericsson.com;
> > pbhagavatula@caviumnetworks.com;
> > > dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH v6 1/1] eventdev: add new software event
> > > timer adapter
> > >
> > > This patch introduces a new version of the event timer adapter
> > > software
> > PMD.
> > > In the original design, timer event producer lcores in the primary
> > > and secondary processes enqueued event timers into a ring, and a
> > > service core in the primary process dequeued them and processed them
> further.
> > > To improve performance, this version does away with the ring and

In general, the idea and the patch look good to me.
Could you update the git commit log with:
# The percentage of performance improvement seen with this method?
# The means (test command, etc.) of measuring the performance improvement?



> > > lets lcores in both primary and secondary processes insert timers
> > > directly into timer skiplist data structures; the service core
> > > directly accesses the lists as well, when looking for timers that have expired.
> > How do you ensure concurrent access to the timer skiplist? Are you
> > using any locks or is it a lock-free data structure?
> >
> > <snip>
> 
> There are multiple timer skiplists,  one for each lcore, and each has its lock that
> is acquired as necessary when adding or removing timers from the skiplists.  This
> locking occurs in the underlying timer library, in the timer reset and stop
> functions.
> 
> Regards,
> Erik

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v7 0/1] New software event timer adapter
  2019-04-26 15:14         ` [dpdk-dev] [PATCH v6 0/1] New " Erik Gabriel Carrillo
  2019-04-26 15:14           ` Erik Gabriel Carrillo
  2019-04-26 15:14           ` [dpdk-dev] [PATCH v6 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-06-19 15:14           ` Erik Gabriel Carrillo
  2019-06-19 15:14             ` [dpdk-dev] [PATCH v7 1/1] eventdev: add new " Erik Gabriel Carrillo
  2 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-06-19 15:14 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired.

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v7:
 - Remove unnecessary lock protecting the array of lcore ids whose timer
   lists should be processed in the service function, and the count of
   elements in that array.
 - When adding rte_timers to a buffer to be freed later, first check if
   the buffer is full and empty it before adding a new element.
 - Update commit log with command used to compare performance, and
   amount of improvement seen. (Jerin)
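
The "check if full, empty, then add" ordering from the second item above can be sketched as below. This is an illustrative stand-in, not the adapter's code: the demo_* names are hypothetical, the buffer size is shrunk to 4 (the patch defines EXP_TIM_BUFFER_SZ as 128), and the flush only counts objects where the adapter would presumably return them to its mempool.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EXP_TIM_BUF_SZ 4 /* hypothetical size for the sketch */

struct demo_exp_buf {
	void *objs[DEMO_EXP_TIM_BUF_SZ];
	size_t n;       /* objects currently buffered */
	size_t n_freed; /* objects released so far */
};

/* Release everything that is buffered (stand-in for a bulk mempool put). */
static void demo_exp_buf_flush(struct demo_exp_buf *b)
{
	b->n_freed += b->n;
	b->n = 0;
}

/* Check for a full buffer first, so the store below never overruns it. */
static void demo_exp_buf_add(struct demo_exp_buf *b, void *obj)
{
	if (b->n == DEMO_EXP_TIM_BUF_SZ)
		demo_exp_buf_flush(b);
	b->objs[b->n++] = obj;
}
```

Checking before the store, rather than after it, means the new element always has a free slot and nothing is dropped when the buffer happens to be full on entry.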

Changes in v6:
 - Fix implicit type conversion bug that caused a full event buffer to
   sometimes not be detected correctly, resulting in lost events
 - Check return value of rte_timer_alt_reset() when resetting a timer in
   the event buffer full condition
 - Add timer list corresponding to service core to set of lists to scan
   when timers are reset by service core in event buffer full condition
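
One plausible way such an implicit conversion can hide a full buffer is sketched below. It assumes the full check has the form (head - tail) == size and that int is wider than 16 bits; the demo_* names are hypothetical and this is not the adapter's code. With uint16_t counters, both operands are promoted to int before the subtraction, so once head has wrapped past 65535 the difference is negative and never equals the buffer size. With size_t counters the subtraction is done in unsigned arithmetic and wraps correctly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUFFER_SZ 4096

/* 16-bit counters: head and tail are promoted to int before subtracting. */
static int demo_full_u16(uint16_t head, uint16_t tail)
{
	return (head - tail) == DEMO_BUFFER_SZ;
}

/* size_t counters: the subtraction is unsigned and wraps modulo 2^N. */
static int demo_full_size(size_t head, size_t tail)
{
	return (head - tail) == DEMO_BUFFER_SZ;
}
```

For instance, with tail at 65000 and 4096 events buffered, a 16-bit head has wrapped to 3560; the promoted difference is 3560 - 65000, which is negative, so the buffer is reported as not full.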

Changes in v5:
 - Rebase patch to apply with latest timer library
 - Fix event buffering bug where full buffer was treated as empty
 - Return rte_timer objects back to mempool after service function has
   returned from timer_manage() call instead of in callback

Changes in v4:
 - Addressed the following comments from Mattias Ronnblom:
   - remove unnecessary header include
   - add missing read barrier in timer cancel function

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 741 +++++++++++---------------
 1 file changed, 322 insertions(+), 419 deletions(-)

-- 
2.6.4


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [dpdk-dev] [PATCH v7 1/1] eventdev: add new software event timer adapter
  2019-06-19 15:14           ` [dpdk-dev] [PATCH v7 0/1] New " Erik Gabriel Carrillo
@ 2019-06-19 15:14             ` Erik Gabriel Carrillo
  2019-06-19 16:25               ` [dpdk-dev] [PATCH v8 0/1] New " Erik Gabriel Carrillo
  0 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-06-19 15:14 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores insert timers directly into timer skiplist data
structures; the service core directly accesses the lists as well, when
looking for timers that have expired.

To compare the burst and non-burst performance of the original and new
versions of the software event timer adapter, I ran the following
commands:

$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev --worker_deq_depth=32

$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev_burst --worker_deq_depth=32

With the new version, I see a 151% improvement in throughput for the
non-burst case, and a 270% improvement in throughput for the burst case.
I also see a 53% improvement in arm latency in the non-burst case and a
65% improvement in arm latency in the burst case.

Note: To perform the test, I commented out a check in the original
version that compares the adapter tick interval against a minimum value.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 741 +++++++++++---------------
 1 file changed, 322 insertions(+), 419 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 2f7a760..d525cb3 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -34,7 +34,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +211,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +340,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -426,10 +426,11 @@ rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
 #define EVENT_BUFFER_SZ 4096
 #define EVENT_BUFFER_BATCHSZ 32
 #define EVENT_BUFFER_MASK (EVENT_BUFFER_SZ - 1)
+#define EXP_TIM_BUFFER_SZ 128
 
 struct event_buffer {
-	uint16_t head;
-	uint16_t tail;
+	size_t head;
+	size_t tail;
 	struct rte_event events[EVENT_BUFFER_SZ];
 } __rte_cache_aligned;
 
@@ -455,7 +456,7 @@ event_buffer_init(struct event_buffer *bufp)
 static int
 event_buffer_add(struct event_buffer *bufp, struct rte_event *eventp)
 {
-	uint16_t head_idx;
+	size_t head_idx;
 	struct rte_event *buf_eventp;
 
 	if (event_buffer_full(bufp))
@@ -477,13 +478,16 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		   uint16_t *nb_events_flushed,
 		   uint16_t *nb_events_inv)
 {
-	uint16_t head_idx, tail_idx, n = 0;
 	struct rte_event *events = bufp->events;
+	size_t head_idx, tail_idx;
+	uint16_t n = 0;
 
 	/* Instead of modulus, bitwise AND with mask to get index. */
 	head_idx = bufp->head & EVENT_BUFFER_MASK;
 	tail_idx = bufp->tail & EVENT_BUFFER_MASK;
 
+	RTE_ASSERT(head_idx < EVENT_BUFFER_SZ && tail_idx < EVENT_BUFFER_SZ);
+
 	/* Determine the largest contigous run we can attempt to enqueue to the
 	 * event device.
 	 */
@@ -491,150 +495,166 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
 		*nb_events_flushed = 0;
 		return;
 	}
 
+	n = RTE_MIN(EVENT_BUFFER_BATCHSZ, n);
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
-	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
-		EVTIM_LOG_ERR("failed to enqueue invalid event - dropping it");
-		(*nb_events_inv)++;
+	if (*nb_events_flushed != n) {
+		if (rte_errno == -EINVAL) {
+			EVTIM_LOG_ERR("failed to enqueue invalid event - "
+				      "dropping it");
+			(*nb_events_inv)++;
+		} else if (rte_errno == -ENOSPC)
+			rte_pause();
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Lock to atomically access the above two variables */
+	rte_spinlock_t poll_lcores_sl;
+
+	struct rte_timer *expired_timers[EXP_TIM_BUFFER_SZ];
+	size_t expired_timers_idx;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	unsigned int lcore = rte_lcore_id();
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+					  lcore, NULL, evtim);
+		if (ret < 0) {
+			EVTIM_LOG_DBG("event buffer full, failed to reset "
+				      "timer with immediate expiry value");
+		} else {
+			sw->stats.evtim_retry_count++;
+			EVTIM_LOG_DBG("event buffer full, resetting rte_timer "
+				      "with immediate expiry value");
+		}
 
-		sw_data->stats.evtim_retry_count++;
-		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
-			      "immediate expiry value");
+		if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore].v)))
+			sw->poll_lcores[sw->n_poll_lcores++] = lcore;
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
+		/* Empty the buffer here, if necessary, to free older expired
+		 * timers only
 		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		if (unlikely(sw->expired_timers_idx == EXP_TIM_BUFFER_SZ)) {
+			rte_mempool_put_bulk(sw->tim_pool,
+					     (void **)sw->expired_timers,
+					     sw->expired_timers_idx);
+			sw->expired_timers_idx = 0;
+		}
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		sw->expired_timers[sw->expired_timers_idx++] = tim;
+		sw->stats.evtim_exp_count++;
+
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
 
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
-
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +666,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +680,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +713,41 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
 
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
+	if (swtim_did_tick(sw)) {
+		/* This lock is seldom acquired on the arm side */
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
 
-		event_buffer_flush(&sw_data->buffer,
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->expired_timers_idx);
+		sw->expired_timers_idx = 0;
+
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -820,7 +767,7 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	int size;
 	int cache_size = 0;
 
-	for (i = 0; ; i++) {
+	for (i = 0;; i++) {
 		size = 1 << i;
 
 		if (RTE_MAX_LCORE * size < (int)(nb_actual - nb_requested) &&
@@ -834,168 +781,145 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
-
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize the variables that track in-use timer lists */
+	rte_spinlock_init(&sw->poll_lcores_sl);
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = -ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = -ret;
+		goto free_mempool;
+	}
+
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
-
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
+	struct swtim *sw = arg;
 
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +940,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +1022,104 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		rte_spinlock_lock(&sw->poll_lcores_sl);
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores++] = lcore_id;
+		rte_spinlock_unlock(&sw->poll_lcores_sl);
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1129,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1139,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void **)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4



* [dpdk-dev] [PATCH v8 0/1] New software event timer adapter
  2019-06-19 15:14             ` [dpdk-dev] [PATCH v7 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-06-19 16:25               ` Erik Gabriel Carrillo
  2019-06-19 16:25                 ` [dpdk-dev] [PATCH v8 1/1] eventdev: add new " Erik Gabriel Carrillo
  0 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-06-19 16:25 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

This patch introduces a new version of the event timer adapter software
PMD [1]. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a service
core in the primary process dequeued them and processed them further.  To
improve performance, this version does away with the ring and lets lcores in
both primary and secondary processes insert timers directly into timer
skiplist data structures; the service core directly accesses the lists as
well, when looking for timers that have expired.

[1] https://doc.dpdk.org/guides/prog_guide/event_timer_adapter.html

Changes in v8:
 - I generated the v7 patch from the wrong commit, so the first two bullets
   listed in the v7 changes below were not included.  This version includes
   them.

Changes in v7:
 - Remove unnecessary lock protecting array of lcore ids whose timer lists
   should be processed in service function, and the count of elements
   in that array.
 - When adding rte_timers to a buffer to be freed later, first check if
   the buffer is full and empty it before adding a new element.
 - Update commit log with command used to compare performance, and
   amount of improvement seen. (Jerin)
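
The second v7 item (drain a full buffer before adding to it) can be
illustrated with a minimal, self-contained sketch.  The names below
(exp_buf, exp_buf_add, exp_buf_drain, EXP_BUF_SZ) are hypothetical and are
not taken from the patch; the drain step only counts objects as a stand-in
for a bulk return to a mempool.

```c
#include <assert.h>
#include <stddef.h>

#define EXP_BUF_SZ 4 /* small, illustrative capacity (hypothetical) */

struct exp_buf {
	void *items[EXP_BUF_SZ];
	size_t n;       /* number of items currently buffered */
	size_t n_freed; /* total items handed back so far */
};

/* Stand-in for a bulk return to a pool; it only counts the objects. */
static void
exp_buf_drain(struct exp_buf *b)
{
	b->n_freed += b->n;
	b->n = 0;
}

/* Check for a full buffer and drain it before storing the new item, so
 * the write index never runs past the end of the array.
 */
static void
exp_buf_add(struct exp_buf *b, void *item)
{
	if (b->n == EXP_BUF_SZ)
		exp_buf_drain(b);
	assert(b->n < EXP_BUF_SZ);
	b->items[b->n++] = item;
}
```

Checking before the store, rather than after it, means the item that
triggers the drain is kept in the buffer instead of being released together
with the older ones.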

Changes in v6:
 - Fix implicit type conversion bug that caused a full event buffer to
   sometimes not be correctly detected, resulting in lost events
 - Check return value of alt_timer_reset when resetting timer in event
   buffer full condition
 - Add timer list corresponding to service core to set of lists to scan
   when timers are reset by service core in event buffer full condition
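
As an illustration of the first v6 item, the sketch below shows one way an
implicit conversion can hide a full buffer when free-running head/tail
counters are 16 bits wide.  It is an assumption that this matches the exact
failure fixed here; the helper names and the BUF_SZ value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SZ 4096 /* assumed power-of-two capacity */

/* 16-bit counters: both operands are promoted to int before the
 * subtraction, so the difference goes negative once head has wrapped
 * around and tail has not.
 */
static int
full_u16(uint16_t head, uint16_t tail)
{
	return (head - tail) == BUF_SZ;
}

/* size_t counters: the subtraction is unsigned and wraps modulo 2^N, so
 * the difference stays correct across wraparound.
 */
static int
full_size(size_t head, size_t tail)
{
	return (head - tail) == (size_t)BUF_SZ;
}
```

With tail at 61440 and 4096 entries outstanding, a 16-bit head has wrapped
to 0, so full_u16(0, 61440) compares -61440 against 4096 and reports "not
full", while full_size(65536, 61440) reports "full".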

Changes in v5:
 - Rebase patch to apply with latest timer library
 - Fix event buffering bug where full buffer was treated as empty
 - Return rte_timer objects back to mempool after service function has
   returned from timer_manage() call instead of in callback

Changes in v4:
 - Addressed the following comments from Mattias Ronnblom:
   - remove unnecessary header include
   - add missing read barrier in timer cancel function

Changes in v3:
 - Addressed comments from Mattias Ronnblom:
   - remove unnecessary header include
   - remove unnecessary cast in mempool_put() call
   - update alignment of elements of array to avoid false sharing issue

Changes in v2:
 - split this change out into its own patch series

Erik Gabriel Carrillo (1):
  eventdev: add new software event timer adapter

 lib/librte_eventdev/rte_event_timer_adapter.c | 734 +++++++++++---------------
 1 file changed, 315 insertions(+), 419 deletions(-)

-- 
2.6.4



* [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event timer adapter
  2019-06-19 16:25               ` [dpdk-dev] [PATCH v8 0/1] New " Erik Gabriel Carrillo
@ 2019-06-19 16:25                 ` Erik Gabriel Carrillo
  2019-06-24  6:12                   ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 77+ messages in thread
From: Erik Gabriel Carrillo @ 2019-06-19 16:25 UTC (permalink / raw)
  To: jerin.jacob; +Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

This patch introduces a new version of the event timer adapter software
PMD. In the original design, timer event producer lcores in the primary
and secondary processes enqueued event timers into a ring, and a
service core in the primary process dequeued them and processed them
further.  To improve performance, this version does away with the ring
and lets lcores insert timers directly into timer skiplist data
structures; the service core directly accesses the lists as well, when
looking for timers that have expired.
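
For the service core to know which per-lcore lists to scan, each arming
lcore can be recorded the first time it arms a timer.  The following is a
rough sketch of that bookkeeping using C11 atomics rather than the DPDK
atomic API; the names (poll_set, poll_set_mark, MAX_CORES) are hypothetical
and the code is not the implementation in this patch.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_CORES 8 /* illustrative only; not RTE_MAX_LCORE */

struct poll_set {
	atomic_flag in_use[MAX_CORES]; /* set once a core has armed a timer */
	unsigned int cores[MAX_CORES]; /* ids of cores whose lists are polled */
	atomic_int n_cores;            /* number of valid entries in cores[] */
};

static void
poll_set_init(struct poll_set *ps)
{
	for (int i = 0; i < MAX_CORES; i++)
		atomic_flag_clear(&ps->in_use[i]);
	atomic_init(&ps->n_cores, 0);
}

/* Record a core the first time it arms a timer.  Returns 1 if the core
 * was newly added, 0 if it was already listed.
 */
static int
poll_set_mark(struct poll_set *ps, unsigned int core)
{
	assert(core < MAX_CORES);
	/* atomic_flag_test_and_set() returns the previous value */
	if (atomic_flag_test_and_set(&ps->in_use[core]))
		return 0;
	int slot = atomic_fetch_add(&ps->n_cores, 1);
	ps->cores[slot] = core;
	return 1;
}
```

In this sketch a concurrent reader could observe the incremented count
before the slot is written, so a real implementation would need additional
ordering between the two steps.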

To compare the burst and non-burst performance of the original and new
versions of the software event timer adapter, I ran the following
commands:

$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev --worker_deq_depth=32

$ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
-- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
--prod_type_timerdev_burst --worker_deq_depth=32

With the new version, I see a 151% improvement in throughput for the
non-burst case, and a 270% improvement in throughput for the burst case.
I also see a 53% improvement in arm latency in the non-burst case and a
65% improvement in arm latency in the burst case.

Note: To perform the test, I commented out a check in the original
version that checks the adapter tick interval against a minimum value.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.c | 734 +++++++++++---------------
 1 file changed, 315 insertions(+), 419 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index 2f7a760..459bc47 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -34,7 +34,7 @@ static int evtim_buffer_logtype;
 
 static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops;
+static const struct rte_event_timer_adapter_ops swtim_ops;
 
 #define EVTIM_LOG(level, logtype, ...) \
 	rte_log(RTE_LOG_ ## level, logtype, \
@@ -211,7 +211,7 @@ rte_event_timer_adapter_create_ext(
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Allow driver to do some setup */
 	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
@@ -340,7 +340,7 @@ rte_event_timer_adapter_lookup(uint16_t adapter_id)
 	 * implementation.
 	 */
 	if (adapter->ops == NULL)
-		adapter->ops = &sw_event_adapter_timer_ops;
+		adapter->ops = &swtim_ops;
 
 	/* Set fast-path function pointers */
 	adapter->arm_burst = adapter->ops->arm_burst;
@@ -427,9 +427,11 @@ rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
 #define EVENT_BUFFER_BATCHSZ 32
 #define EVENT_BUFFER_MASK (EVENT_BUFFER_SZ - 1)
 
+#define EXP_TIM_BUF_SZ 128
+
 struct event_buffer {
-	uint16_t head;
-	uint16_t tail;
+	size_t head;
+	size_t tail;
 	struct rte_event events[EVENT_BUFFER_SZ];
 } __rte_cache_aligned;
 
@@ -455,7 +457,7 @@ event_buffer_init(struct event_buffer *bufp)
 static int
 event_buffer_add(struct event_buffer *bufp, struct rte_event *eventp)
 {
-	uint16_t head_idx;
+	size_t head_idx;
 	struct rte_event *buf_eventp;
 
 	if (event_buffer_full(bufp))
@@ -477,13 +479,16 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		   uint16_t *nb_events_flushed,
 		   uint16_t *nb_events_inv)
 {
-	uint16_t head_idx, tail_idx, n = 0;
 	struct rte_event *events = bufp->events;
+	size_t head_idx, tail_idx;
+	uint16_t n = 0;
 
 	/* Instead of modulus, bitwise AND with mask to get index. */
 	head_idx = bufp->head & EVENT_BUFFER_MASK;
 	tail_idx = bufp->tail & EVENT_BUFFER_MASK;
 
+	RTE_ASSERT(head_idx < EVENT_BUFFER_SZ && tail_idx < EVENT_BUFFER_SZ);
+
 	/* Determine the largest contigous run we can attempt to enqueue to the
 	 * event device.
 	 */
@@ -491,150 +496,165 @@ event_buffer_flush(struct event_buffer *bufp, uint8_t dev_id, uint8_t port_id,
 		n = head_idx - tail_idx;
 	else if (head_idx < tail_idx)
 		n = EVENT_BUFFER_SZ - tail_idx;
+	else if (event_buffer_full(bufp))
+		n = EVENT_BUFFER_SZ - tail_idx;
 	else {
 		*nb_events_flushed = 0;
 		return;
 	}
 
+	n = RTE_MIN(EVENT_BUFFER_BATCHSZ, n);
 	*nb_events_inv = 0;
+
 	*nb_events_flushed = rte_event_enqueue_burst(dev_id, port_id,
 						     &events[tail_idx], n);
-	if (*nb_events_flushed != n && rte_errno == -EINVAL) {
-		EVTIM_LOG_ERR("failed to enqueue invalid event - dropping it");
-		(*nb_events_inv)++;
+	if (*nb_events_flushed != n) {
+		if (rte_errno == -EINVAL) {
+			EVTIM_LOG_ERR("failed to enqueue invalid event - "
+				      "dropping it");
+			(*nb_events_inv)++;
+		} else if (rte_errno == -ENOSPC)
+			rte_pause();
 	}
 
+	if (*nb_events_flushed > 0)
+		EVTIM_BUF_LOG_DBG("enqueued %"PRIu16" timer events to event "
+				  "device", *nb_events_flushed);
+
 	bufp->tail = bufp->tail + *nb_events_flushed + *nb_events_inv;
 }
 
 /*
  * Software event timer adapter implementation
  */
-
-struct rte_event_timer_adapter_sw_data {
-	/* List of messages for outstanding timers */
-	TAILQ_HEAD(, msg) msgs_tailq_head;
-	/* Lock to guard tailq and armed count */
-	rte_spinlock_t msgs_tailq_sl;
+struct swtim {
 	/* Identifier of service executing timer management logic. */
 	uint32_t service_id;
 	/* The cycle count at which the adapter should next tick */
 	uint64_t next_tick_cycles;
-	/* Incremented as the service moves through phases of an iteration */
-	volatile int service_phase;
 	/* The tick resolution used by adapter instance. May have been
 	 * adjusted from what user requested
 	 */
 	uint64_t timer_tick_ns;
 	/* Maximum timeout in nanoseconds allowed by adapter instance. */
 	uint64_t max_tmo_ns;
-	/* Ring containing messages to arm or cancel event timers */
-	struct rte_ring *msg_ring;
-	/* Mempool containing msg objects */
-	struct rte_mempool *msg_pool;
 	/* Buffered timer expiry events to be enqueued to an event device. */
 	struct event_buffer buffer;
 	/* Statistics */
 	struct rte_event_timer_adapter_stats stats;
-	/* The number of threads currently adding to the message ring */
-	rte_atomic16_t message_producer_count;
+	/* Mempool of timer objects */
+	struct rte_mempool *tim_pool;
+	/* Back pointer for convenience */
+	struct rte_event_timer_adapter *adapter;
+	/* Identifier of timer data instance */
+	uint32_t timer_data_id;
+	/* Track which cores have actually armed a timer */
+	struct {
+		rte_atomic16_t v;
+	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
+	/* Track which cores' timer lists should be polled */
+	unsigned int poll_lcores[RTE_MAX_LCORE];
+	/* The number of lists that should be polled */
+	int n_poll_lcores;
+	/* Timers which have expired and can be returned to a mempool */
+	struct rte_timer *expired_timers[EXP_TIM_BUF_SZ];
+	/* The number of timers that can be returned to a mempool */
+	size_t n_expired_timers;
 };
 
-enum msg_type {MSG_TYPE_ARM, MSG_TYPE_CANCEL};
-
-struct msg {
-	enum msg_type type;
-	struct rte_event_timer *evtim;
-	struct rte_timer tim;
-	TAILQ_ENTRY(msg) msgs;
-};
+static inline struct swtim *
+swtim_pmd_priv(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter->data->adapter_priv;
+}
 
 static void
-sw_event_timer_cb(struct rte_timer *tim, void *arg)
+swtim_callback(struct rte_timer *tim)
 {
-	int ret;
+	struct rte_event_timer *evtim = tim->arg;
+	struct rte_event_timer_adapter *adapter;
+	unsigned int lcore = rte_lcore_id();
+	struct swtim *sw;
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
 	uint64_t opaque;
-	struct rte_event_timer *evtim;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	int ret;
 
-	evtim = arg;
 	opaque = evtim->impl_opaque[1];
 	adapter = (struct rte_event_timer_adapter *)(uintptr_t)opaque;
-	sw_data = adapter->data->adapter_priv;
+	sw = swtim_pmd_priv(adapter);
 
-	ret = event_buffer_add(&sw_data->buffer, &evtim->ev);
+	ret = event_buffer_add(&sw->buffer, &evtim->ev);
 	if (ret < 0) {
 		/* If event buffer is full, put timer back in list with
 		 * immediate expiry value, so that we process it again on the
 		 * next iteration.
 		 */
-		rte_timer_reset_sync(tim, 0, SINGLE, rte_lcore_id(),
-				     sw_event_timer_cb, evtim);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, 0, SINGLE,
+					  lcore, NULL, evtim);
+		if (ret < 0) {
+			EVTIM_LOG_DBG("event buffer full, failed to reset "
+				      "timer with immediate expiry value");
+		} else {
+			sw->stats.evtim_retry_count++;
+			EVTIM_LOG_DBG("event buffer full, resetting rte_timer "
+				      "with immediate expiry value");
+		}
 
-		sw_data->stats.evtim_retry_count++;
-		EVTIM_LOG_DBG("event buffer full, resetting rte_timer with "
-			      "immediate expiry value");
+		if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore].v)))
+			sw->poll_lcores[sw->n_poll_lcores++] = lcore;
 	} else {
-		struct msg *m = container_of(tim, struct msg, tim);
-		TAILQ_REMOVE(&sw_data->msgs_tailq_head, m, msgs);
 		EVTIM_BUF_LOG_DBG("buffered an event timer expiry event");
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 
-		/* Free the msg object containing the rte_timer now that
-		 * we've buffered its event successfully.
+		/* Empty the buffer here, if necessary, to free older expired
+		 * timers only
 		 */
-		rte_mempool_put(sw_data->msg_pool, m);
+		if (unlikely(sw->n_expired_timers == EXP_TIM_BUF_SZ)) {
+			rte_mempool_put_bulk(sw->tim_pool,
+					     (void **)sw->expired_timers,
+					     sw->n_expired_timers);
+			sw->n_expired_timers = 0;
+		}
 
-		/* Bump the count when we successfully add an expiry event to
-		 * the buffer.
-		 */
-		sw_data->stats.evtim_exp_count++;
+		sw->expired_timers[sw->n_expired_timers++] = tim;
+		sw->stats.evtim_exp_count++;
+
+		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
 	}
 
-	if (event_buffer_batch_ready(&sw_data->buffer)) {
-		event_buffer_flush(&sw_data->buffer,
+	if (event_buffer_batch_ready(&sw->buffer)) {
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
 				   &nb_evs_flushed,
 				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
 	}
 }
 
 static __rte_always_inline uint64_t
 get_timeout_cycles(struct rte_event_timer *evtim,
-		   struct rte_event_timer_adapter *adapter)
+		   const struct rte_event_timer_adapter *adapter)
 {
-	uint64_t timeout_ns;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	timeout_ns = evtim->timeout_ticks * sw_data->timer_tick_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint64_t timeout_ns = evtim->timeout_ticks * sw->timer_tick_ns;
 	return timeout_ns * rte_get_timer_hz() / NSECPERSEC;
-
 }
 
 /* This function returns true if one or more (adapter) ticks have occurred since
  * the last time it was called.
  */
 static inline bool
-adapter_did_tick(struct rte_event_timer_adapter *adapter)
+swtim_did_tick(struct swtim *sw)
 {
 	uint64_t cycles_per_adapter_tick, start_cycles;
 	uint64_t *next_tick_cyclesp;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
-	next_tick_cyclesp = &sw_data->next_tick_cycles;
 
-	cycles_per_adapter_tick = sw_data->timer_tick_ns *
+	next_tick_cyclesp = &sw->next_tick_cycles;
+	cycles_per_adapter_tick = sw->timer_tick_ns *
 			(rte_get_timer_hz() / NSECPERSEC);
-
 	start_cycles = rte_get_timer_cycles();
 
 	/* Note: initially, *next_tick_cyclesp == 0, so the clause below will
@@ -646,7 +666,6 @@ adapter_did_tick(struct rte_event_timer_adapter *adapter)
 		 * boundary.
 		 */
 		start_cycles -= start_cycles % cycles_per_adapter_tick;
-
 		*next_tick_cyclesp = start_cycles + cycles_per_adapter_tick;
 
 		return true;
@@ -661,15 +680,12 @@ check_timeout(struct rte_event_timer *evtim,
 	      const struct rte_event_timer_adapter *adapter)
 {
 	uint64_t tmo_nsec;
-	struct rte_event_timer_adapter_sw_data *sw_data;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	sw_data = adapter->data->adapter_priv;
-	tmo_nsec = evtim->timeout_ticks * sw_data->timer_tick_ns;
-
-	if (tmo_nsec > sw_data->max_tmo_ns)
+	tmo_nsec = evtim->timeout_ticks * sw->timer_tick_ns;
+	if (tmo_nsec > sw->max_tmo_ns)
 		return -1;
-
-	if (tmo_nsec < sw_data->timer_tick_ns)
+	if (tmo_nsec < sw->timer_tick_ns)
 		return -2;
 
 	return 0;
@@ -697,110 +713,36 @@ check_destination_event_queue(struct rte_event_timer *evtim,
 	return 0;
 }
 
-#define NB_OBJS 32
 static int
-sw_event_timer_adapter_service_func(void *arg)
+swtim_service_func(void *arg)
 {
-	int i, num_msgs;
-	uint64_t cycles, opaque;
+	struct rte_event_timer_adapter *adapter = arg;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 	uint16_t nb_evs_flushed = 0;
 	uint16_t nb_evs_invalid = 0;
-	struct rte_event_timer_adapter *adapter;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct rte_event_timer *evtim = NULL;
-	struct rte_timer *tim = NULL;
-	struct msg *msg, *msgs[NB_OBJS];
-
-	adapter = arg;
-	sw_data = adapter->data->adapter_priv;
-
-	sw_data->service_phase = 1;
-	rte_smp_wmb();
-
-	while (rte_atomic16_read(&sw_data->message_producer_count) > 0 ||
-	       !rte_ring_empty(sw_data->msg_ring)) {
-
-		num_msgs = rte_ring_dequeue_burst(sw_data->msg_ring,
-						  (void **)msgs, NB_OBJS, NULL);
-
-		for (i = 0; i < num_msgs; i++) {
-			int ret = 0;
-
-			RTE_SET_USED(ret);
-
-			msg = msgs[i];
-			evtim = msg->evtim;
-
-			switch (msg->type) {
-			case MSG_TYPE_ARM:
-				EVTIM_SVC_LOG_DBG("dequeued ARM message from "
-						  "ring");
-				tim = &msg->tim;
-				rte_timer_init(tim);
-				cycles = get_timeout_cycles(evtim,
-							    adapter);
-				ret = rte_timer_reset(tim, cycles, SINGLE,
-						      rte_lcore_id(),
-						      sw_event_timer_cb,
-						      evtim);
-				RTE_ASSERT(ret == 0);
-
-				evtim->impl_opaque[0] = (uintptr_t)tim;
-				evtim->impl_opaque[1] = (uintptr_t)adapter;
-
-				TAILQ_INSERT_TAIL(&sw_data->msgs_tailq_head,
-						  msg,
-						  msgs);
-				break;
-			case MSG_TYPE_CANCEL:
-				EVTIM_SVC_LOG_DBG("dequeued CANCEL message "
-						  "from ring");
-				opaque = evtim->impl_opaque[0];
-				tim = (struct rte_timer *)(uintptr_t)opaque;
-				RTE_ASSERT(tim != NULL);
-
-				ret = rte_timer_stop(tim);
-				RTE_ASSERT(ret == 0);
-
-				/* Free the msg object for the original arm
-				 * request.
-				 */
-				struct msg *m;
-				m = container_of(tim, struct msg, tim);
-				TAILQ_REMOVE(&sw_data->msgs_tailq_head, m,
-					     msgs);
-				rte_mempool_put(sw_data->msg_pool, m);
-
-				/* Free the msg object for the current msg */
-				rte_mempool_put(sw_data->msg_pool, msg);
-
-				evtim->impl_opaque[0] = 0;
-				evtim->impl_opaque[1] = 0;
-
-				break;
-			}
-		}
-	}
 
-	sw_data->service_phase = 2;
-	rte_smp_wmb();
+	if (swtim_did_tick(sw)) {
+		rte_timer_alt_manage(sw->timer_data_id,
+				     sw->poll_lcores,
+				     sw->n_poll_lcores,
+				     swtim_callback);
 
-	if (adapter_did_tick(adapter)) {
-		rte_timer_manage();
+		/* Return expired timer objects back to mempool */
+		rte_mempool_put_bulk(sw->tim_pool, (void **)sw->expired_timers,
+				     sw->n_expired_timers);
+		sw->n_expired_timers = 0;
 
-		event_buffer_flush(&sw_data->buffer,
+		event_buffer_flush(&sw->buffer,
 				   adapter->data->event_dev_id,
 				   adapter->data->event_port_id,
-				   &nb_evs_flushed, &nb_evs_invalid);
+				   &nb_evs_flushed,
+				   &nb_evs_invalid);
 
-		sw_data->stats.ev_enq_count += nb_evs_flushed;
-		sw_data->stats.ev_inv_count += nb_evs_invalid;
-		sw_data->stats.adapter_tick_count++;
+		sw->stats.ev_enq_count += nb_evs_flushed;
+		sw->stats.ev_inv_count += nb_evs_invalid;
+		sw->stats.adapter_tick_count++;
 	}
 
-	sw_data->service_phase = 0;
-	rte_smp_wmb();
-
 	return 0;
 }
 
@@ -820,7 +762,7 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	int size;
 	int cache_size = 0;
 
-	for (i = 0; ; i++) {
+	for (i = 0;; i++) {
 		size = 1 << i;
 
 		if (RTE_MAX_LCORE * size < (int)(nb_actual - nb_requested) &&
@@ -834,168 +776,144 @@ compute_msg_mempool_cache_size(uint64_t nb_requested, uint64_t nb_actual)
 	return cache_size;
 }
 
-#define SW_MIN_INTERVAL 1E5
-
 static int
-sw_event_timer_adapter_init(struct rte_event_timer_adapter *adapter)
+swtim_init(struct rte_event_timer_adapter *adapter)
 {
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	uint64_t nb_timers;
+	int i, ret;
+	struct swtim *sw;
 	unsigned int flags;
 	struct rte_service_spec service;
-	static bool timer_subsystem_inited; // static initialized to false
 
-	/* Allocate storage for SW implementation data */
-	char priv_data_name[RTE_RING_NAMESIZE];
-	snprintf(priv_data_name, RTE_RING_NAMESIZE, "sw_evtim_adap_priv_%"PRIu8,
-		 adapter->data->id);
-	adapter->data->adapter_priv = rte_zmalloc_socket(
-				priv_data_name,
-				sizeof(struct rte_event_timer_adapter_sw_data),
-				RTE_CACHE_LINE_SIZE,
-				adapter->data->socket_id);
-	if (adapter->data->adapter_priv == NULL) {
+	/* Allocate storage for private data area */
+#define SWTIM_NAMESIZE 32
+	char swtim_name[SWTIM_NAMESIZE];
+	snprintf(swtim_name, SWTIM_NAMESIZE, "swtim_%"PRIu8,
+			adapter->data->id);
+	sw = rte_zmalloc_socket(swtim_name, sizeof(*sw), RTE_CACHE_LINE_SIZE,
+			adapter->data->socket_id);
+	if (sw == NULL) {
 		EVTIM_LOG_ERR("failed to allocate space for private data");
 		rte_errno = ENOMEM;
 		return -1;
 	}
 
-	if (adapter->data->conf.timer_tick_ns < SW_MIN_INTERVAL) {
-		EVTIM_LOG_ERR("failed to create adapter with requested tick "
-			      "interval");
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	sw_data = adapter->data->adapter_priv;
+	/* Connect storage to adapter instance */
+	adapter->data->adapter_priv = sw;
+	sw->adapter = adapter;
 
-	sw_data->timer_tick_ns = adapter->data->conf.timer_tick_ns;
-	sw_data->max_tmo_ns = adapter->data->conf.max_tmo_ns;
+	sw->timer_tick_ns = adapter->data->conf.timer_tick_ns;
+	sw->max_tmo_ns = adapter->data->conf.max_tmo_ns;
 
-	TAILQ_INIT(&sw_data->msgs_tailq_head);
-	rte_spinlock_init(&sw_data->msgs_tailq_sl);
-	rte_atomic16_init(&sw_data->message_producer_count);
-
-	/* Rings require power of 2, so round up to next such value */
-	nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
-
-	char msg_ring_name[RTE_RING_NAMESIZE];
-	snprintf(msg_ring_name, RTE_RING_NAMESIZE,
-		 "sw_evtim_adap_msg_ring_%"PRIu8, adapter->data->id);
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		RING_F_SP_ENQ | RING_F_SC_DEQ :
-		RING_F_SC_DEQ;
-	sw_data->msg_ring = rte_ring_create(msg_ring_name, nb_timers,
-					    adapter->data->socket_id, flags);
-	if (sw_data->msg_ring == NULL) {
-		EVTIM_LOG_ERR("failed to create message ring");
-		rte_errno = ENOMEM;
-		goto free_priv_data;
-	}
-
-	char pool_name[RTE_RING_NAMESIZE];
-	snprintf(pool_name, RTE_RING_NAMESIZE, "sw_evtim_adap_msg_pool_%"PRIu8,
+	/* Create a timer pool */
+	char pool_name[SWTIM_NAMESIZE];
+	snprintf(pool_name, SWTIM_NAMESIZE, "swtim_pool_%"PRIu8,
 		 adapter->data->id);
-
-	/* Both the arming/canceling thread and the service thread will do puts
-	 * to the mempool, but if the SP_PUT flag is enabled, we can specify
-	 * single-consumer get for the mempool.
-	 */
-	flags = adapter->data->conf.flags & RTE_EVENT_TIMER_ADAPTER_F_SP_PUT ?
-		MEMPOOL_F_SC_GET : 0;
-
-	/* The usable size of a ring is count - 1, so subtract one here to
-	 * make the counts agree.
-	 */
+	/* Optimal mempool size is a power of 2 minus one */
+	uint64_t nb_timers = rte_align64pow2(adapter->data->conf.nb_timers);
 	int pool_size = nb_timers - 1;
 	int cache_size = compute_msg_mempool_cache_size(
 				adapter->data->conf.nb_timers, nb_timers);
-	sw_data->msg_pool = rte_mempool_create(pool_name, pool_size,
-					       sizeof(struct msg), cache_size,
-					       0, NULL, NULL, NULL, NULL,
-					       adapter->data->socket_id, flags);
-	if (sw_data->msg_pool == NULL) {
-		EVTIM_LOG_ERR("failed to create message object mempool");
+	flags = 0; /* pool is multi-producer, multi-consumer */
+	sw->tim_pool = rte_mempool_create(pool_name, pool_size,
+			sizeof(struct rte_timer), cache_size, 0, NULL, NULL,
+			NULL, NULL, adapter->data->socket_id, flags);
+	if (sw->tim_pool == NULL) {
+		EVTIM_LOG_ERR("failed to create timer object mempool");
 		rte_errno = ENOMEM;
-		goto free_msg_ring;
+		goto free_alloc;
+	}
+
+	/* Initialize the variables that track in-use timer lists */
+	for (i = 0; i < RTE_MAX_LCORE; i++)
+		rte_atomic16_init(&sw->in_use[i].v);
+
+	/* Initialize the timer subsystem and allocate timer data instance */
+	ret = rte_timer_subsystem_init();
+	if (ret < 0) {
+		if (ret != -EALREADY) {
+			EVTIM_LOG_ERR("failed to initialize timer subsystem");
+			rte_errno = ret;
+			goto free_mempool;
+		}
+	}
+
+	ret = rte_timer_data_alloc(&sw->timer_data_id);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to allocate timer data instance");
+		rte_errno = ret;
+		goto free_mempool;
 	}
 
-	event_buffer_init(&sw_data->buffer);
+	/* Initialize timer event buffer */
+	event_buffer_init(&sw->buffer);
+
+	sw->adapter = adapter;
 
 	/* Register a service component to run adapter logic */
 	memset(&service, 0, sizeof(service));
 	snprintf(service.name, RTE_SERVICE_NAME_MAX,
-		 "sw_evimer_adap_svc_%"PRIu8, adapter->data->id);
+		 "swtim_svc_%"PRIu8, adapter->data->id);
 	service.socket_id = adapter->data->socket_id;
-	service.callback = sw_event_timer_adapter_service_func;
+	service.callback = swtim_service_func;
 	service.callback_userdata = adapter;
 	service.capabilities &= ~(RTE_SERVICE_CAP_MT_SAFE);
-	ret = rte_service_component_register(&service, &sw_data->service_id);
+	ret = rte_service_component_register(&service, &sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to register service %s with id %"PRIu32
-			      ": err = %d", service.name, sw_data->service_id,
+			      ": err = %d", service.name, sw->service_id,
 			      ret);
 
 		rte_errno = ENOSPC;
-		goto free_msg_pool;
+		goto free_mempool;
 	}
 
 	EVTIM_LOG_DBG("registered service %s with id %"PRIu32, service.name,
-		      sw_data->service_id);
+		      sw->service_id);
 
-	adapter->data->service_id = sw_data->service_id;
+	adapter->data->service_id = sw->service_id;
 	adapter->data->service_inited = 1;
 
-	if (!timer_subsystem_inited) {
-		rte_timer_subsystem_init();
-		timer_subsystem_inited = true;
-	}
-
 	return 0;
-
-free_msg_pool:
-	rte_mempool_free(sw_data->msg_pool);
-free_msg_ring:
-	rte_ring_free(sw_data->msg_ring);
-free_priv_data:
-	rte_free(sw_data);
+free_mempool:
+	rte_mempool_free(sw->tim_pool);
+free_alloc:
+	rte_free(sw);
 	return -1;
 }
 
-static int
-sw_event_timer_adapter_uninit(struct rte_event_timer_adapter *adapter)
+static void
+swtim_free_tim(struct rte_timer *tim, void *arg)
 {
-	int ret;
-	struct msg *m1, *m2;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
-
-	rte_spinlock_lock(&sw_data->msgs_tailq_sl);
+	struct swtim *sw = arg;
 
-	/* Cancel outstanding rte_timers and free msg objects */
-	m1 = TAILQ_FIRST(&sw_data->msgs_tailq_head);
-	while (m1 != NULL) {
-		EVTIM_LOG_DBG("freeing outstanding timer");
-		m2 = TAILQ_NEXT(m1, msgs);
-
-		rte_timer_stop_sync(&m1->tim);
-		rte_mempool_put(sw_data->msg_pool, m1);
+	rte_mempool_put(sw->tim_pool, tim);
+}
 
-		m1 = m2;
-	}
+/* Traverse the list of outstanding timers and put them back in the mempool
+ * before freeing the adapter to avoid leaking the memory.
+ */
+static int
+swtim_uninit(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	rte_spinlock_unlock(&sw_data->msgs_tailq_sl);
+	/* Free outstanding timers */
+	rte_timer_stop_all(sw->timer_data_id,
+			   sw->poll_lcores,
+			   sw->n_poll_lcores,
+			   swtim_free_tim,
+			   sw);
 
-	ret = rte_service_component_unregister(sw_data->service_id);
+	ret = rte_service_component_unregister(sw->service_id);
 	if (ret < 0) {
 		EVTIM_LOG_ERR("failed to unregister service component");
 		return ret;
 	}
 
-	rte_ring_free(sw_data->msg_ring);
-	rte_mempool_free(sw_data->msg_pool);
-	rte_free(adapter->data->adapter_priv);
+	rte_mempool_free(sw->tim_pool);
+	rte_free(sw);
+	adapter->data->adapter_priv = NULL;
 
 	return 0;
 }
@@ -1016,88 +934,79 @@ get_mapped_count_for_service(uint32_t service_id)
 }
 
 static int
-sw_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+swtim_start(const struct rte_event_timer_adapter *adapter)
 {
 	int mapped_count;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-
-	sw_data = adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 	/* Mapping the service to more than one service core can introduce
 	 * delays while one thread is waiting to acquire a lock, so only allow
 	 * one core to be mapped to the service.
+	 *
+	 * Note: the service could be modified such that it spreads cores to
+	 * poll over multiple service instances.
 	 */
-	mapped_count = get_mapped_count_for_service(sw_data->service_id);
+	mapped_count = get_mapped_count_for_service(sw->service_id);
 
-	if (mapped_count == 1)
-		return rte_service_component_runstate_set(sw_data->service_id,
-							  1);
+	if (mapped_count != 1)
+		return mapped_count < 1 ? -ENOENT : -ENOTSUP;
 
-	return mapped_count < 1 ? -ENOENT : -ENOTSUP;
+	return rte_service_component_runstate_set(sw->service_id, 1);
 }
 
 static int
-sw_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+swtim_stop(const struct rte_event_timer_adapter *adapter)
 {
 	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data =
-						adapter->data->adapter_priv;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
-	ret = rte_service_component_runstate_set(sw_data->service_id, 0);
+	ret = rte_service_component_runstate_set(sw->service_id, 0);
 	if (ret < 0)
 		return ret;
 
-	/* Wait for the service to complete its final iteration before
-	 * stopping.
-	 */
-	while (sw_data->service_phase != 0)
+	/* Wait for the service to complete its final iteration */
+	while (rte_service_may_be_active(sw->service_id))
 		rte_pause();
 
-	rte_smp_rmb();
-
 	return 0;
 }
 
 static void
-sw_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+swtim_get_info(const struct rte_event_timer_adapter *adapter,
 		struct rte_event_timer_adapter_info *adapter_info)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-
-	adapter_info->min_resolution_ns = sw_data->timer_tick_ns;
-	adapter_info->max_tmo_ns = sw_data->max_tmo_ns;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	adapter_info->min_resolution_ns = sw->timer_tick_ns;
+	adapter_info->max_tmo_ns = sw->max_tmo_ns;
 }
 
 static int
-sw_event_timer_adapter_stats_get(const struct rte_event_timer_adapter *adapter,
-				 struct rte_event_timer_adapter_stats *stats)
+swtim_stats_get(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	*stats = sw_data->stats;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	*stats = sw->stats; /* structure copy */
 	return 0;
 }
 
 static int
-sw_event_timer_adapter_stats_reset(
-				const struct rte_event_timer_adapter *adapter)
+swtim_stats_reset(const struct rte_event_timer_adapter *adapter)
 {
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	sw_data = adapter->data->adapter_priv;
-	memset(&sw_data->stats, 0, sizeof(sw_data->stats));
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	memset(&sw->stats, 0, sizeof(sw->stats));
 	return 0;
 }
 
-static __rte_always_inline uint16_t
-__sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			  struct rte_event_timer **evtims,
-			  uint16_t nb_evtims)
+static uint16_t
+__swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct swtim *sw = swtim_pmd_priv(adapter);
+	uint32_t lcore_id = rte_lcore_id();
+	struct rte_timer *tim, *tims[nb_evtims];
+	uint64_t cycles;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1107,101 +1016,103 @@ __sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
+	/* Adjust lcore_id if non-EAL thread. Arbitrarily pick the timer list of
+	 * the highest lcore to insert such timers into
+	 */
+	if (lcore_id == LCORE_ID_ANY)
+		lcore_id = RTE_MAX_LCORE - 1;
+
+	/* If this is the first time we're arming an event timer on this lcore,
+	 * mark this lcore as "in use"; this will cause the service
+	 * function to process the timer list that corresponds to this lcore.
+	 */
+	if (unlikely(rte_atomic16_test_and_set(&sw->in_use[lcore_id].v))) {
+		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
+			      lcore_id);
+		sw->poll_lcores[sw->n_poll_lcores] = lcore_id;
+		++sw->n_poll_lcores;
+	}
 
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
+	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
+				   nb_evtims);
 	if (ret < 0) {
 		rte_errno = ENOSPC;
 		return 0;
 	}
 
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service is managing timers, wait for it to finish */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
 		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-		    evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
-		if (ret == -1) {
+		if (unlikely(ret == -1)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
 			rte_errno = EINVAL;
 			break;
-		}
-		if (ret == -2) {
+		} else if (unlikely(ret == -2)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		if (check_destination_event_queue(evtims[i], adapter) < 0) {
+		if (unlikely(check_destination_event_queue(evtims[i],
+							   adapter) < 0)) {
 			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			rte_errno = EINVAL;
 			break;
 		}
 
-		/* Checks passed, set up a message to enqueue */
-		msgs[i]->type = MSG_TYPE_ARM;
-		msgs[i]->evtim = evtims[i];
+		tim = tims[i];
+		rte_timer_init(tim);
 
-		/* Set the payload pointer if not set. */
-		if (evtims[i]->ev.event_ptr == NULL)
-			evtims[i]->ev.event_ptr = evtims[i];
+		evtims[i]->impl_opaque[0] = (uintptr_t)tim;
+		evtims[i]->impl_opaque[1] = (uintptr_t)adapter;
 
-		/* msg objects that get enqueued successfully will be freed
-		 * either by a future cancel operation or by the timer
-		 * expiration callback.
-		 */
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		cycles = get_timeout_cycles(evtims[i], adapter);
+		ret = rte_timer_alt_reset(sw->timer_data_id, tim, cycles,
+					  SINGLE, lcore_id, NULL, evtims[i]);
+		if (ret < 0) {
+			/* tim was in RUNNING or CONFIG state */
+			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued ARM message to ring");
-
+		rte_smp_wmb();
+		EVTIM_LOG_DBG("armed an event timer");
 		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
 	}
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
 	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_mempool_put_bulk(sw->tim_pool,
+				     (void **)&tims[i], nb_evtims - i);
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
-			 struct rte_event_timer **evtims,
-			 uint16_t nb_evtims)
+swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **evtims,
+		uint16_t nb_evtims)
 {
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
 static uint16_t
-sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
-			    struct rte_event_timer **evtims,
-			    uint16_t nb_evtims)
+swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
+		   struct rte_event_timer **evtims,
+		   uint16_t nb_evtims)
 {
-	uint16_t i;
-	int ret;
-	struct rte_event_timer_adapter_sw_data *sw_data;
-	struct msg *msgs[nb_evtims];
+	int i, ret;
+	struct rte_timer *timp;
+	uint64_t opaque;
+	struct swtim *sw = swtim_pmd_priv(adapter);
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1211,23 +1122,6 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	}
 #endif
 
-	sw_data = adapter->data->adapter_priv;
-
-	ret = rte_mempool_get_bulk(sw_data->msg_pool, (void **)msgs, nb_evtims);
-	if (ret < 0) {
-		rte_errno = ENOSPC;
-		return 0;
-	}
-
-	/* Let the service know we're producing messages for it to process */
-	rte_atomic16_inc(&sw_data->message_producer_count);
-
-	/* If the service could be modifying event timer states, wait */
-	while (sw_data->service_phase == 2)
-		rte_pause();
-
-	rte_smp_rmb();
-
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
 		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
@@ -1238,54 +1132,56 @@ sw_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
 			break;
 		}
 
-		msgs[i]->type = MSG_TYPE_CANCEL;
-		msgs[i]->evtim = evtims[i];
+		rte_smp_rmb();
+
+		opaque = evtims[i]->impl_opaque[0];
+		timp = (struct rte_timer *)(uintptr_t)opaque;
+		RTE_ASSERT(timp != NULL);
 
-		if (rte_ring_enqueue(sw_data->msg_ring, msgs[i]) < 0) {
-			rte_errno = ENOSPC;
+		ret = rte_timer_alt_stop(sw->timer_data_id, timp);
+		if (ret < 0) {
+			/* Timer is running or being configured */
+			rte_errno = EAGAIN;
 			break;
 		}
 
-		EVTIM_LOG_DBG("enqueued CANCEL message to ring");
+		rte_mempool_put(sw->tim_pool, (void *)timp);
 
 		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-	}
+		evtims[i]->impl_opaque[0] = 0;
+		evtims[i]->impl_opaque[1] = 0;
 
-	/* Let the service know we're done producing messages */
-	rte_atomic16_dec(&sw_data->message_producer_count);
-
-	if (i < nb_evtims)
-		rte_mempool_put_bulk(sw_data->msg_pool, (void **)&msgs[i],
-				     nb_evtims - i);
+		rte_smp_wmb();
+	}
 
 	return i;
 }
 
 static uint16_t
-sw_event_timer_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
-				  struct rte_event_timer **evtims,
-				  uint64_t timeout_ticks,
-				  uint16_t nb_evtims)
+swtim_arm_tmo_tick_burst(const struct rte_event_timer_adapter *adapter,
+			 struct rte_event_timer **evtims,
+			 uint64_t timeout_ticks,
+			 uint16_t nb_evtims)
 {
 	int i;
 
 	for (i = 0; i < nb_evtims; i++)
 		evtims[i]->timeout_ticks = timeout_ticks;
 
-	return __sw_event_timer_arm_burst(adapter, evtims, nb_evtims);
+	return __swtim_arm_burst(adapter, evtims, nb_evtims);
 }
 
-static const struct rte_event_timer_adapter_ops sw_event_adapter_timer_ops = {
-	.init = sw_event_timer_adapter_init,
-	.uninit = sw_event_timer_adapter_uninit,
-	.start = sw_event_timer_adapter_start,
-	.stop = sw_event_timer_adapter_stop,
-	.get_info = sw_event_timer_adapter_get_info,
-	.stats_get = sw_event_timer_adapter_stats_get,
-	.stats_reset = sw_event_timer_adapter_stats_reset,
-	.arm_burst = sw_event_timer_arm_burst,
-	.arm_tmo_tick_burst = sw_event_timer_arm_tmo_tick_burst,
-	.cancel_burst = sw_event_timer_cancel_burst,
+static const struct rte_event_timer_adapter_ops swtim_ops = {
+	.init			= swtim_init,
+	.uninit			= swtim_uninit,
+	.start			= swtim_start,
+	.stop			= swtim_stop,
+	.get_info		= swtim_get_info,
+	.stats_get		= swtim_stats_get,
+	.stats_reset		= swtim_stats_reset,
+	.arm_burst		= swtim_arm_burst,
+	.arm_tmo_tick_burst	= swtim_arm_tmo_tick_burst,
+	.cancel_burst		= swtim_cancel_burst,
 };
 
 RTE_INIT(event_timer_adapter_init_log)
-- 
2.6.4
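
The error path at the end of __swtim_arm_burst() above follows a common
burst pattern: objects are taken from the pool in one all-or-nothing bulk
get, requests are processed until the first failure, and the unused tail
is handed back in one bulk put. Below is a minimal, self-contained sketch
of that pattern in plain C. It does not use DPDK: toy_pool,
toy_pool_get_bulk(), toy_pool_put_bulk() and toy_arm_burst() are
hypothetical names invented for illustration, and a negative request
value stands in for any failed validation check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mempool: a fixed stack of free slots. */
#define POOL_SIZE 8

struct toy_pool {
	int slots[POOL_SIZE];
	void *free_list[POOL_SIZE];
	size_t n_free;
};

static void toy_pool_init(struct toy_pool *p)
{
	for (size_t i = 0; i < POOL_SIZE; i++)
		p->free_list[i] = &p->slots[i];
	p->n_free = POOL_SIZE;
}

/* All-or-nothing bulk get: either n objects are returned or none. */
static int toy_pool_get_bulk(struct toy_pool *p, void **objs, size_t n)
{
	if (n > p->n_free)
		return -1;
	for (size_t i = 0; i < n; i++)
		objs[i] = p->free_list[--p->n_free];
	return 0;
}

static void toy_pool_put_bulk(struct toy_pool *p, void **objs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p->free_list[p->n_free++] = objs[i];
}

/* Arm up to n requests and return how many were armed. Processing stops
 * at the first invalid request (assumption: negative means invalid).
 */
static uint16_t toy_arm_burst(struct toy_pool *p, const int *reqs, uint16_t n)
{
	void *objs[POOL_SIZE];
	uint16_t i;

	if (n > POOL_SIZE || toy_pool_get_bulk(p, objs, n) < 0)
		return 0;

	for (i = 0; i < n; i++) {
		if (reqs[i] < 0)
			break;
		*(int *)objs[i] = reqs[i]; /* "arm" by storing the value */
	}

	/* Objects [i, n) were never used; give them back in one call. */
	if (i < n)
		toy_pool_put_bulk(p, &objs[i], (size_t)(n - i));

	return i;
}
```

The caller can tell how far the burst got from the return value alone: the
first i requests hold pool objects, and everything from index i onward has
already been returned, so nothing leaks on a partial failure.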



* Re: [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event timer adapter
  2019-06-19 16:25                 ` [dpdk-dev] [PATCH v8 1/1] eventdev: add new " Erik Gabriel Carrillo
@ 2019-06-24  6:12                   ` Jerin Jacob Kollanukkaran
  2019-06-25  6:06                     ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 77+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-24  6:12 UTC (permalink / raw)
  To: Erik Gabriel Carrillo, jerin.jacob
  Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Erik Gabriel Carrillo
> Sent: Wednesday, June 19, 2019 9:56 PM
> To: jerin.jacob@caviumnetworks.com
> Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> Honnappa.Nagarahalli@arm.com; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event
> timer adapter
> 
> This patch introduces a new version of the event timer adapter software
> PMD. In the original design, timer event producer lcores in the primary and
> secondary processes enqueued event timers into a ring, and a service core in
> the primary process dequeued them and processed them further.  To
> improve performance, this version does away with the ring and lets lcores
> insert timers directly into timer skiplist data structures; the service core
> directly accesses the lists as well, when looking for timers that have expired.
> 
> To compare the burst and non-burst performance of the original and new
> versions of the software event timer adapter, I ran the following
> commands:
> 
> $ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
> -- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
> --prod_type_timerdev --worker_deq_depth=32
> 
> $ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
> -- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
> --prod_type_timerdev_burst --worker_deq_depth=32
> 
> With the new version, I see a 151% improvement in throughput for the non-
> burst case, and a 270% improvement in throughput for the burst case.
> I also see a 53% improvement in arm latency in the non-burst case and a 65%
> improvement in arm latency in the burst case.
> 
> Note: To perform the test, I commented out a check in the original version
> that compares the adapter tick interval against a minimum value.
> 
> Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>
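
Since producer lcores now insert timers directly into per-lcore lists, the
service core has to learn which lists to poll. The v1 patch earlier in
this thread does so by marking an lcore "in use" the first time it arms a
timer and appending it to a poll list. Below is a minimal sketch of one
way to do that first-use registration with C11 atomics. It is an
illustration under stated assumptions, not the adapter's code:
toy_adapter, toy_adapter_init(), toy_register_lcore() and TOY_MAX_LCORE
are hypothetical names. Note that C11 atomic_flag_test_and_set() returns
the previous value, whereas rte_atomic16_test_and_set() returns non-zero
when the set succeeded, so the condition is inverted relative to the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAX_LCORE 16 /* hypothetical; stands in for RTE_MAX_LCORE */

struct toy_adapter {
	atomic_flag in_use[TOY_MAX_LCORE];   /* one "seen before" flag per lcore */
	uint32_t poll_lcores[TOY_MAX_LCORE]; /* lcores whose lists get polled */
	atomic_uint n_poll_lcores;           /* number of valid entries above */
};

static void toy_adapter_init(struct toy_adapter *a)
{
	for (int i = 0; i < TOY_MAX_LCORE; i++)
		atomic_flag_clear(&a->in_use[i]);
	atomic_init(&a->n_poll_lcores, 0);
}

/* Returns 1 if this call registered the lcore, 0 if it was already
 * registered. Concurrent callers each reserve a distinct slot.
 */
static int toy_register_lcore(struct toy_adapter *a, uint32_t lcore_id)
{
	assert(lcore_id < TOY_MAX_LCORE);

	/* Previous value true means some earlier call already registered. */
	if (atomic_flag_test_and_set(&a->in_use[lcore_id]))
		return 0;

	/* Reserve the next slot atomically, then fill it in. */
	unsigned int slot = atomic_fetch_add(&a->n_poll_lcores, 1);
	a->poll_lcores[slot] = lcore_id;
	return 1;
}
```

The atomic fetch-and-add only keeps concurrent writers from claiming the
same slot. A reader that polls the list while registrations are still in
flight could observe the incremented count before the slot is written, so
a real implementation would need additional synchronization on the reader
side (for example a lock around both the append and the read).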


* Re: [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event timer adapter
  2019-06-24  6:12                   ` Jerin Jacob Kollanukkaran
@ 2019-06-25  6:06                     ` Jerin Jacob Kollanukkaran
  0 siblings, 0 replies; 77+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-25  6:06 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Erik Gabriel Carrillo, jerin.jacob
  Cc: mattias.ronnblom, pbhagavatula, Honnappa.Nagarahalli, dev

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Sent: Monday, June 24, 2019 11:43 AM
> To: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>;
> jerin.jacob@caviumnetworks.com
> Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> Honnappa.Nagarahalli@arm.com; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event timer
> adapter
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Erik Gabriel Carrillo
> > Sent: Wednesday, June 19, 2019 9:56 PM
> > To: jerin.jacob@caviumnetworks.com
> > Cc: mattias.ronnblom@ericsson.com; pbhagavatula@caviumnetworks.com;
> > Honnappa.Nagarahalli@arm.com; dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v8 1/1] eventdev: add new software event
> > timer adapter
> >
> > This patch introduces a new version of the event timer adapter software
> > PMD. In the original design, timer event producer lcores in the primary and
> > secondary processes enqueued event timers into a ring, and a service core in
> > the primary process dequeued them and processed them further.  To
> > improve performance, this version does away with the ring and lets lcores
> > insert timers directly into timer skiplist data structures; the service core
> > directly accesses the lists as well, when looking for timers that have expired.
> >
> > To compare the burst and non-burst performance of the original and new
> > versions of the software event timer adapter, I ran the following
> > commands:
> >
> > $ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
> > -- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
> > --prod_type_timerdev --worker_deq_depth=32
> >
> > $ sudo ./build/app/dpdk-test-eventdev -c 0xFFE -s 0xC --vdev=event_sw0 \
> > -- --test=perf_queue --plcores=4,5,6 --wlcore=7,8,9 --stlist=p \
> > --prod_type_timerdev_burst --worker_deq_depth=32
> >
> > With the new version, I see a 151% improvement in throughput for the non-
> > burst case, and a 270% improvement in throughput for the burst case.
> > I also see a 53% improvement in arm latency in the non-burst case and a 65%
> > improvement in arm latency in the burst case.
> >
> > Note: To perform the test, I commented out a check in the original version
> > that compares the adapter tick interval against a minimum value.
> >
> > Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> Acked-by: Jerin Jacob <jerinj@marvell.com>


Applied to dpdk-next-eventdev/master. Thanks.

end of thread, other threads:[~2019-06-25  6:06 UTC | newest]

Thread overview: 77+ messages
2018-11-29 23:35 [dpdk-dev] [PATCH 0/3] new software event timer adapter Erik Gabriel Carrillo
2018-11-29 23:35 ` [dpdk-dev] [PATCH 1/3] timer: allow timer management in shared memory Erik Gabriel Carrillo
2018-11-29 23:35 ` [dpdk-dev] [PATCH 2/3] timer: add function to stop all timers in a list Erik Gabriel Carrillo
2018-11-29 23:35 ` [dpdk-dev] [PATCH 3/3] eventdev: add new software event timer adapter Erik Gabriel Carrillo
2018-11-30  7:26 ` [dpdk-dev] [PATCH 0/3] " Pavan Nikhilesh
2018-11-30 19:07   ` Carrillo, Erik G
2018-12-07 17:52 ` [dpdk-dev] [PATCH v2 0/2] Timer library changes Erik Gabriel Carrillo
2018-12-07 17:52   ` [dpdk-dev] [PATCH v2 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
2018-12-07 18:10     ` Stephen Hemminger
2018-12-07 19:21       ` Carrillo, Erik G
2018-12-07 17:53   ` [dpdk-dev] [PATCH v2 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
2018-12-13 22:26   ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Erik Gabriel Carrillo
2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
2018-12-13 22:26     ` [dpdk-dev] [PATCH v3 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
2018-12-19  3:35     ` [dpdk-dev] [PATCH v3 0/2] Timer library changes Thomas Monjalon
2018-12-19  7:33       ` Mattias Rönnblom
2019-03-05 22:41     ` Carrillo, Erik G
2019-03-05 22:58       ` [dpdk-dev] [dpdk-techboard] " Thomas Monjalon
2019-03-06 18:54         ` Carrillo, Erik G
2019-03-06 20:17           ` Thomas Monjalon
2019-03-06  2:39       ` [dpdk-dev] " Varghese, Vipin
2019-03-06 15:15         ` Carrillo, Erik G
2019-03-07  2:33           ` Varghese, Vipin
2019-03-06 17:20     ` [dpdk-dev] [PATCH v4 " Erik Gabriel Carrillo
2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
2019-03-20 13:52         ` Sanford, Robert
2019-03-20 13:52           ` Sanford, Robert
2019-03-21  1:01           ` Carrillo, Erik G
2019-03-21  1:01             ` Carrillo, Erik G
2019-03-27 14:03             ` Thomas Monjalon
2019-03-27 14:03               ` Thomas Monjalon
2019-03-28 12:42               ` Carrillo, Erik G
2019-03-28 12:42                 ` Carrillo, Erik G
2019-04-15 21:49           ` Carrillo, Erik G
2019-04-15 21:49             ` Carrillo, Erik G
2019-03-06 17:20       ` [dpdk-dev] [PATCH v4 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
2019-04-15 21:41       ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Erik Gabriel Carrillo
2019-04-15 21:41         ` Erik Gabriel Carrillo
2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 1/2] timer: allow timer management in shared memory Erik Gabriel Carrillo
2019-04-15 21:41           ` Erik Gabriel Carrillo
2019-04-17 17:09           ` Thomas Monjalon
2019-04-17 17:09             ` Thomas Monjalon
2019-04-15 21:41         ` [dpdk-dev] [PATCH v5 2/2] timer: add function to stop all timers in a list Erik Gabriel Carrillo
2019-04-15 21:41           ` Erik Gabriel Carrillo
2019-04-17 19:54         ` [dpdk-dev] [PATCH v5 0/2] Timer library changes Thomas Monjalon
2019-04-17 19:54           ` Thomas Monjalon
2018-12-07 20:34 ` [dpdk-dev] [PATCH v2 0/1] New software event timer adapter Erik Gabriel Carrillo
2018-12-07 20:34   ` [dpdk-dev] [PATCH v2 1/1] eventdev: add new " Erik Gabriel Carrillo
2018-12-09 19:17     ` Mattias Rönnblom
2018-12-10 17:17       ` Carrillo, Erik G
2018-12-14 15:45   ` [dpdk-dev] [PATCH v3 0/1] New " Erik Gabriel Carrillo
2018-12-14 15:45     ` [dpdk-dev] [PATCH v3 1/1] eventdev: add new " Erik Gabriel Carrillo
2018-12-14 21:15       ` Mattias Rönnblom
2018-12-14 23:04         ` Carrillo, Erik G
2018-12-14 23:15     ` [dpdk-dev] [PATCH v4 0/1] New " Erik Gabriel Carrillo
2018-12-14 23:15       ` [dpdk-dev] [PATCH v4 1/1] eventdev: add new " Erik Gabriel Carrillo
2018-12-18 20:11       ` [dpdk-dev] [EXT] [PATCH v4 0/1] New " Jerin Jacob Kollanukkaran
2018-12-18 20:14         ` Carrillo, Erik G
2019-04-22 14:57       ` [dpdk-dev] [PATCH v5 " Erik Gabriel Carrillo
2019-04-22 14:57         ` Erik Gabriel Carrillo
2019-04-22 14:57         ` [dpdk-dev] [PATCH v5 1/1] eventdev: add new " Erik Gabriel Carrillo
2019-04-22 14:57           ` Erik Gabriel Carrillo
2019-04-26 15:14         ` [dpdk-dev] [PATCH v6 0/1] New " Erik Gabriel Carrillo
2019-04-26 15:14           ` Erik Gabriel Carrillo
2019-04-26 15:14           ` [dpdk-dev] [PATCH v6 1/1] eventdev: add new " Erik Gabriel Carrillo
2019-04-26 15:14             ` Erik Gabriel Carrillo
2019-04-26 18:51             ` Honnappa Nagarahalli
2019-04-26 18:51               ` Honnappa Nagarahalli
2019-04-26 18:58               ` Carrillo, Erik G
2019-04-26 18:58                 ` Carrillo, Erik G
2019-06-05 13:34                 ` Jerin Jacob Kollanukkaran
2019-06-19 15:14           ` [dpdk-dev] [PATCH v7 0/1] New " Erik Gabriel Carrillo
2019-06-19 15:14             ` [dpdk-dev] [PATCH v7 1/1] eventdev: add new " Erik Gabriel Carrillo
2019-06-19 16:25               ` [dpdk-dev] [PATCH v8 0/1] New " Erik Gabriel Carrillo
2019-06-19 16:25                 ` [dpdk-dev] [PATCH v8 1/1] eventdev: add new " Erik Gabriel Carrillo
2019-06-24  6:12                   ` Jerin Jacob Kollanukkaran
2019-06-25  6:06                     ` Jerin Jacob Kollanukkaran
