DPDK patches and discussions
 help / color / mirror / Atom feed
* [RFC PATCH 1/5] eventdev: add power monitoring API on event port
@ 2023-04-19  9:54 Sivaprasad Tummala
  2023-04-19  9:54 ` [RFC PATCH 2/5] event/sw: support power monitor Sivaprasad Tummala
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-04-19  9:54 UTC (permalink / raw)
  To: david.hunt, jerinj, harry.van.haaren; +Cc: dev

A new API to allow power monitoring condition on event port to
optimize power when no events are arriving on an event port for
the worker core to process in an eventdev based pipelined application.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/eventdev/eventdev_pmd.h | 23 +++++++++++++++++++++++
 lib/eventdev/rte_eventdev.c | 24 ++++++++++++++++++++++++
 lib/eventdev/rte_eventdev.h | 25 +++++++++++++++++++++++++
 3 files changed, 72 insertions(+)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index aebab26852..7b12f80f57 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -481,6 +481,26 @@ typedef int (*eventdev_port_unlink_t)(struct rte_eventdev *dev, void *port,
 typedef int (*eventdev_port_unlinks_in_progress_t)(struct rte_eventdev *dev,
 		void *port);
 
+/**
+ * @internal
+ * Get address of memory location whose contents will change whenever there is
+ * new data to be received on an Event port.
+ *
+ * @param port
+ *   Eventdev port pointer.
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success
+ * @retval -EINVAL
+ *   Invalid parameters
+ */
+typedef int (*event_get_monitor_addr_t)(void *port,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Converts nanoseconds to *timeout_ticks* value for rte_event_dequeue()
  *
@@ -1376,6 +1396,9 @@ struct eventdev_ops {
 	eventdev_dump_t dump;
 	/* Dump internal information */
 
+	/** Get power monitoring condition for event port */
+	event_get_monitor_addr_t get_monitor_addr;
+
 	eventdev_xstats_get_t xstats_get;
 	/**< Get extended device statistics. */
 	eventdev_xstats_get_names_t xstats_get_names;
diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 6ab4524332..ff77194783 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -860,6 +860,30 @@ rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 	return 0;
 }
 
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+		struct rte_power_monitor_cond *pmc)
+{
+	struct rte_eventdev *dev;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_eventdevs[dev_id];
+	if (!is_valid_port(dev, port_id)) {
+		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
+		return -EINVAL;
+	}
+
+	if (pmc == NULL) {
+		RTE_EDEV_LOG_ERR("devid %u port %u power monitor condition is NULL\n",
+				dev_id, port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->get_monitor_addr == NULL)
+		return -ENOTSUP;
+	return (*dev->dev_ops->get_monitor_addr)(dev->data->ports[port_id], pmc);
+}
+
 int
 rte_event_queue_attr_get(uint8_t dev_id, uint8_t queue_id, uint32_t attr_id,
 			uint32_t *attr_value)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index a90e23ac8b..841b1fb9b5 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@ extern "C" {
 #include <rte_errno.h>
 #include <rte_mbuf_pool_ops.h>
 #include <rte_mempool.h>
+#include <rte_power_intrinsics.h>
 
 #include "rte_eventdev_trace_fp.h"
 
@@ -984,6 +985,30 @@ int
 rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 			uint32_t *attr_value);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the monitor condition for a given event port.
+ *
+ * @param dev_id
+ *   Eventdev id
+ * @param port_id
+ *   Eventdev port id
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ *
+ * @return
+ *   - 0: Success.
+ *   -ENOTSUP: Operation not supported.
+ *   -EINVAL: Invalid parameters.
+ *   -ENODEV: Invalid device ID.
+ */
+__rte_experimental
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Start an event device.
  *
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/5] event/sw: support power monitor
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
@ 2023-04-19  9:54 ` Sivaprasad Tummala
  2023-04-19  9:54 ` [RFC PATCH 3/5] eventdev: support optional dequeue callbacks Sivaprasad Tummala
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-04-19  9:54 UTC (permalink / raw)
  To: david.hunt, jerinj, harry.van.haaren; +Cc: dev

Currently sw eventdev pmd does not support ``rte_power_monitor`` api.
This patch adds support by adding monitor callback that is called
whenever we enter sleep state and need to check if it is time to
wake up.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 drivers/event/sw/sw_evdev.c        |  2 ++
 drivers/event/sw/sw_evdev.h        |  2 ++
 drivers/event/sw/sw_evdev_worker.c | 27 +++++++++++++++++++++++++++
 3 files changed, 31 insertions(+)

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index cfd659d774..448253cdc3 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -958,6 +958,8 @@ sw_probe(struct rte_vdev_device *vdev)
 			.port_unlink = sw_port_unlink,
 			.port_unlinks_in_progress = sw_port_unlinks_in_progress,
 
+			.get_monitor_addr = sw_event_get_monitor_addr,
+
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
 			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
index c7b943a72b..26aa2fe283 100644
--- a/drivers/event/sw/sw_evdev.h
+++ b/drivers/event/sw/sw_evdev.h
@@ -312,6 +312,8 @@ int sw_xstats_reset(struct rte_eventdev *dev,
 		int16_t queue_port_id,
 		const uint64_t ids[],
 		uint32_t nb_ids);
+int sw_event_get_monitor_addr(void *port,
+		struct rte_power_monitor_cond *pmc);
 
 int test_sw_eventdev(void);
 
diff --git a/drivers/event/sw/sw_evdev_worker.c b/drivers/event/sw/sw_evdev_worker.c
index 063b919c7e..139c98cfe2 100644
--- a/drivers/event/sw/sw_evdev_worker.c
+++ b/drivers/event/sw/sw_evdev_worker.c
@@ -10,6 +10,33 @@
 
 #define PORT_ENQUEUE_MAX_BURST_SIZE 64
 
+static int
+sw_event_ring_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* Check if the head pointer has changed */
+	return value != arg[0];
+}
+
+int
+sw_event_get_monitor_addr(void *port, struct rte_power_monitor_cond *pmc)
+{
+	struct sw_port *p = (void *)port;
+	struct rte_event_ring *ev_ring = p->cq_worker_ring;
+	struct rte_ring *rng = &ev_ring->r;
+
+	/*
+	 * Monitor event ring producer tail since if
+	 * prod.tail moves there are events to dequeue
+	 */
+	pmc->addr = &rng->prod.tail;
+	pmc->size = sizeof(rng->prod.tail);
+	pmc->opaque[0] = rng->prod.tail;
+	pmc->fn = sw_event_ring_monitor_callback;
+
+	return 0;
+}
+
 static inline void
 sw_event_release(struct sw_port *p, uint8_t index)
 {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 3/5] eventdev: support optional dequeue callbacks
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
  2023-04-19  9:54 ` [RFC PATCH 2/5] event/sw: support power monitor Sivaprasad Tummala
@ 2023-04-19  9:54 ` Sivaprasad Tummala
  2023-04-24 16:06   ` Ferruh Yigit
  2023-05-17 14:22   ` Burakov, Anatoly
  2023-04-19  9:54 ` [RFC PATCH 4/5] power: add eventdev support for power management Sivaprasad Tummala
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-04-19  9:54 UTC (permalink / raw)
  To: david.hunt, jerinj, harry.van.haaren; +Cc: dev

Add optional support for inline event processing within dequeue call.
For a dequeue callback, events dequeued from the event port were
passed them to a callback function if configured, to allow
additional processing. e.g. unpack batch of packets from each event
on dequeue, before passing back to the application.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/eventdev/eventdev_pmd.h      |  17 ++++
 lib/eventdev/eventdev_private.c  |  17 ++++
 lib/eventdev/rte_eventdev.c      |  78 +++++++++++++++++
 lib/eventdev/rte_eventdev.h      | 145 ++++++++++++++++++++++++++++++-
 lib/eventdev/rte_eventdev_core.h |  12 ++-
 lib/eventdev/version.map         |   6 ++
 6 files changed, 272 insertions(+), 3 deletions(-)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 7b12f80f57..c87e06993f 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -97,6 +97,19 @@ struct rte_eventdev_global {
 	uint8_t nb_devs;	/**< Number of devices found */
 };
 
+/**
+ * @internal
+ * Structure used to hold information about the callbacks to be called for a
+ * port on dequeue.
+ */
+struct rte_event_dequeue_callback {
+	struct rte_event_dequeue_callback *next;
+	union{
+		rte_dequeue_callback_fn dequeue;
+	} fn;
+	void *param;
+};
+
 /**
  * @internal
  * The data part, with no function pointers, associated with each device.
@@ -173,6 +186,10 @@ struct rte_eventdev {
 	/**< Pointer to PMD dequeue burst function. */
 	event_maintain_t maintain;
 	/**< Pointer to PMD port maintenance function. */
+	struct rte_event_dequeue_callback *post_dequeue_burst_cbs[RTE_EVENT_MAX_PORTS_PER_DEV];
+	/**<  User-supplied functions called from dequeue_burst to post-process
+	 * received packets before passing them to the user
+	 */
 	event_tx_adapter_enqueue_t txa_enqueue_same_dest;
 	/**< Pointer to PMD eth Tx adapter burst enqueue function with
 	 * events destined to same Eth port & Tx queue.
diff --git a/lib/eventdev/eventdev_private.c b/lib/eventdev/eventdev_private.c
index 1d3d9d357e..6d1cbdb17d 100644
--- a/lib/eventdev/eventdev_private.c
+++ b/lib/eventdev/eventdev_private.c
@@ -118,4 +118,21 @@ event_dev_fp_ops_set(struct rte_event_fp_ops *fp_op,
 	fp_op->txa_enqueue_same_dest = dev->txa_enqueue_same_dest;
 	fp_op->ca_enqueue = dev->ca_enqueue;
 	fp_op->data = dev->data->ports;
+	fp_op->ev_port.clbk = (void **)(uintptr_t)dev->post_dequeue_burst_cbs;
+	fp_op->ev_port.data = dev->data->ports;
+}
+
+uint16_t
+rte_event_dequeue_callbacks(uint8_t dev_id, uint8_t port_id,
+		struct rte_event *ev, uint16_t nb_events, void *opaque)
+{
+	static uint16_t nb_rx;
+	const struct rte_event_dequeue_callback *cb = opaque;
+
+	while (cb != NULL) {
+		nb_rx = cb->fn.dequeue(dev_id, port_id, ev,
+				nb_events, cb->param);
+		cb = cb->next;
+	}
+	return nb_rx;
 }
diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index ff77194783..0d43cb2d0a 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -38,6 +38,9 @@ static struct rte_eventdev_global eventdev_globals = {
 /* Public fastpath APIs. */
 struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
 
+/* spinlock for add/remove dequeue callbacks */
+static rte_spinlock_t event_dev_dequeue_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
 /* Event dev north bound API implementation */
 
 uint8_t
@@ -860,6 +863,81 @@ rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 	return 0;
 }
 
+const struct rte_event_dequeue_callback *
+rte_event_add_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		rte_dequeue_callback_fn fn, void *user_param)
+{
+	struct rte_eventdev *dev;
+	struct rte_event_dequeue_callback *cb;
+	struct rte_event_dequeue_callback *tail;
+
+	/* check input parameters */
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, NULL);
+	dev = &rte_eventdevs[dev_id];
+	if (!is_valid_port(dev, port_id)) {
+		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
+		return NULL;
+	}
+
+	cb = rte_zmalloc(NULL, sizeof(*cb), 0);
+	if (cb == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	cb->fn.dequeue = fn;
+	cb->param = user_param;
+
+	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
+	/* Add the callbacks in fifo order. */
+	tail = rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id];
+	if (!tail) {
+		/* Stores to cb->fn and cb->param should complete before
+		 * cb is visible to data plane.
+		 */
+		__atomic_store_n(
+			&rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id],
+			cb, __ATOMIC_RELEASE);
+	} else {
+		while (tail->next)
+			tail = tail->next;
+		/* Stores to cb->fn and cb->param should complete before
+		 * cb is visible to data plane.
+		 */
+		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+	}
+	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
+
+	return cb;
+}
+
+int
+rte_event_remove_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		const struct rte_event_dequeue_callback *user_cb)
+{
+	struct rte_eventdev *dev;
+
+	/* Check input parameters. */
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_eventdevs[dev_id];
+	if (user_cb == NULL || !is_valid_port(dev, port_id))
+		return -EINVAL;
+
+	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
+	prev_cb = &dev->post_dequeue_burst_cbs[port_id];
+	for (; *prev_cb != NULL; prev_cb = &cb->next) {
+		cb = *prev_cb;
+		if (cb == user_cb) {
+			/* Remove the user cb from the callback list. */
+			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			ret = 0;
+			break;
+		}
+	}
+	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
+
+	return ret;
+}
+
 int
 rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
 		struct rte_power_monitor_cond *pmc)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 841b1fb9b5..9ccd259058 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -948,6 +948,100 @@ void
 rte_event_port_quiesce(uint8_t dev_id, uint8_t port_id,
 		       rte_eventdev_port_flush_t release_cb, void *args);
 
+struct rte_event_dequeue_callback;
+
+/**
+ * Function type used for dequeue event processing callbacks.
+ *
+ * The callback function is called on dequeue with a burst of events that have
+ * been received on the given event port.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   for output to be populated with the dequeued event objects.
+ * @param nb_events
+ *   The maximum number of event objects to dequeue, typically number of
+ *   rte_event_port_dequeue_depth() available for this port.
+ * @param opaque
+ *   Opaque pointer of event port callback related data.
+ *
+ * @return
+ *  The number of event objects returned to the user.
+ */
+typedef uint16_t (*rte_dequeue_callback_fn)(uint8_t dev_id, uint8_t port_id,
+		struct rte_event *ev, uint16_t nb_events, void *user_param);
+
+/**
+ * Add a callback to be called on event dequeue on a given event device port.
+ *
+ * This API configures a function to be called for each burst of
+ * events dequeued on a given event device port. The return value is a pointer
+ * that can be used to later remove the callback using
+ * rte_event_remove_dequeue_callback().
+ *
+ * Multiple functions are called in the order that they are added.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param fn
+ *   The callback function
+ * @param user_param
+ *   A generic pointer parameter which will be passed to each invocation of the
+ *   callback function on this event device port. Inter-thread synchronization
+ *   of any user data changes is the responsibility of the user.
+ *
+ * @return
+ *   NULL on error.
+ *   On success, a pointer value which can later be used to remove the callback.
+ */
+__rte_experimental
+const struct rte_event_dequeue_callback *
+rte_event_add_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		rte_dequeue_callback_fn fn, void *user_param);
+
+/**
+ * Remove a dequeue event callback from a given event device port.
+ *
+ * This API is used to removed callbacks that were added to a event device port
+ * using rte_event_add_dequeue_callback().
+ *
+ * Note: the callback is removed from the callback list but it isn't freed
+ * since the it may still be in use. The memory for the callback can be
+ * subsequently freed back by the application by calling rte_free():
+ *
+ * - Immediately - if the device is stopped, or the user knows that no
+ *   callbacks are in flight e.g. if called from the thread doing dequeue
+ *   on that port.
+ *
+ * - After a short delay - where the delay is sufficient to allow any
+ *   in-flight callbacks to complete. Alternately, the RCU mechanism can be
+ *   used to detect when data plane threads have ceased referencing the
+ *   callback memory.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param user_cb
+ *   The callback function
+ *
+ * @return
+ *   - 0: Success. Callback was removed.
+ *   - -ENODEV:  If *dev_id* is invalid.
+ *   - -EINVAL:  The port_id is out of range, or the callback
+ *               is NULL.
+ */
+__rte_experimental
+int
+rte_event_remove_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		const struct rte_event_dequeue_callback *user_cb);
+
 /**
  * The queue depth of the port on the enqueue side
  */
@@ -2133,6 +2227,34 @@ rte_event_enqueue_forward_burst(uint8_t dev_id, uint8_t port_id,
 					 fp_ops->enqueue_forward_burst);
 }
 
+/**
+ * @internal
+ * Helper routine for rte_event_dequeue_burst().
+ * Should be called at exit from PMD's rte_event_dequeue() implementation.
+ * Does necessary post-processing - invokes dequeue callbacks if any, etc.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   for output to be populated with the dequeued event objects.
+ * @param nb_events
+ *   The maximum number of event objects to dequeue, typically number of
+ *   rte_event_port_dequeue_depth() available for this port.
+ * @param opaque
+ *   Opaque pointer of event port callback related data.
+ *
+ * @return
+ *  The number of event objects actually dequeued from the port. The return
+ *  value can be less than the value of the *nb_events* parameter when the
+ *  event port's queue is not full.
+ */
+__rte_experimental
+uint16_t rte_event_dequeue_callbacks(uint8_t dev_id, uint8_t port_id,
+		struct rte_event *ev, uint16_t nb_events, void *opaque);
+
 /**
  * Dequeue a burst of events objects or an event object from the event port
  * designated by its *event_port_id*, on an event device designated
@@ -2205,6 +2327,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
 {
 	const struct rte_event_fp_ops *fp_ops;
 	void *port;
+	uint16_t nb_rx;
 
 	fp_ops = &rte_event_fp_ops[dev_id];
 	port = fp_ops->data[port_id];
@@ -2226,10 +2349,28 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
 	 * requests nb_events as const one
 	 */
 	if (nb_events == 1)
-		return (fp_ops->dequeue)(port, ev, timeout_ticks);
+		nb_rx = fp_ops->dequeue(port, ev, timeout_ticks);
 	else
-		return (fp_ops->dequeue_burst)(port, ev, nb_events,
+		nb_rx = fp_ops->dequeue_burst(port, ev, nb_events,
 					       timeout_ticks);
+
+	{
+		void *cb;
+
+		/* __ATOMIC_RELEASE memory order was used when the
+		 * call back was inserted into the list.
+		 * Since there is a clear dependency between loading
+		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * not required.
+		 */
+		cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
+				__ATOMIC_RELAXED);
+		if (unlikely(cb != NULL))
+			nb_rx = rte_event_dequeue_callbacks(dev_id, port_id,
+							ev, nb_rx, cb);
+	}
+
+	return nb_rx;
 }
 
 #define RTE_EVENT_DEV_MAINT_OP_FLUSH          (1 << 0)
diff --git a/lib/eventdev/rte_eventdev_core.h b/lib/eventdev/rte_eventdev_core.h
index c328bdbc82..b364ecc2a5 100644
--- a/lib/eventdev/rte_eventdev_core.h
+++ b/lib/eventdev/rte_eventdev_core.h
@@ -42,6 +42,14 @@ typedef uint16_t (*event_crypto_adapter_enqueue_t)(void *port,
 						   uint16_t nb_events);
 /**< @internal Enqueue burst of events on crypto adapter */
 
+struct rte_eventdev_port_data {
+	void **data;
+	/**< points to array of internal port data pointers */
+	void **clbk;
+	/**< points to array of port callback data pointers */
+};
+/**< @internal Structure used to hold opaque eventdev port data. */
+
 struct rte_event_fp_ops {
 	void **data;
 	/**< points to array of internal port data pointers */
@@ -65,7 +73,9 @@ struct rte_event_fp_ops {
 	/**< PMD Tx adapter enqueue same destination function. */
 	event_crypto_adapter_enqueue_t ca_enqueue;
 	/**< PMD Crypto adapter enqueue function. */
-	uintptr_t reserved[6];
+	struct rte_eventdev_port_data ev_port;
+	/**< Eventdev port data. */
+	uintptr_t reserved[3];
 } __rte_cache_aligned;
 
 extern struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
index 89068a5713..8ce54f5017 100644
--- a/lib/eventdev/version.map
+++ b/lib/eventdev/version.map
@@ -131,6 +131,12 @@ EXPERIMENTAL {
 	rte_event_eth_tx_adapter_runtime_params_init;
 	rte_event_eth_tx_adapter_runtime_params_set;
 	rte_event_timer_remaining_ticks_get;
+
+	# added in 23.07
+	rte_event_dequeue_callbacks
+	rte_event_add_dequeue_callback
+	rte_event_remove_dequeue_callback
+	rte_event_port_get_monitor_addr
 };
 
 INTERNAL {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 4/5] power: add eventdev support for power management
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
  2023-04-19  9:54 ` [RFC PATCH 2/5] event/sw: support power monitor Sivaprasad Tummala
  2023-04-19  9:54 ` [RFC PATCH 3/5] eventdev: support optional dequeue callbacks Sivaprasad Tummala
@ 2023-04-19  9:54 ` Sivaprasad Tummala
  2023-05-17 14:43   ` Burakov, Anatoly
  2023-04-19  9:54 ` [RFC PATCH 5/5] examples/eventdev_p: add eventdev " Sivaprasad Tummala
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-04-19  9:54 UTC (permalink / raw)
  To: david.hunt, jerinj, harry.van.haaren; +Cc: dev

Add eventdev support to enable power saving when no events
are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering
an architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when events arrive.

This API mandates a core-to-single-port mapping (i.e. one core polling
multiple ports of event device is not supported). This should be ok
as the general use case will have one CPU core using one port to
enqueue/dequeue events from an eventdev.

This design is using Eventdev PMD Dequeue callbacks.

1. MWAITX/MONITORX:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. Pause instruction

   This method uses the pause instruction to avoid busy polling.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/power/meson.build          |   2 +-
 lib/power/rte_power_pmd_mgmt.c | 226 +++++++++++++++++++++++++++++++++
 lib/power/rte_power_pmd_mgmt.h |  55 ++++++++
 lib/power/version.map          |   4 +
 4 files changed, 286 insertions(+), 1 deletion(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index 1ce8b7c07d..528340e291 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -31,4 +31,4 @@ headers = files(
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
 endif
-deps += ['timer', 'ethdev']
+deps += ['timer', 'ethdev', 'eventdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index ca1840387c..3c194872c3 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -9,8 +9,10 @@
 #include <rte_cpuflags.h>
 #include <rte_malloc.h>
 #include <rte_ethdev.h>
+#include <rte_eventdev.h>
 #include <rte_power_intrinsics.h>
 
+#include <eventdev_pmd.h>
 #include "rte_power_pmd_mgmt.h"
 #include "power_common.h"
 
@@ -53,6 +55,7 @@ struct queue_list_entry {
 	uint64_t n_empty_polls;
 	uint64_t n_sleeps;
 	const struct rte_eth_rxtx_callback *cb;
+	const struct rte_event_dequeue_callback *evt_cb;
 };
 
 struct pmd_core_cfg {
@@ -414,6 +417,64 @@ cfg_queues_stopped(struct pmd_core_cfg *queue_cfg)
 	return 1;
 }
 
+static uint16_t
+evt_clb_umwait(uint8_t dev_id, uint8_t port_id, struct rte_event *ev __rte_unused,
+		uint16_t nb_events, void *arg)
+{
+	struct queue_list_entry *queue_conf = arg;
+
+	/* this callback can't do more than one queue, omit multiqueue logic */
+	if (unlikely(nb_events == 0)) {
+		queue_conf->n_empty_polls++;
+		if (unlikely(queue_conf->n_empty_polls > emptypoll_max)) {
+			struct rte_power_monitor_cond pmc;
+			int ret;
+
+			/* use monitoring condition to sleep */
+			ret = rte_event_port_get_monitor_addr(dev_id, port_id,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
+		}
+	} else
+		queue_conf->n_empty_polls = 0;
+
+	return nb_events;
+}
+
+static uint16_t
+evt_clb_pause(uint8_t dev_id __rte_unused, uint8_t port_id __rte_unused,
+		struct rte_event *ev __rte_unused,
+		uint16_t nb_events, void *arg)
+{
+	const unsigned int lcore = rte_lcore_id();
+	struct queue_list_entry *queue_conf = arg;
+	struct pmd_core_cfg *lcore_conf;
+	const bool empty = nb_events == 0;
+	uint32_t pause_duration = rte_power_pmd_mgmt_get_pause_duration();
+
+	lcore_conf = &lcore_cfgs[lcore];
+
+	if (likely(!empty))
+		/* early exit */
+		queue_reset(lcore_conf, queue_conf);
+	else {
+		/* can this queue sleep? */
+		if (!queue_can_sleep(lcore_conf, queue_conf))
+			return nb_events;
+
+		/* can this lcore sleep? */
+		if (!lcore_can_sleep(lcore_conf))
+			return nb_events;
+
+		uint64_t i;
+		for (i = 0; i < global_data.pause_per_us * pause_duration; i++)
+			rte_pause();
+	}
+
+	return nb_events;
+}
+
 static int
 check_scale(unsigned int lcore)
 {
@@ -479,6 +540,171 @@ get_monitor_callback(void)
 		clb_multiwait : clb_umwait;
 }
 
+static int
+check_evt_monitor(struct pmd_core_cfg *cfg __rte_unused,
+		const union queue *qdata)
+{
+	struct rte_power_monitor_cond dummy;
+
+	/* check if rte_power_monitor is supported */
+	if (!global_data.intrinsics_support.power_monitor) {
+		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+		return -ENOTSUP;
+	}
+
+	/* check if the device supports the necessary PMD API */
+	if (rte_event_port_get_monitor_addr((uint8_t)qdata->portid, (uint8_t)qdata->qid,
+				&dummy) == -ENOTSUP) {
+		RTE_LOG(DEBUG, POWER, "event port does not support rte_event_get_monitor_addr\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
+int
+rte_power_eventdev_pmgmt_port_enable(unsigned int lcore_id, uint8_t dev_id,
+		uint8_t port_id, enum rte_power_pmd_mgmt_type mode)
+{
+	const union queue qdata = {.portid = dev_id, .qid = port_id};
+	struct pmd_core_cfg *lcore_cfg;
+	struct queue_list_entry *queue_cfg;
+	struct rte_event_dev_info info;
+	rte_dequeue_callback_fn clb;
+	int ret;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	if (rte_event_dev_info_get(dev_id, &info) < 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	/* check if queue id is valid */
+	if (port_id >= info.max_event_ports) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	lcore_cfg = &lcore_cfgs[lcore_id];
+
+	/* if callback was already enabled, check current callback type */
+	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
+		lcore_cfg->cb_mode != mode) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	/* we need this in various places */
+	rte_cpu_get_intrinsics_support(&global_data.intrinsics_support);
+
+	switch (mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR:
+		/* check if we can add a new port */
+		ret = check_evt_monitor(lcore_cfg, &qdata);
+		if (ret < 0)
+			goto end;
+
+		clb = evt_clb_umwait;
+		break;
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		/* figure out various time-to-tsc conversions */
+		if (global_data.tsc_per_us == 0)
+			calc_tsc();
+
+		clb = evt_clb_pause;
+		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
+	}
+	/* add this queue to the list */
+	ret = queue_list_add(lcore_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
+				strerror(-ret));
+		goto end;
+	}
+	/* new queue is always added last */
+	queue_cfg = TAILQ_LAST(&lcore_cfg->head, queue_list_head);
+
+	/* when enabling first queue, ensure sleep target is not 0 */
+	if (lcore_cfg->n_queues == 1 && lcore_cfg->sleep_target == 0)
+		lcore_cfg->sleep_target = 1;
+
+	/* initialize data before enabling the callback */
+	if (lcore_cfg->n_queues == 1) {
+		lcore_cfg->cb_mode = mode;
+		lcore_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	}
+	queue_cfg->evt_cb = rte_event_add_dequeue_callback(dev_id, port_id,
+						clb, queue_cfg);
+
+	ret = 0;
+end:
+	return ret;
+}
+
+int
+rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id)
+{
+	const union queue qdata = {.portid = dev_id, .qid = port_id};
+	struct pmd_core_cfg *lcore_cfg;
+	struct queue_list_entry *queue_cfg;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	lcore_cfg = &lcore_cfgs[lcore_id];
+
+	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	/*
+	 * There is no good/easy way to do this without race conditions, so we
+	 * are just going to throw our hands in the air and hope that the user
+	 * has read the documentation and has ensured that ports are stopped at
+	 * the time we enter the API functions.
+	 */
+	queue_cfg = queue_list_take(lcore_cfg, &qdata);
+	if (queue_cfg == NULL)
+		return -ENOENT;
+
+	/* if we've removed all queues from the lists, set state to disabled */
+	if (lcore_cfg->n_queues == 0)
+		lcore_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+
+	switch (lcore_cfg->cb_mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
+	case RTE_POWER_MGMT_TYPE_SCALE:
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		rte_event_remove_dequeue_callback(dev_id, port_id,
+			queue_cfg->evt_cb);
+		break;
+	}
+	/*
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
+	 */
+	rte_free((void *)queue_cfg->evt_cb);
+	free(queue_cfg);
+
+	return 0;
+}
+
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 0f1a2eb22e..e1966b9777 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -87,6 +87,61 @@ int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Enable power management on a specified Event device port and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @warning This function must be called when the event device stopped and
+ * no enqueue/dequeue is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the event port will be polled from.
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   Event port identifier of the Event device.
+ * @param mode
+ *   The power management scheme to use for specified event port.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_eventdev_pmgmt_port_enable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id,
+		enum rte_power_pmd_mgmt_type mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Disable power management on a specified Ethernet device Rx queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   Event port identifier of the Event device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
diff --git a/lib/power/version.map b/lib/power/version.map
index 05d544e947..a8711779b6 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -52,4 +52,8 @@ EXPERIMENTAL {
 	rte_power_uncore_get_num_freqs;
 	rte_power_uncore_get_num_pkgs;
 	rte_power_uncore_init;
+
+	# added in 23.07
+	rte_power_eventdev_pmgmt_port_enable;
+	rte_power_eventdev_pmgmt_port_disable;
 };
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 5/5] examples/eventdev_p: add eventdev power management
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
                   ` (2 preceding siblings ...)
  2023-04-19  9:54 ` [RFC PATCH 4/5] power: add eventdev support for power management Sivaprasad Tummala
@ 2023-04-19  9:54 ` Sivaprasad Tummala
  2023-04-19 10:15 ` [RFC PATCH 1/5] eventdev: add power monitoring API on event port Jerin Jacob
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-04-19  9:54 UTC (permalink / raw)
  To: david.hunt, jerinj, harry.van.haaren; +Cc: dev, Sivaprasad Tummala

From: Sivaprasad Tummala <Sivaprasad.Tummala@amd.com>

Add power management feature support to eventdev_pipeline sample app.

Signed-off-by: Sivaprasad Tummala <Sivaprasad.Tummala@amd.com>
---
 examples/eventdev_pipeline/main.c                 | 15 ++++++++++++++-
 examples/eventdev_pipeline/pipeline_common.h      |  1 +
 .../eventdev_pipeline/pipeline_worker_generic.c   |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/examples/eventdev_pipeline/main.c b/examples/eventdev_pipeline/main.c
index 8d6c90f15d..bd6d568b78 100644
--- a/examples/eventdev_pipeline/main.c
+++ b/examples/eventdev_pipeline/main.c
@@ -430,7 +430,20 @@ main(int argc, char **argv)
 			continue;
 
 		dump_core_info(lcore_id, worker_data, worker_idx);
-
+		{
+			if (fdata->worker_core[lcore_id]) {
+				err = rte_power_eventdev_pmgmt_port_enable(
+							lcore_id, worker_data[worker_idx].dev_id,
+							worker_data[worker_idx].port_id,
+							RTE_POWER_MGMT_TYPE_MONITOR);
+				if (err) {
+					RTE_LOG(ERR, POWER,
+						"Power Management enabled failed on core %u\n",
+						lcore_id);
+					continue;
+				}
+			}
+		}
 		err = rte_eal_remote_launch(fdata->cap.worker,
 				&worker_data[worker_idx], lcore_id);
 		if (err) {
diff --git a/examples/eventdev_pipeline/pipeline_common.h b/examples/eventdev_pipeline/pipeline_common.h
index 28b6ab85ff..b33162adfb 100644
--- a/examples/eventdev_pipeline/pipeline_common.h
+++ b/examples/eventdev_pipeline/pipeline_common.h
@@ -19,6 +19,7 @@
 #include <rte_event_eth_tx_adapter.h>
 #include <rte_service.h>
 #include <rte_service_component.h>
+#include <rte_power_pmd_mgmt.h>
 
 #define MAX_NUM_STAGES 8
 #define BATCH_SIZE 16
diff --git a/examples/eventdev_pipeline/pipeline_worker_generic.c b/examples/eventdev_pipeline/pipeline_worker_generic.c
index 783f68c91e..22d644bd51 100644
--- a/examples/eventdev_pipeline/pipeline_worker_generic.c
+++ b/examples/eventdev_pipeline/pipeline_worker_generic.c
@@ -6,6 +6,7 @@
 
 #include <stdlib.h>
 
+#include <rte_power_pmd_mgmt.h>
 #include "pipeline_common.h"
 
 static __rte_always_inline int
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
                   ` (3 preceding siblings ...)
  2023-04-19  9:54 ` [RFC PATCH 5/5] examples/eventdev_p: add eventdev " Sivaprasad Tummala
@ 2023-04-19 10:15 ` Jerin Jacob
  2023-04-24 16:06   ` Ferruh Yigit
  2023-05-17 14:48 ` Burakov, Anatoly
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
  6 siblings, 1 reply; 33+ messages in thread
From: Jerin Jacob @ 2023-04-19 10:15 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: david.hunt, jerinj, harry.van.haaren, dev, Pavan Nikhilesh,
	McDaniel, Timothy, Shijith Thotton, Hemant Agrawal,
	Sachin Saxena, Mattias Rönnblom, Peter Mccarthy, Liang Ma

On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
<sivaprasad.tummala@amd.com> wrote:
>
> A new API to allow power monitoring condition on event port to
> optimize power when no events are arriving on an event port for
> the worker core to process in an eventdev based pipelined application.
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> + *
> + * @param dev_id
> + *   Eventdev id
> + * @param port_id
> + *   Eventdev port id
> + * @param pmc
> + *   The pointer to power-optimized monitoring condition structure.
> + *
> + * @return
> + *   - 0: Success.
> + *   -ENOTSUP: Operation not supported.
> + *   -EINVAL: Invalid parameters.
> + *   -ENODEV: Invalid device ID.
> + */
> +__rte_experimental
> +int
> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> +               struct rte_power_monitor_cond *pmc);

+ eventdev driver maintainers

I think, we don't need to expose this application due to applications
1)To make applications to be transparent whether power saving is enabled or not?
2)Some HW and Arch already supports power managent in driver and in HW
(Not using  CPU architecture directly)

If so, that will be translated to following,
a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
port_id, bool ena) for controlling power saving in slowpath.
b) Create reusable PMD private function based on the CPU architecture
power saving primitive to cover the PMD don't have native power saving
support.
c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).




> +
>  /**
>   * Start an event device.
>   *
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-19 10:15 ` [RFC PATCH 1/5] eventdev: add power monitoring API on event port Jerin Jacob
@ 2023-04-24 16:06   ` Ferruh Yigit
  2023-04-25  4:09     ` Jerin Jacob
  2023-04-25  6:19     ` Mattias Rönnblom
  0 siblings, 2 replies; 33+ messages in thread
From: Ferruh Yigit @ 2023-04-24 16:06 UTC (permalink / raw)
  To: Jerin Jacob, Sivaprasad Tummala
  Cc: david.hunt, jerinj, harry.van.haaren, dev, Pavan Nikhilesh,
	McDaniel, Timothy, Shijith Thotton, Hemant Agrawal,
	Sachin Saxena, Mattias Rönnblom, Peter Mccarthy, Liang Ma

On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> <sivaprasad.tummala@amd.com> wrote:
>>
>> A new API to allow power monitoring condition on event port to
>> optimize power when no events are arriving on an event port for
>> the worker core to process in an eventdev based pipelined application.
>>
>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>> + *
>> + * @param dev_id
>> + *   Eventdev id
>> + * @param port_id
>> + *   Eventdev port id
>> + * @param pmc
>> + *   The pointer to power-optimized monitoring condition structure.
>> + *
>> + * @return
>> + *   - 0: Success.
>> + *   -ENOTSUP: Operation not supported.
>> + *   -EINVAL: Invalid parameters.
>> + *   -ENODEV: Invalid device ID.
>> + */
>> +__rte_experimental
>> +int
>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
>> +               struct rte_power_monitor_cond *pmc);
> 
> + eventdev driver maintainers
> 
> I think, we don't need to expose this application due to applications
> 1)To make applications to be transparent whether power saving is enabled or not?
> 2)Some HW and Arch already supports power managent in driver and in HW
> (Not using  CPU architecture directly)
> 
> If so, that will be translated to following,
> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> port_id, bool ena) for controlling power saving in slowpath.
> b) Create reusable PMD private function based on the CPU architecture
> power saving primitive to cover the PMD don't have native power saving
> support.
> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> 
> 

Hi Jerin,

ethdev approach seems applied here.

In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
'rte_eth_get_monitor_addr()'.

Although 'rte_eth_get_monitor_addr()' is public API, it is currently
only called from Rx/Tx callback functions implemented in the power library.
But I assume intention to make it public is to enable users to implement
their own callback functions that has custom algorithm for the power
management.

And probably same is true for the 'rte_event_port_get_monitor_addr()'.


Also instead of implementing power features for withing PMDs, isn't it
better to have a common eventdev layer for it?

For the PMDs benefit from HW event manager, just not implementing
.get_monitor_addr() dev_ops will make them free from power related APIs.





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 3/5] eventdev: support optional dequeue callbacks
  2023-04-19  9:54 ` [RFC PATCH 3/5] eventdev: support optional dequeue callbacks Sivaprasad Tummala
@ 2023-04-24 16:06   ` Ferruh Yigit
  2023-05-17 14:22   ` Burakov, Anatoly
  1 sibling, 0 replies; 33+ messages in thread
From: Ferruh Yigit @ 2023-04-24 16:06 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren; +Cc: dev

On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> --- a/lib/eventdev/version.map
> +++ b/lib/eventdev/version.map
> @@ -131,6 +131,12 @@ EXPERIMENTAL {
>  	rte_event_eth_tx_adapter_runtime_params_init;
>  	rte_event_eth_tx_adapter_runtime_params_set;
>  	rte_event_timer_remaining_ticks_get;
> +
> +	# added in 23.07
> +	rte_event_dequeue_callbacks
> +	rte_event_add_dequeue_callback
> +	rte_event_remove_dequeue_callback
> +	rte_event_port_get_monitor_addr

'rte_event_port_get_monitor_addr' belongs to previous patch 1/5, where
it is introduced.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-24 16:06   ` Ferruh Yigit
@ 2023-04-25  4:09     ` Jerin Jacob
  2023-05-02 11:19       ` Ferruh Yigit
  2023-04-25  6:19     ` Mattias Rönnblom
  1 sibling, 1 reply; 33+ messages in thread
From: Jerin Jacob @ 2023-04-25  4:09 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren, dev,
	Pavan Nikhilesh, McDaniel, Timothy, Shijith Thotton,
	Hemant Agrawal, Sachin Saxena, Mattias Rönnblom,
	Peter Mccarthy, Liang Ma

On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> > On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> > <sivaprasad.tummala@amd.com> wrote:
> >>
> >> A new API to allow power monitoring condition on event port to
> >> optimize power when no events are arriving on an event port for
> >> the worker core to process in an eventdev based pipelined application.
> >>
> >> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >> + *
> >> + * @param dev_id
> >> + *   Eventdev id
> >> + * @param port_id
> >> + *   Eventdev port id
> >> + * @param pmc
> >> + *   The pointer to power-optimized monitoring condition structure.
> >> + *
> >> + * @return
> >> + *   - 0: Success.
> >> + *   -ENOTSUP: Operation not supported.
> >> + *   -EINVAL: Invalid parameters.
> >> + *   -ENODEV: Invalid device ID.
> >> + */
> >> +__rte_experimental
> >> +int
> >> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> >> +               struct rte_power_monitor_cond *pmc);
> >
> > + eventdev driver maintainers
> >
> > I think, we don't need to expose this application due to applications
> > 1)To make applications to be transparent whether power saving is enabled or not?
> > 2)Some HW and Arch already supports power managent in driver and in HW
> > (Not using  CPU architecture directly)
> >
> > If so, that will be translated to following,
> > a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> > port_id, bool ena) for controlling power saving in slowpath.
> > b) Create reusable PMD private function based on the CPU architecture
> > power saving primitive to cover the PMD don't have native power saving
> > support.
> > c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> >
> >
>
> Hi Jerin,

Hi Ferruh,

>
> ethdev approach seems applied here.

Understands that. But none of the NIC HW supports power management at
HW level like eventdev, so that way
for what we are doing for ethdev is a correct abstraction for ethdev.

>
> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> 'rte_eth_get_monitor_addr()'.
>
> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> only called from Rx/Tx callback functions implemented in the power library.
> But I assume intention to make it public is to enable users to implement
> their own callback functions that has custom algorithm for the power
> management.

If there is a use case for customizing with own callback, we can provide that.
Provided NULL is valid with default algorithm.

>
> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>
>
> Also instead of implementing power features for withing PMDs, isn't it
> better to have a common eventdev layer for it?

We can have rte_evetdev_pmd_* APIs as non-public APIs.
My only objection is to NOT introduce _monitor_ APIs at eventdev level,
Instead, _monitor_ is one way to do it in SW, So we need higher level
of abstraction.

>
> For the PMDs benefit from HW event manager, just not implementing
> .get_monitor_addr() dev_ops will make them free from power related APIs.

But application fast path code gets diverged by exposing low level primitives.


>
>
>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-24 16:06   ` Ferruh Yigit
  2023-04-25  4:09     ` Jerin Jacob
@ 2023-04-25  6:19     ` Mattias Rönnblom
  2023-05-02 10:43       ` Ferruh Yigit
  1 sibling, 1 reply; 33+ messages in thread
From: Mattias Rönnblom @ 2023-04-25  6:19 UTC (permalink / raw)
  To: Ferruh Yigit, Jerin Jacob, Sivaprasad Tummala
  Cc: david.hunt, jerinj, harry.van.haaren, dev, Pavan Nikhilesh,
	McDaniel, Timothy, Shijith Thotton, Hemant Agrawal,
	Sachin Saxena, Peter Mccarthy, Liang Ma

On 2023-04-24 18:06, Ferruh Yigit wrote:
> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
>> <sivaprasad.tummala@amd.com> wrote:
>>>
>>> A new API to allow power monitoring condition on event port to
>>> optimize power when no events are arriving on an event port for
>>> the worker core to process in an eventdev based pipelined application.
>>>
>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>> + *
>>> + * @param dev_id
>>> + *   Eventdev id
>>> + * @param port_id
>>> + *   Eventdev port id
>>> + * @param pmc
>>> + *   The pointer to power-optimized monitoring condition structure.
>>> + *
>>> + * @return
>>> + *   - 0: Success.
>>> + *   -ENOTSUP: Operation not supported.
>>> + *   -EINVAL: Invalid parameters.
>>> + *   -ENODEV: Invalid device ID.
>>> + */
>>> +__rte_experimental
>>> +int
>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
>>> +               struct rte_power_monitor_cond *pmc);
>>
>> + eventdev driver maintainers
>>
>> I think, we don't need to expose this application due to applications
>> 1)To make applications to be transparent whether power saving is enabled or not?
>> 2)Some HW and Arch already supports power managent in driver and in HW
>> (Not using  CPU architecture directly)
>>
>> If so, that will be translated to following,
>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
>> port_id, bool ena) for controlling power saving in slowpath.
>> b) Create reusable PMD private function based on the CPU architecture
>> power saving primitive to cover the PMD don't have native power saving
>> support.
>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
>>
>>
> 
> Hi Jerin,
> 
> ethdev approach seems applied here.
> 
> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> 'rte_eth_get_monitor_addr()'.
> 
> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> only called from Rx/Tx callback functions implemented in the power library.
> But I assume intention to make it public is to enable users to implement
> their own callback functions that has custom algorithm for the power
> management.
> 
> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> 
> 
> Also instead of implementing power features for withing PMDs, isn't it
> better to have a common eventdev layer for it?
> 

To allow that question to be answered, I think you need to be more 
specific what are "power features".

 From what it seems to me, the get_monitor_addr() family of functions 
address the pretty narrow case of allowing umwait (or the non-x86 
equivalent) to be used to wait for new events. It leaves all the heavy 
lifting to the app, which needs to figure out how loaded each CPU core 
is, what backlog of work there is, how to shuffle work around to get the 
most out of the power, how to translate wall-clock latency requirements 
into the equation, what CPU (and/or accelerator/NIC-level) power 
features to employ (e.g., DVFS, sleep states, umwait), etc.

In the context of Eventdev, optimizing for power may include packing 
more flows into the same port, in low-load situations. Keeping a few 
cores relatively busy, and the rest in some deep sleep state may well be 
the best solution for certain (most?) systems. For such a feature to 
work, the event device must be in the loop, but the mechanics could (and 
should) be generic. Eventdev could also control DVFS.

A reasonably generic power management mechanism could go into Eventdev a 
combination of the event device drivers, and some generic functions). 
(Various policies would still need to come from the app.)

I think keeping this kind of functionality in Eventdev works well 
provided the only source of work is Eventdev events (i.e., most or all 
fast path lcores are "pure" event-based lcores). No non-eventdev timer 
wheels, no non-eventdev lookaside accelerator or I/O device access, no 
control plane rings to poll, etc.

If such a model is too limiting, another option is to put the central 
power management function in the service framework (with a lot of help 
from Eventdev, RTE timer, and other sources of work as well).

> For the PMDs benefit from HW event manager, just not implementing
> .get_monitor_addr() dev_ops will make them free from power related APIs.
> 
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-25  6:19     ` Mattias Rönnblom
@ 2023-05-02 10:43       ` Ferruh Yigit
  0 siblings, 0 replies; 33+ messages in thread
From: Ferruh Yigit @ 2023-05-02 10:43 UTC (permalink / raw)
  To: Mattias Rönnblom, Jerin Jacob, Sivaprasad Tummala
  Cc: david.hunt, jerinj, harry.van.haaren, dev, Pavan Nikhilesh,
	McDaniel, Timothy, Shijith Thotton, Hemant Agrawal,
	Sachin Saxena, Peter Mccarthy, Liang Ma

On 4/25/2023 7:19 AM, Mattias Rönnblom wrote:
> On 2023-04-24 18:06, Ferruh Yigit wrote:
>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
>>> <sivaprasad.tummala@amd.com> wrote:
>>>>
>>>> A new API to allow power monitoring condition on event port to
>>>> optimize power when no events are arriving on an event port for
>>>> the worker core to process in an eventdev based pipelined application.
>>>>
>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>>> + *
>>>> + * @param dev_id
>>>> + *   Eventdev id
>>>> + * @param port_id
>>>> + *   Eventdev port id
>>>> + * @param pmc
>>>> + *   The pointer to power-optimized monitoring condition structure.
>>>> + *
>>>> + * @return
>>>> + *   - 0: Success.
>>>> + *   -ENOTSUP: Operation not supported.
>>>> + *   -EINVAL: Invalid parameters.
>>>> + *   -ENODEV: Invalid device ID.
>>>> + */
>>>> +__rte_experimental
>>>> +int
>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
>>>> +               struct rte_power_monitor_cond *pmc);
>>>
>>> + eventdev driver maintainers
>>>
>>> I think, we don't need to expose this application due to applications
>>> 1)To make applications to be transparent whether power saving is enabled or not?
>>> 2)Some HW and Arch already supports power managent in driver and in HW
>>> (Not using  CPU architecture directly)
>>>
>>> If so, that will be translated to following,
>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
>>> port_id, bool ena) for controlling power saving in slowpath.
>>> b) Create reusable PMD private function based on the CPU architecture
>>> power saving primitive to cover the PMD don't have native power saving
>>> support.
>>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
>>>
>>>
>>
>> Hi Jerin,
>>
>> ethdev approach seems applied here.
>>
>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>> 'rte_eth_get_monitor_addr()'.
>>
>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>> only called from Rx/Tx callback functions implemented in the power library.
>> But I assume intention to make it public is to enable users to implement
>> their own callback functions that has custom algorithm for the power
>> management.
>>
>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>
>>
>> Also instead of implementing power features for withing PMDs, isn't it
>> better to have a common eventdev layer for it?
>>
> 
> To allow that question to be answered, I think you need to be more 
> specific what are "power features".
> 
>  From what it seems to me, the get_monitor_addr() family of functions 
> address the pretty narrow case of allowing umwait (or the non-x86 
> equivalent) to be used to wait for new events. It leaves all the heavy 
> lifting to the app, which needs to figure out how loaded each CPU core 
> is, what backlog of work there is, how to shuffle work around to get the 
> most out of the power, how to translate wall-clock latency requirements 
> into the equation, what CPU (and/or accelerator/NIC-level) power 
> features to employ (e.g., DVFS, sleep states, umwait), etc.
> 
> In the context of Eventdev, optimizing for power may include packing 
> more flows into the same port, in low-load situations. Keeping a few 
> cores relatively busy, and the rest in some deep sleep state may well be 
> the best solution for certain (most?) systems. For such a feature to 
> work, the event device must be in the loop, but the mechanics could (and 
> should) be generic. Eventdev could also control DVFS.
> 
> A reasonably generic power management mechanism could go into Eventdev a 
> combination of the event device drivers, and some generic functions). 
> (Various policies would still need to come from the app.)
> 
> I think keeping this kind of functionality in Eventdev works well 
> provided the only source of work is Eventdev events (i.e., most or all 
> fast path lcores are "pure" event-based lcores). No non-eventdev timer 
> wheels, no non-eventdev lookaside accelerator or I/O device access, no 
> control plane rings to poll, etc.
> 
> If such a model is too limiting, another option is to put the central 
> power management function in the service framework (with a lot of help 
> from Eventdev, RTE timer, and other sources of work as well).
> 

Hi Mattias,

The current power management features referred in the scope of this
patch is around umwait use case as you mentioned.

It has default callbacks that application benefit with minimal
involvement from application, but if application wants more
sophisticated algorithm, needs to implement its own functions.

And I agree to have more comprehensive power management, it has benefit
but it has to start somewhere and we can grow it more by time. Also it
requires more support from community, not just from some vendors.

I think it is a good start to enable some HW features for power
management and make existing APIs more HW agnostic.

>> For the PMDs benefit from HW event manager, just not implementing
>> .get_monitor_addr() dev_ops will make them free from power related APIs.
>>
>>
>>
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-25  4:09     ` Jerin Jacob
@ 2023-05-02 11:19       ` Ferruh Yigit
  2023-05-03  7:58         ` Jerin Jacob
  0 siblings, 1 reply; 33+ messages in thread
From: Ferruh Yigit @ 2023-05-02 11:19 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren, dev,
	Pavan Nikhilesh, McDaniel, Timothy, Shijith Thotton,
	Hemant Agrawal, Sachin Saxena, Mattias Rönnblom,
	Peter Mccarthy, Liang Ma

On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
>>> <sivaprasad.tummala@amd.com> wrote:
>>>>
>>>> A new API to allow power monitoring condition on event port to
>>>> optimize power when no events are arriving on an event port for
>>>> the worker core to process in an eventdev based pipelined application.
>>>>
>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>>> + *
>>>> + * @param dev_id
>>>> + *   Eventdev id
>>>> + * @param port_id
>>>> + *   Eventdev port id
>>>> + * @param pmc
>>>> + *   The pointer to power-optimized monitoring condition structure.
>>>> + *
>>>> + * @return
>>>> + *   - 0: Success.
>>>> + *   -ENOTSUP: Operation not supported.
>>>> + *   -EINVAL: Invalid parameters.
>>>> + *   -ENODEV: Invalid device ID.
>>>> + */
>>>> +__rte_experimental
>>>> +int
>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
>>>> +               struct rte_power_monitor_cond *pmc);
>>>
>>> + eventdev driver maintainers
>>>
>>> I think, we don't need to expose this application due to applications
>>> 1)To make applications to be transparent whether power saving is enabled or not?
>>> 2)Some HW and Arch already supports power managent in driver and in HW
>>> (Not using  CPU architecture directly)
>>>
>>> If so, that will be translated to following,
>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
>>> port_id, bool ena) for controlling power saving in slowpath.
>>> b) Create reusable PMD private function based on the CPU architecture
>>> power saving primitive to cover the PMD don't have native power saving
>>> support.
>>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
>>>
>>>
>>
>> Hi Jerin,
> 
> Hi Ferruh,
> 
>>
>> ethdev approach seems applied here.
> 
> Understands that. But none of the NIC HW supports power management at
> HW level like eventdev, so that way
> for what we are doing for ethdev is a correct abstraction for ethdev.
> 

What I understand is there is HW based event manager and SW based ones,
SW based ones can benefit more from CPU power optimizations, for HW
event managers if there is not enough benefit they can just ignore the
feature.

>>
>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>> 'rte_eth_get_monitor_addr()'.
>>
>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>> only called from Rx/Tx callback functions implemented in the power library.
>> But I assume intention to make it public is to enable users to implement
>> their own callback functions that has custom algorithm for the power
>> management.
> 
> If there is a use case for customizing with own callback, we can provide that.
> Provided NULL is valid with default algorithm.
> 
>>
>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>
>>
>> Also instead of implementing power features for withing PMDs, isn't it
>> better to have a common eventdev layer for it?
> 
> We can have rte_evetdev_pmd_* APIs as non-public APIs.
> My only objection is to NOT introduce _monitor_ APIs at eventdev level,
> Instead, _monitor_ is one way to do it in SW, So we need higher level
> of abstraction.
> 

I see, this seems a trade off between flexibility and usability. If
application has access to _monitor_ APIs, they can be more flexible to
implement their own logic.

Another option can be application provides the policy with an API and
monitor API used to realize the policy, but for this case it can be
challenge to find and implement correct policies.

>>
>> For the PMDs benefit from HW event manager, just not implementing
>> .get_monitor_addr() dev_ops will make them free from power related APIs.
> 
> But application fast path code gets diverged by exposing low level primitives.
> 

I am not clear with concern above, but for application that use default
callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
to enable this feature, if not called datapath is not impacted.
And if not dequeue callback added at all, custom or default, data path
is not impacted at all.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-05-02 11:19       ` Ferruh Yigit
@ 2023-05-03  7:58         ` Jerin Jacob
  2023-05-03  8:13           ` Ferruh Yigit
  0 siblings, 1 reply; 33+ messages in thread
From: Jerin Jacob @ 2023-05-03  7:58 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren, dev,
	Pavan Nikhilesh, McDaniel, Timothy, Shijith Thotton,
	Hemant Agrawal, Sachin Saxena, Mattias Rönnblom,
	Peter Mccarthy, Liang Ma

On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> > On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> >>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> >>> <sivaprasad.tummala@amd.com> wrote:
> >>>>
> >>>> A new API to allow power monitoring condition on event port to
> >>>> optimize power when no events are arriving on an event port for
> >>>> the worker core to process in an eventdev based pipelined application.
> >>>>
> >>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >>>> + *
> >>>> + * @param dev_id
> >>>> + *   Eventdev id
> >>>> + * @param port_id
> >>>> + *   Eventdev port id
> >>>> + * @param pmc
> >>>> + *   The pointer to power-optimized monitoring condition structure.
> >>>> + *
> >>>> + * @return
> >>>> + *   - 0: Success.
> >>>> + *   -ENOTSUP: Operation not supported.
> >>>> + *   -EINVAL: Invalid parameters.
> >>>> + *   -ENODEV: Invalid device ID.
> >>>> + */
> >>>> +__rte_experimental
> >>>> +int
> >>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> >>>> +               struct rte_power_monitor_cond *pmc);
> >>>
> >>> + eventdev driver maintainers
> >>>
> >>> I think, we don't need to expose this application due to applications
> >>> 1)To make applications to be transparent whether power saving is enabled or not?
> >>> 2)Some HW and Arch already supports power managent in driver and in HW
> >>> (Not using  CPU architecture directly)
> >>>
> >>> If so, that will be translated to following,
> >>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> >>> port_id, bool ena) for controlling power saving in slowpath.
> >>> b) Create reusable PMD private function based on the CPU architecture
> >>> power saving primitive to cover the PMD don't have native power saving
> >>> support.
> >>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> >>>
> >>>
> >>
> >> Hi Jerin,
> >
> > Hi Ferruh,
> >
> >>
> >> ethdev approach seems applied here.
> >
> > Understands that. But none of the NIC HW supports power management at
> > HW level like eventdev, so that way
> > for what we are doing for ethdev is a correct abstraction for ethdev.
> >
>
> What I understand is there is HW based event manager and SW based ones,
> SW based ones can benefit more from CPU power optimizations, for HW
> event managers if there is not enough benefit they can just ignore the
> feature.
>
> >>
> >> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> >> 'rte_eth_get_monitor_addr()'.
> >>
> >> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> >> only called from Rx/Tx callback functions implemented in the power library.
> >> But I assume intention to make it public is to enable users to implement
> >> their own callback functions that has custom algorithm for the power
> >> management.
> >
> > If there is a use case for customizing with own callback, we can provide that.
> > Provided NULL is valid with default algorithm.
> >
> >>
> >> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> >>
> >>
> >> Also instead of implementing power features for withing PMDs, isn't it
> >> better to have a common eventdev layer for it?
> >
> > We can have rte_evetdev_pmd_* APIs as non-public APIs.
> > My only objection is to NOT introduce _monitor_ APIs at eventdev level,
> > Instead, _monitor_ is one way to do it in SW, So we need higher level
> > of abstraction.
> >
>
> I see, this seems a trade off between flexibility and usability. If
> application has access to _monitor_ APIs, they can be more flexible to
> implement their own logic.

OK.

>
> Another option can be application provides the policy with an API and
> monitor API used to realize the policy, but for this case it can be
> challenge to find and implement correct policies.

OK. If we can enumerate the policies, then it will be ideal.
On plus side, there will not be any changes in needed in lib/power/


>
> >>
> >> For the PMDs benefit from HW event manager, just not implementing
> >> .get_monitor_addr() dev_ops will make them free from power related APIs.
> >
> > But application fast path code gets diverged by exposing low level primitives.
> >
>
> I am not clear with concern above, but for application that use default
> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
> to enable this feature, if not called datapath is not impacted.
> And if not dequeue callback added at all, custom or default, data path
> is not impacted at all.

Concerns are around following code[1] when callback is not registered
for this use case.
In eventdev, we are using _one packet at a time_ for a lot of use case
with latency critical workload like L1 processing.
On such cases, the following code will add up.

[1]
  cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
    __ATOMIC_RELAXED);
  if (unlikely(cb != NULL))
       nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);

I see two options,
1) Enumerate the power policy and let driver implement through
non-public PMD helper functions
OR
2)Move the power management callback to driver via non-public PMD
helper functions to avoid cost of
PMDs where power managment done in HW and to remove above extra check
when NO callback is registered[1]

>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-05-03  7:58         ` Jerin Jacob
@ 2023-05-03  8:13           ` Ferruh Yigit
  2023-05-03  8:26             ` Jerin Jacob
  0 siblings, 1 reply; 33+ messages in thread
From: Ferruh Yigit @ 2023-05-03  8:13 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren, dev,
	Pavan Nikhilesh, McDaniel, Timothy, Shijith Thotton,
	Hemant Agrawal, Sachin Saxena, Mattias Rönnblom,
	Peter Mccarthy, Liang Ma

On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
>>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>>>
>>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
>>>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
>>>>> <sivaprasad.tummala@amd.com> wrote:
>>>>>>
>>>>>> A new API to allow power monitoring condition on event port to
>>>>>> optimize power when no events are arriving on an event port for
>>>>>> the worker core to process in an eventdev based pipelined application.
>>>>>>
>>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>>>>>> + *
>>>>>> + * @param dev_id
>>>>>> + *   Eventdev id
>>>>>> + * @param port_id
>>>>>> + *   Eventdev port id
>>>>>> + * @param pmc
>>>>>> + *   The pointer to power-optimized monitoring condition structure.
>>>>>> + *
>>>>>> + * @return
>>>>>> + *   - 0: Success.
>>>>>> + *   -ENOTSUP: Operation not supported.
>>>>>> + *   -EINVAL: Invalid parameters.
>>>>>> + *   -ENODEV: Invalid device ID.
>>>>>> + */
>>>>>> +__rte_experimental
>>>>>> +int
>>>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
>>>>>> +               struct rte_power_monitor_cond *pmc);
>>>>>
>>>>> + eventdev driver maintainers
>>>>>
>>>>> I think, we don't need to expose this application due to applications
>>>>> 1)To make applications to be transparent whether power saving is enabled or not?
>>>>> 2)Some HW and Arch already supports power managent in driver and in HW
>>>>> (Not using  CPU architecture directly)
>>>>>
>>>>> If so, that will be translated to following,
>>>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
>>>>> port_id, bool ena) for controlling power saving in slowpath.
>>>>> b) Create reusable PMD private function based on the CPU architecture
>>>>> power saving primitive to cover the PMD don't have native power saving
>>>>> support.
>>>>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
>>>>>
>>>>>
>>>>
>>>> Hi Jerin,
>>>
>>> Hi Ferruh,
>>>
>>>>
>>>> ethdev approach seems applied here.
>>>
>>> Understands that. But none of the NIC HW supports power management at
>>> HW level like eventdev, so that way
>>> for what we are doing for ethdev is a correct abstraction for ethdev.
>>>
>>
>> What I understand is there is HW based event manager and SW based ones,
>> SW based ones can benefit more from CPU power optimizations, for HW
>> event managers if there is not enough benefit they can just ignore the
>> feature.
>>
>>>>
>>>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
>>>> 'rte_eth_get_monitor_addr()'.
>>>>
>>>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
>>>> only called from Rx/Tx callback functions implemented in the power library.
>>>> But I assume intention to make it public is to enable users to implement
>>>> their own callback functions that has custom algorithm for the power
>>>> management.
>>>
>>> If there is a use case for customizing with own callback, we can provide that.
>>> Provided NULL is valid with default algorithm.
>>>
>>>>
>>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
>>>>
>>>>
>>>> Also instead of implementing power features for withing PMDs, isn't it
>>>> better to have a common eventdev layer for it?
>>>
>>> We can have rte_evetdev_pmd_* APIs as non-public APIs.
>>> My only objection is to NOT introduce _monitor_ APIs at eventdev level,
>>> Instead, _monitor_ is one way to do it in SW, So we need higher level
>>> of abstraction.
>>>
>>
>> I see, this seems a trade off between flexibility and usability. If
>> application has access to _monitor_ APIs, they can be more flexible to
>> implement their own logic.
> 
> OK.
> 
>>
>> Another option can be application provides the policy with an API and
>> monitor API used to realize the policy, but for this case it can be
>> challenge to find and implement correct policies.
> 
> OK. If we can enumerate the policies, then it will be ideal.
> On plus side, there will not be any changes in needed in lib/power/
> 

If we are talking about a power framework that user defines policies, I
expect parsing/defining policies will be in the power library and will
require changes in the power library anyway.

But as mentioned above it is difficult to define a proper policy, this
is not really related to eventdev, more a power library issue. We can
continue to provide flexibility to user in eventdev and discuss the
policy if a wider forum.

> 
>>
>>>>
>>>> For the PMDs benefit from HW event manager, just not implementing
>>>> .get_monitor_addr() dev_ops will make them free from power related APIs.
>>>
>>> But application fast path code gets diverged by exposing low level primitives.
>>>
>>
>> I am not clear with concern above, but for application that use default
>> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
>> to enable this feature, if not called datapath is not impacted.
>> And if not dequeue callback added at all, custom or default, data path
>> is not impacted at all.
> 
> Concerns are around following code[1] when callback is not registered
> for this use case.
> In eventdev, we are using _one packet at a time_ for a lot of use case
> with latency critical workload like L1 processing.
> On such cases, the following code will add up.
> 
> [1]
>   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
>     __ATOMIC_RELAXED);
>   if (unlikely(cb != NULL))
>        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
> 
> I see two options,
> 1) Enumerate the power policy and let driver implement through
> non-public PMD helper functions
> OR
> 2)Move the power management callback to driver via non-public PMD
> helper functions to avoid cost of
> PMDs where power managment done in HW and to remove above extra check
> when NO callback is registered[1]
> 

Got it, yes there is an additional check with event callbacks, we can
add a compiler flag around it as done in ethdev to let it not compiled
when not needed, will it work?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-05-03  8:13           ` Ferruh Yigit
@ 2023-05-03  8:26             ` Jerin Jacob
  2023-05-03 15:11               ` Tummala, Sivaprasad
  0 siblings, 1 reply; 33+ messages in thread
From: Jerin Jacob @ 2023-05-03  8:26 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren, dev,
	Pavan Nikhilesh, McDaniel, Timothy, Shijith Thotton,
	Hemant Agrawal, Sachin Saxena, Mattias Rönnblom,
	Peter Mccarthy, Liang Ma

On Wed, May 3, 2023 at 1:44 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> > On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>
> >> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> >>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >>>>
> >>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> >>>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> >>>>> <sivaprasad.tummala@amd.com> wrote:
> >>>>>>
> >>>>>> A new API to allow power monitoring condition on event port to
> >>>>>> optimize power when no events are arriving on an event port for
> >>>>>> the worker core to process in an eventdev based pipelined application.
> >>>>>>
> >>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >>>>>> + *
> >>>>>> + * @param dev_id
> >>>>>> + *   Eventdev id
> >>>>>> + * @param port_id
> >>>>>> + *   Eventdev port id
> >>>>>> + * @param pmc
> >>>>>> + *   The pointer to power-optimized monitoring condition structure.
> >>>>>> + *
> >>>>>> + * @return
> >>>>>> + *   - 0: Success.
> >>>>>> + *   -ENOTSUP: Operation not supported.
> >>>>>> + *   -EINVAL: Invalid parameters.
> >>>>>> + *   -ENODEV: Invalid device ID.
> >>>>>> + */
> >>>>>> +__rte_experimental
> >>>>>> +int
> >>>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> >>>>>> +               struct rte_power_monitor_cond *pmc);
> >>>>>
> >>>>> + eventdev driver maintainers
> >>>>>
> >>>>> I think, we don't need to expose this application due to applications
> >>>>> 1)To make applications to be transparent whether power saving is enabled or not?
> >>>>> 2)Some HW and Arch already supports power managent in driver and in HW
> >>>>> (Not using  CPU architecture directly)
> >>>>>
> >>>>> If so, that will be translated to following,
> >>>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id, uint8_t
> >>>>> port_id, bool ena) for controlling power saving in slowpath.
> >>>>> b) Create reusable PMD private function based on the CPU architecture
> >>>>> power saving primitive to cover the PMD don't have native power saving
> >>>>> support.
> >>>>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> >>>>>
> >>>>>
> >>>>
> >>>> Hi Jerin,
> >>>
> >>> Hi Ferruh,
> >>>
> >>>>
> >>>> ethdev approach seems applied here.
> >>>
> >>> Understands that. But none of the NIC HW supports power management at
> >>> HW level like eventdev, so that way
> >>> for what we are doing for ethdev is a correct abstraction for ethdev.
> >>>
> >>
> >> What I understand is there is HW based event manager and SW based ones,
> >> SW based ones can benefit more from CPU power optimizations, for HW
> >> event managers if there is not enough benefit they can just ignore the
> >> feature.
> >>
> >>>>
> >>>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> >>>> 'rte_eth_get_monitor_addr()'.
> >>>>
> >>>> Although 'rte_eth_get_monitor_addr()' is public API, it is currently
> >>>> only called from Rx/Tx callback functions implemented in the power library.
> >>>> But I assume intention to make it public is to enable users to implement
> >>>> their own callback functions that has custom algorithm for the power
> >>>> management.
> >>>
> >>> If there is a use case for customizing with own callback, we can provide that.
> >>> Provided NULL is valid with default algorithm.
> >>>
> >>>>
> >>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> >>>>
> >>>>
> >>>> Also instead of implementing power features for withing PMDs, isn't it
> >>>> better to have a common eventdev layer for it?
> >>>
> >>> We can have rte_evetdev_pmd_* APIs as non-public APIs.
> >>> My only objection is to NOT introduce _monitor_ APIs at eventdev level,
> >>> Instead, _monitor_ is one way to do it in SW, So we need higher level
> >>> of abstraction.
> >>>
> >>
> >> I see, this seems a trade off between flexibility and usability. If
> >> application has access to _monitor_ APIs, they can be more flexible to
> >> implement their own logic.
> >
> > OK.
> >
> >>
> >> Another option can be application provides the policy with an API and
> >> monitor API used to realize the policy, but for this case it can be
> >> challenge to find and implement correct policies.
> >
> > OK. If we can enumerate the policies, then it will be ideal.
> > On plus side, there will not be any changes in needed in lib/power/
> >
>
> If we are talking about a power framework that user defines policies, I
> expect parsing/defining policies will be in the power library and will
> require changes in the power library anyway.

OK

>
> But as mentioned above it is difficult to define a proper policy, this
> is not really related to eventdev, more a power library issue. We can
> continue to provide flexibility to user in eventdev and discuss the
> policy if a wider forum.

OK.

>
> >
> >>
> >>>>
> >>>> For the PMDs benefit from HW event manager, just not implementing
> >>>> .get_monitor_addr() dev_ops will make them free from power related APIs.
> >>>
> >>> But application fast path code gets diverged by exposing low level primitives.
> >>>
> >>
> >> I am not clear with concern above, but for application that use default
> >> callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs to be called
> >> to enable this feature, if not called datapath is not impacted.
> >> And if not dequeue callback added at all, custom or default, data path
> >> is not impacted at all.
> >
> > Concerns are around following code[1] when callback is not registered
> > for this use case.
> > In eventdev, we are using _one packet at a time_ for a lot of use case
> > with latency critical workload like L1 processing.
> > On such cases, the following code will add up.
> >
> > [1]
> >   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
> >     __ATOMIC_RELAXED);
> >   if (unlikely(cb != NULL))
> >        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev, nb_rx, cb);
> >
> > I see two options,
> > 1) Enumerate the power policy and let driver implement through
> > non-public PMD helper functions
> > OR
> > 2)Move the power management callback to driver via non-public PMD
> > helper functions to avoid cost of
> > PMDs where power managment done in HW and to remove above extra check
> > when NO callback is registered[1]
> >
>
> Got it, yes there is an additional check with event callbacks, we can
> add a compiler flag around it as done in ethdev to let it not compiled
> when not needed, will it work?

I would prefer to expose PMD helper function which can be called at
end of the driver dequeue function so that other PMD can reuse as
needed.
This is to avoid compiler flag, cache line occupancy changes in struct
rte_eventdev
struct rte_event_fp_ops in generic code also we may not need
full-fledged generic callbacks scheme for this.

>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-05-03  8:26             ` Jerin Jacob
@ 2023-05-03 15:11               ` Tummala, Sivaprasad
  0 siblings, 0 replies; 33+ messages in thread
From: Tummala, Sivaprasad @ 2023-05-03 15:11 UTC (permalink / raw)
  To: Jerin Jacob, Yigit, Ferruh
  Cc: david.hunt, jerinj, harry.van.haaren, dev, Pavan Nikhilesh,
	McDaniel, Timothy, Shijith Thotton, Hemant Agrawal,
	Sachin Saxena, Mattias Rönnblom, Peter Mccarthy, Liang Ma

[AMD Official Use Only - General]

Hi Jerin, 

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Wednesday, May 3, 2023 1:57 PM
> To: Yigit, Ferruh <Ferruh.Yigit@amd.com>
> Cc: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>;
> david.hunt@intel.com; jerinj@marvell.com; harry.van.haaren@intel.com;
> dev@dpdk.org; Pavan Nikhilesh <pbhagavatula@marvell.com>; McDaniel, Timothy
> <timothy.mcdaniel@intel.com>; Shijith Thotton <sthotton@marvell.com>;
> Hemant Agrawal <hemant.agrawal@nxp.com>; Sachin Saxena
> <sachin.saxena@oss.nxp.com>; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Peter Mccarthy
> <peter.mccarthy@intel.com>; Liang Ma <liangma@liangbit.com>
> Subject: Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
> 
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> 
> 
> On Wed, May 3, 2023 at 1:44 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >
> > On 5/3/2023 8:58 AM, Jerin Jacob wrote:
> > > On Tue, May 2, 2023 at 4:49 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> > >>
> > >> On 4/25/2023 5:09 AM, Jerin Jacob wrote:
> > >>> On Mon, Apr 24, 2023 at 9:36 PM Ferruh Yigit <ferruh.yigit@amd.com>
> wrote:
> > >>>>
> > >>>> On 4/19/2023 11:15 AM, Jerin Jacob wrote:
> > >>>>> On Wed, Apr 19, 2023 at 3:24 PM Sivaprasad Tummala
> > >>>>> <sivaprasad.tummala@amd.com> wrote:
> > >>>>>>
> > >>>>>> A new API to allow power monitoring condition on event port to
> > >>>>>> optimize power when no events are arriving on an event port for
> > >>>>>> the worker core to process in an eventdev based pipelined application.
> > >>>>>>
> > >>>>>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > >>>>>> + *
> > >>>>>> + * @param dev_id
> > >>>>>> + *   Eventdev id
> > >>>>>> + * @param port_id
> > >>>>>> + *   Eventdev port id
> > >>>>>> + * @param pmc
> > >>>>>> + *   The pointer to power-optimized monitoring condition structure.
> > >>>>>> + *
> > >>>>>> + * @return
> > >>>>>> + *   - 0: Success.
> > >>>>>> + *   -ENOTSUP: Operation not supported.
> > >>>>>> + *   -EINVAL: Invalid parameters.
> > >>>>>> + *   -ENODEV: Invalid device ID.
> > >>>>>> + */
> > >>>>>> +__rte_experimental
> > >>>>>> +int
> > >>>>>> +rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
> > >>>>>> +               struct rte_power_monitor_cond *pmc);
> > >>>>>
> > >>>>> + eventdev driver maintainers
> > >>>>>
> > >>>>> I think, we don't need to expose this application due to
> > >>>>> applications 1)To make applications to be transparent whether power
> saving is enabled or not?
> > >>>>> 2)Some HW and Arch already supports power managent in driver and
> > >>>>> in HW (Not using  CPU architecture directly)
> > >>>>>
> > >>>>> If so, that will be translated to following,
> > >>>>> a) Add rte_event_port_power_saving_ena_dis(uint8_t dev_id,
> > >>>>> uint8_t port_id, bool ena) for controlling power saving in slowpath.
> > >>>>> b) Create reusable PMD private function based on the CPU
> > >>>>> architecture power saving primitive to cover the PMD don't have
> > >>>>> native power saving support.
> > >>>>> c)Update rte_event_dequeue_burst() burst of PMD callback to use (b).
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> Hi Jerin,
> > >>>
> > >>> Hi Ferruh,
> > >>>
> > >>>>
> > >>>> ethdev approach seems applied here.
> > >>>
> > >>> Understands that. But none of the NIC HW supports power management
> > >>> at HW level like eventdev, so that way for what we are doing for
> > >>> ethdev is a correct abstraction for ethdev.
> > >>>
> > >>
> > >> What I understand is there is HW based event manager and SW based
> > >> ones, SW based ones can benefit more from CPU power optimizations,
> > >> for HW event managers if there is not enough benefit they can just
> > >> ignore the feature.
> > >>
> > >>>>
> > >>>> In ethdev, 'rte_event_port_get_monitor_addr()' equivalent is
> > >>>> 'rte_eth_get_monitor_addr()'.
> > >>>>
> > >>>> Although 'rte_eth_get_monitor_addr()' is public API, it is
> > >>>> currently only called from Rx/Tx callback functions implemented in the
> power library.
> > >>>> But I assume intention to make it public is to enable users to
> > >>>> implement their own callback functions that has custom algorithm
> > >>>> for the power management.
> > >>>
> > >>> If there is a use case for customizing with own callback, we can provide
> that.
> > >>> Provided NULL is valid with default algorithm.
> > >>>
> > >>>>
> > >>>> And probably same is true for the 'rte_event_port_get_monitor_addr()'.
> > >>>>
> > >>>>
> > >>>> Also instead of implementing power features for withing PMDs,
> > >>>> isn't it better to have a common eventdev layer for it?
> > >>>
> > >>> We can have rte_evetdev_pmd_* APIs as non-public APIs.
> > >>> My only objection is to NOT introduce _monitor_ APIs at eventdev
> > >>> level, Instead, _monitor_ is one way to do it in SW, So we need
> > >>> higher level of abstraction.
> > >>>
> > >>
> > >> I see, this seems a trade off between flexibility and usability. If
> > >> application has access to _monitor_ APIs, they can be more flexible
> > >> to implement their own logic.
> > >
> > > OK.
> > >
> > >>
> > >> Another option can be application provides the policy with an API
> > >> and monitor API used to realize the policy, but for this case it
> > >> can be challenge to find and implement correct policies.
> > >
> > > OK. If we can enumerate the policies, then it will be ideal.
> > > On plus side, there will not be any changes in needed in lib/power/
> > >
> >
> > If we are talking about a power framework that user defines policies,
> > I expect parsing/defining policies will be in the power library and
> > will require changes in the power library anyway.
> 
> OK
> 
> >
> > But as mentioned above it is difficult to define a proper policy, this
> > is not really related to eventdev, more a power library issue. We can
> > continue to provide flexibility to user in eventdev and discuss the
> > policy if a wider forum.
> 
> OK.
> 
> >
> > >
> > >>
> > >>>>
> > >>>> For the PMDs benefit from HW event manager, just not implementing
> > >>>> .get_monitor_addr() dev_ops will make them free from power related
> APIs.
> > >>>
> > >>> But application fast path code gets diverged by exposing low level
> primitives.
> > >>>
> > >>
> > >> I am not clear with concern above, but for application that use
> > >> default callbacks, 'rte_power_eventdev_pmgmt_port_enable()' needs
> > >> to be called to enable this feature, if not called datapath is not impacted.
> > >> And if not dequeue callback added at all, custom or default, data
> > >> path is not impacted at all.
> > >
> > > Concerns are around following code[1] when callback is not
> > > registered for this use case.
> > > In eventdev, we are using _one packet at a time_ for a lot of use
> > > case with latency critical workload like L1 processing.
> > > On such cases, the following code will add up.
> > >
> > > [1]
> > >   cb = __atomic_load_n((void **)&fp_ops->ev_port.clbk[port_id],
> > >     __ATOMIC_RELAXED);
> > >   if (unlikely(cb != NULL))
> > >        nb_rx = rte_event_dequeue_callbacks(dev_id, port_id, ev,
> > > nb_rx, cb);
> > >
> > > I see two options,
> > > 1) Enumerate the power policy and let driver implement through
> > > non-public PMD helper functions OR 2)Move the power management
> > > callback to driver via non-public PMD helper functions to avoid cost
> > > of PMDs where power managment done in HW and to remove above extra
> > > check when NO callback is registered[1]
> > >
> >
> > Got it, yes there is an additional check with event callbacks, we can
> > add a compiler flag around it as done in ethdev to let it not compiled
> > when not needed, will it work?
> 
> I would prefer to expose PMD helper function which can be called at end of the
> driver dequeue function so that other PMD can reuse as needed.
> This is to avoid compiler flag, cache line occupancy changes in struct rte_eventdev
> struct rte_event_fp_ops in generic code also we may not need full-fledged generic
> callbacks scheme for this.
> 
> >
OK. Will fix this in the v1 patch.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 3/5] eventdev: support optional dequeue callbacks
  2023-04-19  9:54 ` [RFC PATCH 3/5] eventdev: support optional dequeue callbacks Sivaprasad Tummala
  2023-04-24 16:06   ` Ferruh Yigit
@ 2023-05-17 14:22   ` Burakov, Anatoly
  1 sibling, 0 replies; 33+ messages in thread
From: Burakov, Anatoly @ 2023-05-17 14:22 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren; +Cc: dev

On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> Add optional support for inline event processing within dequeue call.
> For a dequeue callback, events dequeued from the event port were
> passed them to a callback function if configured, to allow
> additional processing. e.g. unpack batch of packets from each event
> on dequeue, before passing back to the application.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---

Hi,

> +/**
> + * @internal
> + * Structure used to hold information about the callbacks to be called for a
> + * port on dequeue.
> + */
> +struct rte_event_dequeue_callback {
> +	struct rte_event_dequeue_callback *next;
> +	union{
> +		rte_dequeue_callback_fn dequeue;
> +	} fn;
> +	void *param;
> +};

...and...

> +
> +uint16_t
> +rte_event_dequeue_callbacks(uint8_t dev_id, uint8_t port_id,
> +		struct rte_event *ev, uint16_t nb_events, void *opaque)
> +{
> +	static uint16_t nb_rx;
> +	const struct rte_event_dequeue_callback *cb = opaque;
> +
> +	while (cb != NULL) {
> +		nb_rx = cb->fn.dequeue(dev_id, port_id, ev,
> +				nb_events, cb->param);
> +		cb = cb->next;
> +	}
> +	return nb_rx;

Nitpicking an RFC, but this kind of looks like reimplementation for 
TAILQ functionality?

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/5] power: add eventdev support for power management
  2023-04-19  9:54 ` [RFC PATCH 4/5] power: add eventdev support for power management Sivaprasad Tummala
@ 2023-05-17 14:43   ` Burakov, Anatoly
  2023-05-24 12:34     ` Tummala, Sivaprasad
  0 siblings, 1 reply; 33+ messages in thread
From: Burakov, Anatoly @ 2023-05-17 14:43 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren; +Cc: dev

On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> Add eventdev support to enable power saving when no events
> are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering
> an architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when events arrive.
> 
> This API mandates a core-to-single-port mapping (i.e. one core polling
> multiple ports of event device is not supported). This should be ok
> as the general use case will have one CPU core using one port to
> enqueue/dequeue events from an eventdev.
> 
> This design is using Eventdev PMD Dequeue callbacks.
> 
> 1. MWAITX/MONITORX:
> 
>     When a certain threshold of empty polls is reached, the core will go
>     into a power optimized sleep while waiting on an address of next RX
>     descriptor to be written to.
> 
> 2. Pause instruction
> 
>     This method uses the pause instruction to avoid busy polling.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---

Hi, few comments

> +
> +static uint16_t
> +evt_clb_pause(uint8_t dev_id __rte_unused, uint8_t port_id __rte_unused,
> +		struct rte_event *ev __rte_unused,
> +		uint16_t nb_events, void *arg)
> +{
> +	const unsigned int lcore = rte_lcore_id();
> +	struct queue_list_entry *queue_conf = arg;
> +	struct pmd_core_cfg *lcore_conf;
> +	const bool empty = nb_events == 0;
> +	uint32_t pause_duration = rte_power_pmd_mgmt_get_pause_duration();
> +
> +	lcore_conf = &lcore_cfgs[lcore];
> +
> +	if (likely(!empty))
> +		/* early exit */
> +		queue_reset(lcore_conf, queue_conf);
> +	else {
> +		/* can this queue sleep? */
> +		if (!queue_can_sleep(lcore_conf, queue_conf))
> +			return nb_events;
> +
> +		/* can this lcore sleep? */
> +		if (!lcore_can_sleep(lcore_conf))
> +			return nb_events;
> +
> +		uint64_t i;
> +		for (i = 0; i < global_data.pause_per_us * pause_duration; i++)
> +			rte_pause();

Why not add support for TPAUSE? This is generic code, ethdev code path 
supports it, and most of this function is duplicated from ethdev 
implementation. I wish we could unify them somehow, but I can't think of 
an elegant way to do it off the top of my head.

> +
> +	/* we need this in various places */
> +	rte_cpu_get_intrinsics_support(&global_data.intrinsics_support);
> +
> +	switch (mode) {
> +	case RTE_POWER_MGMT_TYPE_MONITOR:
> +		/* check if we can add a new port */
> +		ret = check_evt_monitor(lcore_cfg, &qdata);
> +		if (ret < 0)
> +			goto end;
> +
> +		clb = evt_clb_umwait;
> +		break;
> +	case RTE_POWER_MGMT_TYPE_PAUSE:
> +		/* figure out various time-to-tsc conversions */
> +		if (global_data.tsc_per_us == 0)
> +			calc_tsc();
> +
> +		clb = evt_clb_pause;
> +		break;
> +	default:
> +		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");

Technically, if we specify "scale" here, the power management scheme 
would be *unsupported* rather than *invalid*, and thus should return 
-ENOTSUP rather than -EINVAL.

Also, since this is generic code, theoretically this code could in fact 
support SCALE mode? Would it make sense for eventdev to use that scheme?

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
> + *
> + * Disable power management on a specified Ethernet device Rx queue and lcore.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @warning This function must be called when all affected Ethernet queues are
> + *   stopped and no Rx/Tx is in progress!
> + *
> + * @param lcore_id
> + *   The lcore the Rx queue is polled from.
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   Event port identifier of the Event device.
> + * @return
> + *   0 on success
> + *   <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
> +		uint8_t dev_id, uint8_t port_id);

It would've been nice if we didn't have to reimplement the same logic 
for every new device type, but seeing how we do not have any unified 
driver API, I don't have any bright ideas on how to do it better.

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/5] eventdev: add power monitoring API on event port
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
                   ` (4 preceding siblings ...)
  2023-04-19 10:15 ` [RFC PATCH 1/5] eventdev: add power monitoring API on event port Jerin Jacob
@ 2023-05-17 14:48 ` Burakov, Anatoly
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
  6 siblings, 0 replies; 33+ messages in thread
From: Burakov, Anatoly @ 2023-05-17 14:48 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt, jerinj, harry.van.haaren; +Cc: dev

On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> A new API to allow power monitoring condition on event port to
> optimize power when no events are arriving on an event port for
> the worker core to process in an eventdev based pipelined application.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---

General patchset comment: the implementation seems straightforward 
enough and closely follows ethdev, so I do not have any objections to it 
from rte_power point of view - it's nice to see that the infrastructure 
we have created came out useful outside of usecases we envisioned for it :)

-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [RFC PATCH 4/5] power: add eventdev support for power management
  2023-05-17 14:43   ` Burakov, Anatoly
@ 2023-05-24 12:34     ` Tummala, Sivaprasad
  0 siblings, 0 replies; 33+ messages in thread
From: Tummala, Sivaprasad @ 2023-05-24 12:34 UTC (permalink / raw)
  To: Burakov, Anatoly, david.hunt, jerinj, harry.van.haaren; +Cc: dev

[AMD Official Use Only - General]

Hi,

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Wednesday, May 17, 2023 8:14 PM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>;
> david.hunt@intel.com; jerinj@marvell.com; harry.van.haaren@intel.com
> Cc: dev@dpdk.org
> Subject: Re: [RFC PATCH 4/5] power: add eventdev support for power
> management
> 
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
> 
> 
> On 4/19/2023 10:54 AM, Sivaprasad Tummala wrote:
> > Add eventdev support to enable power saving when no events are
> > arriving. It is based on counting the number of empty polls and, when
> > the number reaches a certain threshold, entering an
> > architecture-defined optimized power state that will either wait until
> > a TSC timestamp expires, or when events arrive.
> >
> > This API mandates a core-to-single-port mapping (i.e. one core polling
> > multiple ports of event device is not supported). This should be ok as
> > the general use case will have one CPU core using one port to
> > enqueue/dequeue events from an eventdev.
> >
> > This design is using Eventdev PMD Dequeue callbacks.
> >
> > 1. MWAITX/MONITORX:
> >
> >     When a certain threshold of empty polls is reached, the core will go
> >     into a power optimized sleep while waiting on an address of next RX
> >     descriptor to be written to.
> >
> > 2. Pause instruction
> >
> >     This method uses the pause instruction to avoid busy polling.
> >
> > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > ---
> 
> Hi, few comments
> 
> > +
> > +static uint16_t
> > +evt_clb_pause(uint8_t dev_id __rte_unused, uint8_t port_id __rte_unused,
> > +             struct rte_event *ev __rte_unused,
> > +             uint16_t nb_events, void *arg) {
> > +     const unsigned int lcore = rte_lcore_id();
> > +     struct queue_list_entry *queue_conf = arg;
> > +     struct pmd_core_cfg *lcore_conf;
> > +     const bool empty = nb_events == 0;
> > +     uint32_t pause_duration =
> > +rte_power_pmd_mgmt_get_pause_duration();
> > +
> > +     lcore_conf = &lcore_cfgs[lcore];
> > +
> > +     if (likely(!empty))
> > +             /* early exit */
> > +             queue_reset(lcore_conf, queue_conf);
> > +     else {
> > +             /* can this queue sleep? */
> > +             if (!queue_can_sleep(lcore_conf, queue_conf))
> > +                     return nb_events;
> > +
> > +             /* can this lcore sleep? */
> > +             if (!lcore_can_sleep(lcore_conf))
> > +                     return nb_events;
> > +
> > +             uint64_t i;
> > +             for (i = 0; i < global_data.pause_per_us * pause_duration; i++)
> > +                     rte_pause();
> 
> Why not add support for TPAUSE? This is generic code, ethdev code path supports
> it, and most of this function is duplicated from ethdev implementation. I wish we
> could unify them somehow, but I can't think of an elegant way to do it off the top
> of my head.
> 
Sure, will fix this in v1.
> > +
> > +     /* we need this in various places */
> > +     rte_cpu_get_intrinsics_support(&global_data.intrinsics_support);
> > +
> > +     switch (mode) {
> > +     case RTE_POWER_MGMT_TYPE_MONITOR:
> > +             /* check if we can add a new port */
> > +             ret = check_evt_monitor(lcore_cfg, &qdata);
> > +             if (ret < 0)
> > +                     goto end;
> > +
> > +             clb = evt_clb_umwait;
> > +             break;
> > +     case RTE_POWER_MGMT_TYPE_PAUSE:
> > +             /* figure out various time-to-tsc conversions */
> > +             if (global_data.tsc_per_us == 0)
> > +                     calc_tsc();
> > +
> > +             clb = evt_clb_pause;
> > +             break;
> > +     default:
> > +             RTE_LOG(DEBUG, POWER, "Invalid power management
> > + type\n");
> 
> Technically, if we specify "scale" here, the power management scheme would be
> *unsupported* rather than *invalid*, and thus should return -ENOTSUP rather than
> -EINVAL.
> 
> Also, since this is generic code, theoretically this code could in fact support SCALE
> mode? Would it make sense for eventdev to use that scheme?
> 
Agreed. Will fix this in v1 patch. 

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice.
> > + *
> > + * Disable power management on a specified Ethernet device Rx queue and
> lcore.
> > + *
> > + * @note This function is not thread-safe.
> > + *
> > + * @warning This function must be called when all affected Ethernet queues are
> > + *   stopped and no Rx/Tx is in progress!
> > + *
> > + * @param lcore_id
> > + *   The lcore the Rx queue is polled from.
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param port_id
> > + *   Event port identifier of the Event device.
> > + * @return
> > + *   0 on success
> > + *   <0 on error
> > + */
> > +__rte_experimental
> > +int
> > +rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
> > +             uint8_t dev_id, uint8_t port_id);
> 
> It would've been nice if we didn't have to reimplement the same logic for every
> new device type, but seeing how we do not have any unified driver API, I don't
> have any bright ideas on how to do it better.
> 
> --
> Thanks,
> Anatoly 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 1/6] eventdev: add power monitoring API on event port
  2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
                   ` (5 preceding siblings ...)
  2023-05-17 14:48 ` Burakov, Anatoly
@ 2023-10-16 20:57 ` Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 2/6] event/sw: support power monitor Sivaprasad Tummala
                     ` (4 more replies)
  6 siblings, 5 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

A new API to allow power monitoring condition on event port to
optimize power when no events are arriving on an event port for
the worker core to process in an eventdev based pipelined application.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/eventdev/eventdev_pmd.h | 22 ++++++++++++++++++++++
 lib/eventdev/rte_eventdev.c | 24 ++++++++++++++++++++++++
 lib/eventdev/rte_eventdev.h | 25 +++++++++++++++++++++++++
 lib/eventdev/version.map    |  1 +
 4 files changed, 72 insertions(+)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 30bd90085c..a0ee768ce7 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -525,6 +525,26 @@ typedef int (*eventdev_port_unlink_profile_t)(struct rte_eventdev *dev, void *po
 typedef int (*eventdev_port_unlinks_in_progress_t)(struct rte_eventdev *dev,
 		void *port);
 
+/**
+ * @internal
+ * Get address of memory location whose contents will change whenever there is
+ * new data to be received on an Event port.
+ *
+ * @param port
+ *   Eventdev port pointer.
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ * @return
+ *   Negative errno value on error, 0 on success.
+ *
+ * @retval 0
+ *   Success
+ * @retval -EINVAL
+ *   Invalid parameters
+ */
+typedef int (*event_get_monitor_addr_t)(void *port,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Converts nanoseconds to *timeout_ticks* value for rte_event_dequeue()
  *
@@ -1564,6 +1584,8 @@ struct eventdev_ops {
 	eventdev_dump_t dump;
 	/* Dump internal information */
 
+	event_get_monitor_addr_t get_monitor_addr;
+	/** Get power monitoring condition for event port */
 	eventdev_xstats_get_t xstats_get;
 	/**< Get extended device statistics. */
 	eventdev_xstats_get_names_t xstats_get_names;
diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 95373bbaad..5feb4326a2 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -884,6 +884,30 @@ rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 	return 0;
 }
 
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+				struct rte_power_monitor_cond *pmc)
+{
+	struct rte_eventdev *dev;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_eventdevs[dev_id];
+	if (!is_valid_port(dev, port_id)) {
+		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
+		return -EINVAL;
+	}
+
+	if (pmc == NULL) {
+		RTE_EDEV_LOG_ERR("devid %u port %u power monitor condition is NULL\n",
+				dev_id, port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->get_monitor_addr == NULL)
+		return -ENOTSUP;
+	return (*dev->dev_ops->get_monitor_addr)(dev->data->ports[port_id], pmc);
+}
+
 int
 rte_event_queue_attr_get(uint8_t dev_id, uint8_t queue_id, uint32_t attr_id,
 			uint32_t *attr_value)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 2ea98302b8..38dbbc2617 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -214,6 +214,7 @@ extern "C" {
 #include <rte_errno.h>
 #include <rte_mbuf_pool_ops.h>
 #include <rte_mempool.h>
+#include <rte_power_intrinsics.h>
 
 #include "rte_eventdev_trace_fp.h"
 
@@ -990,6 +991,30 @@ int
 rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 			uint32_t *attr_value);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the monitor condition for a given event port.
+ *
+ * @param dev_id
+ *   Eventdev id
+ * @param port_id
+ *   Eventdev port id
+ * @param pmc
+ *   The pointer to power-optimized monitoring condition structure.
+ *
+ * @return
+ *   - 0: Success.
+ *   -ENOTSUP: Operation not supported.
+ *   -EINVAL: Invalid parameters.
+ *   -ENODEV: Invalid device ID.
+ */
+__rte_experimental
+int
+rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
+		struct rte_power_monitor_cond *pmc);
+
 /**
  * Start an event device.
  *
diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
index 42a2f7206e..fa9eb069ff 100644
--- a/lib/eventdev/version.map
+++ b/lib/eventdev/version.map
@@ -154,6 +154,7 @@ EXPERIMENTAL {
 	rte_event_port_profile_links_set;
 	rte_event_port_profile_unlink;
 	rte_event_port_profile_links_get;
+	rte_event_port_get_monitor_addr;
 	__rte_eventdev_trace_port_profile_switch;
 };
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 2/6] event/sw: support power monitor
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
@ 2023-10-16 20:57   ` Sivaprasad Tummala
  2023-10-16 23:41     ` Tyler Retzlaff
  2023-10-16 20:57   ` [PATCH v1 3/6] eventdev: support optional dequeue callbacks Sivaprasad Tummala
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

Currently sw eventdev pmd does not support ``rte_power_monitor`` api.
This patch adds support by adding monitor callback that is called
whenever we enter sleep state and need to check if it is time to
wake up.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 drivers/event/sw/sw_evdev.c        |  1 +
 drivers/event/sw/sw_evdev.h        |  2 ++
 drivers/event/sw/sw_evdev_worker.c | 27 +++++++++++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6d1816b76d..99b3c0d92f 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -958,6 +958,7 @@ sw_probe(struct rte_vdev_device *vdev)
 			.port_link = sw_port_link,
 			.port_unlink = sw_port_unlink,
 			.port_unlinks_in_progress = sw_port_unlinks_in_progress,
+			.get_monitor_addr = sw_event_get_monitor_addr,
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
index c7b943a72b..26aa2fe283 100644
--- a/drivers/event/sw/sw_evdev.h
+++ b/drivers/event/sw/sw_evdev.h
@@ -312,6 +312,8 @@ int sw_xstats_reset(struct rte_eventdev *dev,
 		int16_t queue_port_id,
 		const uint64_t ids[],
 		uint32_t nb_ids);
+int sw_event_get_monitor_addr(void *port,
+		struct rte_power_monitor_cond *pmc);
 
 int test_sw_eventdev(void);
 
diff --git a/drivers/event/sw/sw_evdev_worker.c b/drivers/event/sw/sw_evdev_worker.c
index 063b919c7e..139c98cfe2 100644
--- a/drivers/event/sw/sw_evdev_worker.c
+++ b/drivers/event/sw/sw_evdev_worker.c
@@ -10,6 +10,33 @@
 
 #define PORT_ENQUEUE_MAX_BURST_SIZE 64
 
+static int
+sw_event_ring_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* Check if the head pointer has changed */
+	return value != arg[0];
+}
+
+int
+sw_event_get_monitor_addr(void *port, struct rte_power_monitor_cond *pmc)
+{
+	struct sw_port *p = (void *)port;
+	struct rte_event_ring *ev_ring = p->cq_worker_ring;
+	struct rte_ring *rng = &ev_ring->r;
+
+	/*
+	 * Monitor event ring producer tail since if
+	 * prod.tail moves there are events to dequeue
+	 */
+	pmc->addr = &rng->prod.tail;
+	pmc->size = sizeof(rng->prod.tail);
+	pmc->opaque[0] = rng->prod.tail;
+	pmc->fn = sw_event_ring_monitor_callback;
+
+	return 0;
+}
+
 static inline void
 sw_event_release(struct sw_port *p, uint8_t index)
 {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 3/6] eventdev: support optional dequeue callbacks
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 2/6] event/sw: support power monitor Sivaprasad Tummala
@ 2023-10-16 20:57   ` Sivaprasad Tummala
  2023-10-16 23:47     ` Tyler Retzlaff
  2023-10-16 20:57   ` [PATCH v1 4/6] event/sw: " Sivaprasad Tummala
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

Add optional support for inline event processing within pmd dequeue
call. For a dequeue callback, events dequeued from the event port
were passed them to a callback function if configured, to allow
additional processing. e.g. unpack batch of packets from each event
on dequeue, before passing back to the application.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/eventdev/eventdev_pmd.h      |  38 +++++++++++
 lib/eventdev/eventdev_private.c  |   2 +
 lib/eventdev/rte_eventdev.c      | 107 +++++++++++++++++++++++++++++++
 lib/eventdev/rte_eventdev.h      |  95 +++++++++++++++++++++++++++
 lib/eventdev/rte_eventdev_core.h |  12 +++-
 lib/eventdev/version.map         |   3 +
 6 files changed, 256 insertions(+), 1 deletion(-)

diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index a0ee768ce7..ce067b1d5d 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -97,6 +97,19 @@ struct rte_eventdev_global {
 	uint8_t nb_devs;	/**< Number of devices found */
 };
 
+/**
+ * @internal
+ * Structure used to hold information about the callbacks to be called for a
+ * port on dequeue.
+ */
+struct rte_event_dequeue_callback {
+	struct rte_event_dequeue_callback *next;
+	union{
+		rte_dequeue_callback_fn dequeue;
+	} fn;
+	void *param;
+};
+
 /**
  * @internal
  * The data part, with no function pointers, associated with each device.
@@ -171,6 +184,10 @@ struct rte_eventdev {
 	/**< Pointer to PMD dequeue burst function. */
 	event_maintain_t maintain;
 	/**< Pointer to PMD port maintenance function. */
+	struct rte_event_dequeue_callback *post_dequeue_burst_cbs[RTE_EVENT_MAX_PORTS_PER_DEV];
+	/**<  User-supplied functions called from dequeue_burst to post-process
+	 * received packets before passing them to the user
+	 */
 	event_tx_adapter_enqueue_t txa_enqueue_same_dest;
 	/**< Pointer to PMD eth Tx adapter burst enqueue function with
 	 * events destined to same Eth port & Tx queue.
@@ -245,6 +262,27 @@ rte_event_pmd_is_valid_dev(uint8_t dev_id)
 		return 1;
 }
 
+/**
+ * Executes all the user application registered callbacks for the specific
+ * event device.
+ *
+ * @param dev_id
+ *   Event device index.
+ * @param port_id
+ *   Event port index
+ * @param ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   for output to be populated with the dequeued event objects.
+ * @param nb_events
+ *   number of event objects
+ *
+ * @return
+ * The number of event objects
+ */
+__rte_internal
+uint16_t rte_eventdev_pmd_dequeue_callback_process(uint8_t dev_id,
+		uint8_t port_id, struct rte_event ev[], uint16_t nb_events);
+
 /**
  * Definitions of all functions exported by a driver through the
  * generic structure of type *event_dev_ops* supplied in the
diff --git a/lib/eventdev/eventdev_private.c b/lib/eventdev/eventdev_private.c
index 017f97ccab..052c526ce0 100644
--- a/lib/eventdev/eventdev_private.c
+++ b/lib/eventdev/eventdev_private.c
@@ -137,4 +137,6 @@ event_dev_fp_ops_set(struct rte_event_fp_ops *fp_op,
 	fp_op->dma_enqueue = dev->dma_enqueue;
 	fp_op->profile_switch = dev->profile_switch;
 	fp_op->data = dev->data->ports;
+	fp_op->ev_port.clbk = (void **)(uintptr_t)dev->post_dequeue_burst_cbs;
+	fp_op->ev_port.data = dev->data->ports;
 }
diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
index 5feb4326a2..f2540a6aa8 100644
--- a/lib/eventdev/rte_eventdev.c
+++ b/lib/eventdev/rte_eventdev.c
@@ -18,6 +18,7 @@
 #include <rte_common.h>
 #include <rte_malloc.h>
 #include <rte_errno.h>
+#include <rte_stdatomic.h>
 #include <ethdev_driver.h>
 #include <rte_cryptodev.h>
 #include <rte_dmadev.h>
@@ -39,6 +40,9 @@ static struct rte_eventdev_global eventdev_globals = {
 /* Public fastpath APIs. */
 struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
 
+/* spinlock for add/remove dequeue callbacks */
+static rte_spinlock_t event_dev_dequeue_cb_lock = RTE_SPINLOCK_INITIALIZER;
+
 /* Event dev north bound API implementation */
 
 uint8_t
@@ -884,6 +888,109 @@ rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
 	return 0;
 }
 
+const struct rte_event_dequeue_callback *
+rte_event_add_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		rte_dequeue_callback_fn fn, void *user_param)
+{
+	struct rte_eventdev *dev;
+	struct rte_event_dequeue_callback *cb;
+	struct rte_event_dequeue_callback *tail;
+
+	/* check input parameters */
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, NULL);
+	dev = &rte_eventdevs[dev_id];
+	if (!is_valid_port(dev, port_id)) {
+		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
+		return NULL;
+	}
+
+	cb = rte_zmalloc(NULL, sizeof(*cb), 0);
+	if (cb == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	cb->fn.dequeue = fn;
+	cb->param = user_param;
+
+	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
+	/* Add the callbacks in fifo order. */
+	tail = rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id];
+	if (!tail) {
+		/* Stores to cb->fn and cb->param should complete before
+		 * cb is visible to data plane.
+		 */
+		rte_atomic_store_explicit(
+			&rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id],
+			cb, __ATOMIC_RELEASE);
+	} else {
+		while (tail->next)
+			tail = tail->next;
+		/* Stores to cb->fn and cb->param should complete before
+		 * cb is visible to data plane.
+		 */
+		rte_atomic_store_explicit(&tail->next, cb, __ATOMIC_RELEASE);
+	}
+	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
+
+	return cb;
+}
+
+int
+rte_event_remove_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		const struct rte_event_dequeue_callback *user_cb)
+{
+	struct rte_eventdev *dev;
+	struct rte_event_dequeue_callback *cb;
+	struct rte_event_dequeue_callback **prev_cb;
+
+	/* Check input parameters. */
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_eventdevs[dev_id];
+	if (user_cb == NULL || !is_valid_port(dev, port_id))
+		return -EINVAL;
+
+	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
+	prev_cb = &dev->post_dequeue_burst_cbs[port_id];
+	for (; *prev_cb != NULL; prev_cb = &cb->next) {
+		cb = *prev_cb;
+		if (cb == user_cb) {
+			/* Remove the user cb from the callback list. */
+			rte_atomic_store_explicit(prev_cb, cb->next,
+						__ATOMIC_RELAXED);
+			break;
+		}
+	}
+	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
+
+	return 0;
+}
+
+uint16_t rte_eventdev_pmd_dequeue_callback_process(uint8_t dev_id,
+		uint8_t port_id, struct rte_event ev[], uint16_t nb_events)
+{
+	struct rte_event_dequeue_callback *cb;
+	const struct rte_event_fp_ops *fp_ops;
+
+	fp_ops = &rte_event_fp_ops[dev_id];
+
+	/* __ATOMIC_RELEASE memory order was used when the
+	 * call back was inserted into the list.
+	 * Since there is a clear dependency between loading
+	 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+	 * not required.
+	 */
+	cb = rte_atomic_load_explicit((void **)&fp_ops->ev_port.clbk[port_id],
+					__ATOMIC_RELAXED);
+	if (unlikely(cb != NULL))
+		while (cb != NULL) {
+			nb_events = cb->fn.dequeue(dev_id, port_id, ev,
+			nb_events, cb->param);
+			cb = cb->next;
+		}
+
+	return nb_events;
+}
+
 int
 rte_event_port_get_monitor_addr(uint8_t dev_id, uint8_t port_id,
 				struct rte_power_monitor_cond *pmc)
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 38dbbc2617..c0097c0a23 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -954,6 +954,101 @@ void
 rte_event_port_quiesce(uint8_t dev_id, uint8_t port_id,
 		       rte_eventdev_port_flush_t release_cb, void *args);
 
+struct rte_event_dequeue_callback;
+
+/**
+ * Function type used for dequeue event processing callbacks.
+ *
+ * The callback function is called on dequeue with a burst of events that have
+ * been received on the given event port.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param[out] ev
+ *   Points to an array of *nb_events* objects of type *rte_event* structure
+ *   for output to be populated with the dequeued event objects.
+ * @param nb_events
+ *   The maximum number of event objects to dequeue, typically number of
+ *   rte_event_port_dequeue_depth() available for this port.
+ * @param opaque
+ *   Opaque pointer of event port callback related data.
+ *
+ * @return
+ *  The number of event objects returned to the user.
+ */
+typedef uint16_t (*rte_dequeue_callback_fn)(uint8_t dev_id, uint8_t port_id,
+		struct rte_event *ev, uint16_t nb_events, void *user_param);
+
+/**
+ * Add a callback to be called on event dequeue on a given event device port.
+ *
+ * This API configures a function to be called for each burst of
+ * events dequeued on a given event device port. The return value is a pointer
+ * that can be used to later remove the callback using
+ * rte_event_remove_dequeue_callback().
+ *
+ * Multiple functions are called in the order that they are added.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param fn
+ *   The callback function
+ * @param user_param
+ *   A generic pointer parameter which will be passed to each invocation of the
+ *   callback function on this event device port. Inter-thread synchronization
+ *   of any user data changes is the responsibility of the user.
+ *
+ * @return
+ *   NULL on error.
+ *   On success, a pointer value which can later be used to remove the callback.
+ */
+__rte_experimental
+const struct rte_event_dequeue_callback *
+rte_event_add_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		rte_dequeue_callback_fn fn, void *user_param);
+
+/**
+ * Remove a dequeue event callback from a given event device port.
+ *
+ * This API is used to removed callbacks that were added to a event device port
+ * using rte_event_add_dequeue_callback().
+ *
+ * Note: the callback is removed from the callback list but it isn't freed
+ * since the it may still be in use. The memory for the callback can be
+ * subsequently freed back by the application by calling rte_free():
+ *
+ * - Immediately - if the device is stopped, or the user knows that no
+ *   callbacks are in flight e.g. if called from the thread doing dequeue
+ *   on that port.
+ *
+ * - After a short delay - where the delay is sufficient to allow any
+ *   in-flight callbacks to complete. Alternately, the RCU mechanism can be
+ *   used to detect when data plane threads have ceased referencing the
+ *   callback memory.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param user_cb
+ *   The callback function
+ *
+ * @return
+ *   - 0: Success. Callback was removed.
+ *   - -ENODEV:  If *dev_id* is invalid.
+ *   - -EINVAL:  The port_id is out of range, or the callback
+ *               is NULL.
+ */
+__rte_experimental
+int
+rte_event_remove_dequeue_callback(uint8_t dev_id, uint8_t port_id,
+		const struct rte_event_dequeue_callback *user_cb);
+
+
 /**
  * The queue depth of the port on the enqueue side
  */
diff --git a/lib/eventdev/rte_eventdev_core.h b/lib/eventdev/rte_eventdev_core.h
index 5b405518d1..5ce93c4b6f 100644
--- a/lib/eventdev/rte_eventdev_core.h
+++ b/lib/eventdev/rte_eventdev_core.h
@@ -49,6 +49,14 @@ typedef uint16_t (*event_dma_adapter_enqueue_t)(void *port, struct rte_event ev[
 typedef int (*event_profile_switch_t)(void *port, uint8_t profile);
 /**< @internal Switch active link profile on the event port. */
 
+struct rte_eventdev_port_data {
+	void **data;
+	/**< points to array of internal port data pointers */
+	void **clbk;
+	/**< points to array of port callback data pointers */
+};
+/**< @internal Structure used to hold opaque eventdev port data. */
+
 struct rte_event_fp_ops {
 	void **data;
 	/**< points to array of internal port data pointers */
@@ -76,7 +84,9 @@ struct rte_event_fp_ops {
 	/**< PMD DMA adapter enqueue function. */
 	event_profile_switch_t profile_switch;
 	/**< PMD Event switch profile function. */
-	uintptr_t reserved[4];
+	struct rte_eventdev_port_data ev_port;
+	/**< Eventdev port data. */
+	uintptr_t reserved[1];
 } __rte_cache_aligned;
 
 extern struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
diff --git a/lib/eventdev/version.map b/lib/eventdev/version.map
index fa9eb069ff..a0c7aa5bbd 100644
--- a/lib/eventdev/version.map
+++ b/lib/eventdev/version.map
@@ -155,6 +155,8 @@ EXPERIMENTAL {
 	rte_event_port_profile_unlink;
 	rte_event_port_profile_links_get;
 	rte_event_port_get_monitor_addr;
+	rte_event_add_dequeue_callback;
+	rte_event_remove_dequeue_callback;
 	__rte_eventdev_trace_port_profile_switch;
 };
 
@@ -165,6 +167,7 @@ INTERNAL {
 	event_dev_fp_ops_set;
 	event_dev_probing_finish;
 	rte_event_pmd_allocate;
+	rte_eventdev_pmd_dequeue_callback_process;
 	rte_event_pmd_get_named_dev;
 	rte_event_pmd_is_valid_dev;
 	rte_event_pmd_pci_probe;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 4/6] event/sw: support optional dequeue callbacks
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 2/6] event/sw: support power monitor Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 3/6] eventdev: support optional dequeue callbacks Sivaprasad Tummala
@ 2023-10-16 20:57   ` Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 5/6] power: add eventdev support for power management Sivaprasad Tummala
  2023-10-16 20:57   ` [PATCH v1 6/6] examples/eventdev_p: add eventdev " Sivaprasad Tummala
  4 siblings, 0 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

This patch adds support for an event dequeue callbacks when configured
to allow additional processing, power management (frequency scaling,
monitor/sleep etc.,) and soon.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 config/rte_config.h                | 3 +++
 drivers/event/sw/sw_evdev_worker.c | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/config/rte_config.h b/config/rte_config.h
index 9036214013..b05a86a267 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -135,4 +135,7 @@
 /* DLB2 defines */
 // RTE_LIBRTE_PMD_DLB2_QUELL_STATS is not set
 
+/* Event/SW defines */
+#define RTE_EVENT_SW_DEQ_CALLBACKS 1
+
 #endif /* _RTE_CONFIG_H_ */
diff --git a/drivers/event/sw/sw_evdev_worker.c b/drivers/event/sw/sw_evdev_worker.c
index 139c98cfe2..3a954b2385 100644
--- a/drivers/event/sw/sw_evdev_worker.c
+++ b/drivers/event/sw/sw_evdev_worker.c
@@ -5,6 +5,7 @@
 #include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_event_ring.h>
+#include <eventdev_pmd.h>
 
 #include "sw_evdev.h"
 
@@ -203,6 +204,10 @@ sw_event_dequeue_burst(void *port, struct rte_event *ev, uint16_t num,
 	p->total_polls++;
 
 end:
+#ifdef RTE_EVENT_SW_DEQ_CALLBACKS
+	ndeq = rte_eventdev_pmd_dequeue_callback_process(p->sw->data->dev_id,
+			p->id, ev, ndeq);
+#endif
 	return ndeq;
 }
 
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
                     ` (2 preceding siblings ...)
  2023-10-16 20:57   ` [PATCH v1 4/6] event/sw: " Sivaprasad Tummala
@ 2023-10-16 20:57   ` Sivaprasad Tummala
  2023-10-16 23:51     ` Tyler Retzlaff
  2023-10-17  3:22     ` Jerin Jacob
  2023-10-16 20:57   ` [PATCH v1 6/6] examples/eventdev_p: add eventdev " Sivaprasad Tummala
  4 siblings, 2 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

Add eventdev support to enable power saving when no events
are arriving. It is based on counting the number of empty
polls and, when the number reaches a certain threshold, entering
an architecture-defined optimized power state that will either wait
until a TSC timestamp expires, or when events arrive.

This API mandates a core-to-single-port mapping (i.e. one core polling
multiple ports of event device is not supported). This should be ok
as the general use case will have one CPU core using one port to
enqueue/dequeue events from an eventdev.

This design is using Eventdev PMD Dequeue callbacks.

1. MWAITX/MONITORX:

   When a certain threshold of empty polls is reached, the core will go
   into a power optimized sleep while waiting on an address of next RX
   descriptor to be written to.

2. Pause instruction

   This method uses the pause instruction to avoid busy polling.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 lib/power/meson.build          |   2 +-
 lib/power/rte_power_pmd_mgmt.c | 226 +++++++++++++++++++++++++++++++++
 lib/power/rte_power_pmd_mgmt.h |  55 ++++++++
 lib/power/version.map          |   4 +
 4 files changed, 286 insertions(+), 1 deletion(-)

diff --git a/lib/power/meson.build b/lib/power/meson.build
index 056d0043d8..86e178bbb4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -32,4 +32,4 @@ headers = files(
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
 endif
-deps += ['timer', 'ethdev']
+deps += ['timer', 'ethdev', 'eventdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index 38f8384085..df3ac2d221 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -9,8 +9,10 @@
 #include <rte_cpuflags.h>
 #include <rte_malloc.h>
 #include <rte_ethdev.h>
+#include <rte_eventdev.h>
 #include <rte_power_intrinsics.h>
 
+#include <eventdev_pmd.h>
 #include "rte_power_pmd_mgmt.h"
 #include "power_common.h"
 
@@ -53,6 +55,7 @@ struct queue_list_entry {
 	uint64_t n_empty_polls;
 	uint64_t n_sleeps;
 	const struct rte_eth_rxtx_callback *cb;
+	const struct rte_event_dequeue_callback *evt_cb;
 };
 
 struct pmd_core_cfg {
@@ -414,6 +417,64 @@ cfg_queues_stopped(struct pmd_core_cfg *queue_cfg)
 	return 1;
 }
 
+static uint16_t
+evt_clb_umwait(uint8_t dev_id, uint8_t port_id, struct rte_event *ev __rte_unused,
+		uint16_t nb_events, void *arg)
+{
+	struct queue_list_entry *queue_conf = arg;
+
+	/* this callback can't do more than one queue, omit multiqueue logic */
+	if (unlikely(nb_events == 0)) {
+		queue_conf->n_empty_polls++;
+		if (unlikely(queue_conf->n_empty_polls > emptypoll_max)) {
+			struct rte_power_monitor_cond pmc;
+			int ret;
+
+			/* use monitoring condition to sleep */
+			ret = rte_event_port_get_monitor_addr(dev_id, port_id,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
+		}
+	} else
+		queue_conf->n_empty_polls = 0;
+
+	return nb_events;
+}
+
+static uint16_t
+evt_clb_pause(uint8_t dev_id __rte_unused, uint8_t port_id __rte_unused,
+		struct rte_event *ev __rte_unused,
+		uint16_t nb_events, void *arg)
+{
+	const unsigned int lcore = rte_lcore_id();
+	struct queue_list_entry *queue_conf = arg;
+	struct pmd_core_cfg *lcore_conf;
+	const bool empty = nb_events == 0;
+	uint32_t pause_duration = rte_power_pmd_mgmt_get_pause_duration();
+
+	lcore_conf = &lcore_cfgs[lcore];
+
+	if (likely(!empty))
+		/* early exit */
+		queue_reset(lcore_conf, queue_conf);
+	else {
+		/* can this queue sleep? */
+		if (!queue_can_sleep(lcore_conf, queue_conf))
+			return nb_events;
+
+		/* can this lcore sleep? */
+		if (!lcore_can_sleep(lcore_conf))
+			return nb_events;
+
+		uint64_t i;
+		for (i = 0; i < global_data.pause_per_us * pause_duration; i++)
+			rte_pause();
+	}
+
+	return nb_events;
+}
+
 static int
 check_scale(unsigned int lcore)
 {
@@ -481,6 +542,171 @@ get_monitor_callback(void)
 		clb_multiwait : clb_umwait;
 }
 
+static int
+check_evt_monitor(struct pmd_core_cfg *cfg __rte_unused,
+		const union queue *qdata)
+{
+	struct rte_power_monitor_cond dummy;
+
+	/* check if rte_power_monitor is supported */
+	if (!global_data.intrinsics_support.power_monitor) {
+		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
+		return -ENOTSUP;
+	}
+
+	/* check if the device supports the necessary PMD API */
+	if (rte_event_port_get_monitor_addr((uint8_t)qdata->portid, (uint8_t)qdata->qid,
+				&dummy) == -ENOTSUP) {
+		RTE_LOG(DEBUG, POWER, "event port does not support rte_event_get_monitor_addr\n");
+		return -ENOTSUP;
+	}
+
+	/* we're done */
+	return 0;
+}
+
+int
+rte_power_eventdev_pmgmt_port_enable(unsigned int lcore_id, uint8_t dev_id,
+		uint8_t port_id, enum rte_power_pmd_mgmt_type mode)
+{
+	const union queue qdata = {.portid = dev_id, .qid = port_id};
+	struct pmd_core_cfg *lcore_cfg;
+	struct queue_list_entry *queue_cfg;
+	struct rte_event_dev_info info;
+	rte_dequeue_callback_fn clb;
+	int ret;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	if (rte_event_dev_info_get(dev_id, &info) < 0) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	/* check if queue id is valid */
+	if (port_id >= info.max_event_ports) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	lcore_cfg = &lcore_cfgs[lcore_id];
+
+	/* if callback was already enabled, check current callback type */
+	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
+		lcore_cfg->cb_mode != mode) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	/* we need this in various places */
+	rte_cpu_get_intrinsics_support(&global_data.intrinsics_support);
+
+	switch (mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR:
+		/* check if we can add a new port */
+		ret = check_evt_monitor(lcore_cfg, &qdata);
+		if (ret < 0)
+			goto end;
+
+		clb = evt_clb_umwait;
+		break;
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		/* figure out various time-to-tsc conversions */
+		if (global_data.tsc_per_us == 0)
+			calc_tsc();
+
+		clb = evt_clb_pause;
+		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
+	}
+	/* add this queue to the list */
+	ret = queue_list_add(lcore_cfg, &qdata);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
+				strerror(-ret));
+		goto end;
+	}
+	/* new queue is always added last */
+	queue_cfg = TAILQ_LAST(&lcore_cfg->head, queue_list_head);
+
+	/* when enabling first queue, ensure sleep target is not 0 */
+	if (lcore_cfg->n_queues == 1 && lcore_cfg->sleep_target == 0)
+		lcore_cfg->sleep_target = 1;
+
+	/* initialize data before enabling the callback */
+	if (lcore_cfg->n_queues == 1) {
+		lcore_cfg->cb_mode = mode;
+		lcore_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	}
+	queue_cfg->evt_cb = rte_event_add_dequeue_callback(dev_id, port_id,
+						clb, queue_cfg);
+
+	ret = 0;
+end:
+	return ret;
+}
+
+int
+rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id)
+{
+	const union queue qdata = {.portid = dev_id, .qid = port_id};
+	struct pmd_core_cfg *lcore_cfg;
+	struct queue_list_entry *queue_cfg;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	if (lcore_id >= RTE_MAX_LCORE)
+		return -EINVAL;
+
+	/* no need to check queue id as wrong queue id would not be enabled */
+	lcore_cfg = &lcore_cfgs[lcore_id];
+
+	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
+		return -EINVAL;
+
+	/*
+	 * There is no good/easy way to do this without race conditions, so we
+	 * are just going to throw our hands in the air and hope that the user
+	 * has read the documentation and has ensured that ports are stopped at
+	 * the time we enter the API functions.
+	 */
+	queue_cfg = queue_list_take(lcore_cfg, &qdata);
+	if (queue_cfg == NULL)
+		return -ENOENT;
+
+	/* if we've removed all queues from the lists, set state to disabled */
+	if (lcore_cfg->n_queues == 0)
+		lcore_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
+
+	switch (lcore_cfg->cb_mode) {
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
+	case RTE_POWER_MGMT_TYPE_SCALE:
+	case RTE_POWER_MGMT_TYPE_PAUSE:
+		rte_event_remove_dequeue_callback(dev_id, port_id,
+			queue_cfg->evt_cb);
+		break;
+	}
+	/*
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
+	 */
+	rte_free((void *)queue_cfg->evt_cb);
+	free(queue_cfg);
+
+	return 0;
+}
+
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 0f1a2eb22e..e1966b9777 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -87,6 +87,61 @@ int
 rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Enable power management on a specified Event device port and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @warning This function must be called when the event device stopped and
+ * no enqueue/dequeue is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the event port will be polled from.
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   Event port identifier of the Event device.
+ * @param mode
+ *   The power management scheme to use for specified event port.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_eventdev_pmgmt_port_enable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id,
+		enum rte_power_pmd_mgmt_type mode);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Disable power management on a specified Ethernet device Rx queue and lcore.
+ *
+ * @note This function is not thread-safe.
+ *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
+ * @param lcore_id
+ *   The lcore the Rx queue is polled from.
+ * @param dev_id
+ *   The identifier of the device.
+ * @param port_id
+ *   Event port identifier of the Event device.
+ * @return
+ *   0 on success
+ *   <0 on error
+ */
+__rte_experimental
+int
+rte_power_eventdev_pmgmt_port_disable(unsigned int lcore_id,
+		uint8_t dev_id, uint8_t port_id);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
diff --git a/lib/power/version.map b/lib/power/version.map
index b8b54f768e..4ab762e072 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -52,4 +52,8 @@ EXPERIMENTAL {
 	rte_power_uncore_get_num_freqs;
 	rte_power_uncore_get_num_pkgs;
 	rte_power_uncore_init;
+
+	# added in 23.07
+	rte_power_eventdev_pmgmt_port_enable;
+	rte_power_eventdev_pmgmt_port_disable;
 };
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 6/6] examples/eventdev_p: add eventdev power management
  2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
                     ` (3 preceding siblings ...)
  2023-10-16 20:57   ` [PATCH v1 5/6] power: add eventdev support for power management Sivaprasad Tummala
@ 2023-10-16 20:57   ` Sivaprasad Tummala
  4 siblings, 0 replies; 33+ messages in thread
From: Sivaprasad Tummala @ 2023-10-16 20:57 UTC (permalink / raw)
  To: jerinjacobk, harry.van.haaren, anatoly.burakov
  Cc: dev, ferruh.yigit, david.hunt

Add power management feature support to eventdev_pipeline sample app.

A new command option "--pmd-mgmt" was added to select power management
mode on worker cores. Options currently supported are "pause/monitor".
Default, no power management features are enabled on the worker ports.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 .../sample_app_ug/eventdev_pipeline.rst       |  3 +-
 examples/eventdev_pipeline/main.c             | 64 ++++++++++++++++++-
 examples/eventdev_pipeline/pipeline_common.h  |  1 +
 .../pipeline_worker_generic.c                 |  1 +
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/doc/guides/sample_app_ug/eventdev_pipeline.rst b/doc/guides/sample_app_ug/eventdev_pipeline.rst
index 19ff53803e..1186d0af3d 100644
--- a/doc/guides/sample_app_ug/eventdev_pipeline.rst
+++ b/doc/guides/sample_app_ug/eventdev_pipeline.rst
@@ -44,11 +44,12 @@ these settings is shown below:
  * ``-c32``: worker dequeue depth of 32
  * ``-W1000``: do 1000 cycles of work per packet in each stage
  * ``-D``: dump statistics on exit
+ * ``--pmd-mgmt=pause``: worker core power management using pause;
 
 .. code-block:: console
 
     ./<build_dir>/examples/dpdk-eventdev_pipeline -l 0,2,8-15 --vdev event_sw0 \
-    -- -r1 -t1 -e4 -w FF00 -s4 -n0 -c32 -W1000 -D
+    -- -r1 -t1 -e4 -w FF00 -s4 -n0 -c32 -W1000 -D --pmd-mgmt=pause
 
 The application has some sanity checking built-in, so if there is a function
 (e.g.; the RX core) which doesn't have a cpu core mask assigned, the application
diff --git a/examples/eventdev_pipeline/main.c b/examples/eventdev_pipeline/main.c
index 0c995d1a70..3bb55054ce 100644
--- a/examples/eventdev_pipeline/main.c
+++ b/examples/eventdev_pipeline/main.c
@@ -24,6 +24,9 @@ struct config_data cdata = {
 	.worker_cq_depth = 16
 };
 
+static enum rte_power_pmd_mgmt_type pmgmt_mode;
+bool pmgmt_enabled;
+
 static void
 dump_core_info(unsigned int lcore_id, struct worker_data *data,
 		unsigned int worker_idx)
@@ -122,6 +125,8 @@ parse_coremask(const char *coremask)
 	return mask;
 }
 
+#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt"
+
 static struct option long_options[] = {
 	{"workers", required_argument, 0, 'w'},
 	{"packets", required_argument, 0, 'n'},
@@ -139,6 +144,7 @@ static struct option long_options[] = {
 	{"quiet", no_argument, 0, 'q'},
 	{"use-atq", no_argument, 0, 'a'},
 	{"dump", no_argument, 0, 'D'},
+	{CMD_LINE_OPT_PMD_MGMT, 1, 0, 0},
 	{0, 0, 0, 0}
 };
 
@@ -163,12 +169,38 @@ usage(void)
 		"  -q, --quiet                  Minimize printed output\n"
 		"  -a, --use-atq                Use all type queues\n"
 		"  -m, --mempool-size=N         Dictate the mempool size\n"
-		"  -D, --dump                   Print detailed statistics before exit"
+		"  -D, --dump                   Print detailed statistics before exit\n"
+		" --pmd-mgmt MODE		enable PMD power management mode."
 		"\n";
 	fprintf(stderr, "%s", usage_str);
 	exit(1);
 }
 
+static int
+parse_pmd_mgmt_config(const char *name)
+{
+#define PMD_MGMT_MONITOR "monitor"
+#define PMD_MGMT_PAUSE   "pause"
+#define PMD_MGMT_SCALE   "scale"
+
+	if (strncmp(PMD_MGMT_MONITOR, name, sizeof(PMD_MGMT_MONITOR)) == 0) {
+		pmgmt_mode = RTE_POWER_MGMT_TYPE_MONITOR;
+		return 0;
+	}
+
+	if (strncmp(PMD_MGMT_PAUSE, name, sizeof(PMD_MGMT_PAUSE)) == 0) {
+		pmgmt_mode = RTE_POWER_MGMT_TYPE_PAUSE;
+		return 0;
+	}
+
+	if (strncmp(PMD_MGMT_SCALE, name, sizeof(PMD_MGMT_SCALE)) == 0) {
+		pmgmt_mode = RTE_POWER_MGMT_TYPE_SCALE;
+		return 0;
+	}
+	/* unknown PMD power management mode */
+	return -1;
+}
+
 static void
 parse_app_args(int argc, char **argv)
 {
@@ -246,6 +278,21 @@ parse_app_args(int argc, char **argv)
 		case 'm':
 			cdata.num_mbuf = (uint64_t)atol(optarg);
 			break;
+		/* long options */
+		case 0:
+			if (!strncmp(long_options[option_index].name,
+					CMD_LINE_OPT_PMD_MGMT,
+					sizeof(CMD_LINE_OPT_PMD_MGMT))) {
+				if (parse_pmd_mgmt_config(optarg) < 0) {
+					printf(" Invalid power mgmt mode: %s\n",
+							optarg);
+					return;
+				}
+				pmgmt_enabled = true;
+				printf("PMD power mgmt mode is enabled\n");
+			}
+			break;
+
 		default:
 			usage();
 		}
@@ -430,7 +477,20 @@ main(int argc, char **argv)
 			continue;
 
 		dump_core_info(lcore_id, worker_data, worker_idx);
-
+		if (pmgmt_enabled) {
+			if (fdata->worker_core[lcore_id]) {
+				err = rte_power_eventdev_pmgmt_port_enable(
+							lcore_id, worker_data[worker_idx].dev_id,
+							worker_data[worker_idx].port_id,
+							pmgmt_mode);
+				if (err) {
+					RTE_LOG(ERR, POWER,
+						"Power Management enabled failed on core %u\n",
+						lcore_id);
+					continue;
+				}
+			}
+		}
 		err = rte_eal_remote_launch(fdata->cap.worker,
 				&worker_data[worker_idx], lcore_id);
 		if (err) {
diff --git a/examples/eventdev_pipeline/pipeline_common.h b/examples/eventdev_pipeline/pipeline_common.h
index 28b6ab85ff..b33162adfb 100644
--- a/examples/eventdev_pipeline/pipeline_common.h
+++ b/examples/eventdev_pipeline/pipeline_common.h
@@ -19,6 +19,7 @@
 #include <rte_event_eth_tx_adapter.h>
 #include <rte_service.h>
 #include <rte_service_component.h>
+#include <rte_power_pmd_mgmt.h>
 
 #define MAX_NUM_STAGES 8
 #define BATCH_SIZE 16
diff --git a/examples/eventdev_pipeline/pipeline_worker_generic.c b/examples/eventdev_pipeline/pipeline_worker_generic.c
index 783f68c91e..22d644bd51 100644
--- a/examples/eventdev_pipeline/pipeline_worker_generic.c
+++ b/examples/eventdev_pipeline/pipeline_worker_generic.c
@@ -6,6 +6,7 @@
 
 #include <stdlib.h>
 
+#include <rte_power_pmd_mgmt.h>
 #include "pipeline_common.h"
 
 static __rte_always_inline int
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 2/6] event/sw: support power monitor
  2023-10-16 20:57   ` [PATCH v1 2/6] event/sw: support power monitor Sivaprasad Tummala
@ 2023-10-16 23:41     ` Tyler Retzlaff
  0 siblings, 0 replies; 33+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:41 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: jerinjacobk, harry.van.haaren, anatoly.burakov, dev,
	ferruh.yigit, david.hunt

On Mon, Oct 16, 2023 at 01:57:11PM -0700, Sivaprasad Tummala wrote:
> Currently sw eventdev pmd does not support ``rte_power_monitor`` api.
> This patch adds support by adding monitor callback that is called
> whenever we enter sleep state and need to check if it is time to
> wake up.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  drivers/event/sw/sw_evdev.c        |  1 +
>  drivers/event/sw/sw_evdev.h        |  2 ++
>  drivers/event/sw/sw_evdev_worker.c | 27 +++++++++++++++++++++++++++
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
> index 6d1816b76d..99b3c0d92f 100644
> --- a/drivers/event/sw/sw_evdev.c
> +++ b/drivers/event/sw/sw_evdev.c
> @@ -958,6 +958,7 @@ sw_probe(struct rte_vdev_device *vdev)
>  			.port_link = sw_port_link,
>  			.port_unlink = sw_port_unlink,
>  			.port_unlinks_in_progress = sw_port_unlinks_in_progress,
> +			.get_monitor_addr = sw_event_get_monitor_addr,
>  
>  			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
>  
> diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h
> index c7b943a72b..26aa2fe283 100644
> --- a/drivers/event/sw/sw_evdev.h
> +++ b/drivers/event/sw/sw_evdev.h
> @@ -312,6 +312,8 @@ int sw_xstats_reset(struct rte_eventdev *dev,
>  		int16_t queue_port_id,
>  		const uint64_t ids[],
>  		uint32_t nb_ids);
> +int sw_event_get_monitor_addr(void *port,
> +		struct rte_power_monitor_cond *pmc);
>  
>  int test_sw_eventdev(void);
>  
> diff --git a/drivers/event/sw/sw_evdev_worker.c b/drivers/event/sw/sw_evdev_worker.c
> index 063b919c7e..139c98cfe2 100644
> --- a/drivers/event/sw/sw_evdev_worker.c
> +++ b/drivers/event/sw/sw_evdev_worker.c
> @@ -10,6 +10,33 @@
>  
>  #define PORT_ENQUEUE_MAX_BURST_SIZE 64
>  
> +static int
> +sw_event_ring_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	/* Check if the head pointer has changed */
> +	return value != arg[0];
> +}
> +
> +int
> +sw_event_get_monitor_addr(void *port, struct rte_power_monitor_cond *pmc)
> +{
> +	struct sw_port *p = (void *)port;
			     ^^^^^^ cast seems redundant


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 3/6] eventdev: support optional dequeue callbacks
  2023-10-16 20:57   ` [PATCH v1 3/6] eventdev: support optional dequeue callbacks Sivaprasad Tummala
@ 2023-10-16 23:47     ` Tyler Retzlaff
  0 siblings, 0 replies; 33+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:47 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: jerinjacobk, harry.van.haaren, anatoly.burakov, dev,
	ferruh.yigit, david.hunt

On Mon, Oct 16, 2023 at 01:57:12PM -0700, Sivaprasad Tummala wrote:
> Add optional support for inline event processing within pmd dequeue
> call. For a dequeue callback, events dequeued from the event port
> were passed them to a callback function if configured, to allow
> additional processing. e.g. unpack batch of packets from each event
> on dequeue, before passing back to the application.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  lib/eventdev/eventdev_pmd.h      |  38 +++++++++++
>  lib/eventdev/eventdev_private.c  |   2 +
>  lib/eventdev/rte_eventdev.c      | 107 +++++++++++++++++++++++++++++++
>  lib/eventdev/rte_eventdev.h      |  95 +++++++++++++++++++++++++++
>  lib/eventdev/rte_eventdev_core.h |  12 +++-
>  lib/eventdev/version.map         |   3 +
>  6 files changed, 256 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
> index a0ee768ce7..ce067b1d5d 100644
> --- a/lib/eventdev/eventdev_pmd.h
> +++ b/lib/eventdev/eventdev_pmd.h
> @@ -97,6 +97,19 @@ struct rte_eventdev_global {
>  	uint8_t nb_devs;	/**< Number of devices found */
>  };
>  
> +/**
> + * @internal
> + * Structure used to hold information about the callbacks to be called for a
> + * port on dequeue.
> + */
> +struct rte_event_dequeue_callback {
> +	struct rte_event_dequeue_callback *next;
> +	union{
> +		rte_dequeue_callback_fn dequeue;
> +	} fn;
> +	void *param;
> +};
> +
>  /**
>   * @internal
>   * The data part, with no function pointers, associated with each device.
> @@ -171,6 +184,10 @@ struct rte_eventdev {
>  	/**< Pointer to PMD dequeue burst function. */
>  	event_maintain_t maintain;
>  	/**< Pointer to PMD port maintenance function. */
> +	struct rte_event_dequeue_callback *post_dequeue_burst_cbs[RTE_EVENT_MAX_PORTS_PER_DEV];
> +	/**<  User-supplied functions called from dequeue_burst to post-process
> +	 * received packets before passing them to the user
> +	 */
>  	event_tx_adapter_enqueue_t txa_enqueue_same_dest;
>  	/**< Pointer to PMD eth Tx adapter burst enqueue function with
>  	 * events destined to same Eth port & Tx queue.
> @@ -245,6 +262,27 @@ rte_event_pmd_is_valid_dev(uint8_t dev_id)
>  		return 1;
>  }
>  
> +/**
> + * Executes all the user application registered callbacks for the specific
> + * event device.
> + *
> + * @param dev_id
> + *   Event device index.
> + * @param port_id
> + *   Event port index
> + * @param ev
> + *   Points to an array of *nb_events* objects of type *rte_event* structure
> + *   for output to be populated with the dequeued event objects.
> + * @param nb_events
> + *   number of event objects
> + *
> + * @return
> + * The number of event objects
> + */
> +__rte_internal
> +uint16_t rte_eventdev_pmd_dequeue_callback_process(uint8_t dev_id,
> +		uint8_t port_id, struct rte_event ev[], uint16_t nb_events);
> +
>  /**
>   * Definitions of all functions exported by a driver through the
>   * generic structure of type *event_dev_ops* supplied in the
> diff --git a/lib/eventdev/eventdev_private.c b/lib/eventdev/eventdev_private.c
> index 017f97ccab..052c526ce0 100644
> --- a/lib/eventdev/eventdev_private.c
> +++ b/lib/eventdev/eventdev_private.c
> @@ -137,4 +137,6 @@ event_dev_fp_ops_set(struct rte_event_fp_ops *fp_op,
>  	fp_op->dma_enqueue = dev->dma_enqueue;
>  	fp_op->profile_switch = dev->profile_switch;
>  	fp_op->data = dev->data->ports;
> +	fp_op->ev_port.clbk = (void **)(uintptr_t)dev->post_dequeue_burst_cbs;
> +	fp_op->ev_port.data = dev->data->ports;
>  }
> diff --git a/lib/eventdev/rte_eventdev.c b/lib/eventdev/rte_eventdev.c
> index 5feb4326a2..f2540a6aa8 100644
> --- a/lib/eventdev/rte_eventdev.c
> +++ b/lib/eventdev/rte_eventdev.c
> @@ -18,6 +18,7 @@
>  #include <rte_common.h>
>  #include <rte_malloc.h>
>  #include <rte_errno.h>
> +#include <rte_stdatomic.h>
>  #include <ethdev_driver.h>
>  #include <rte_cryptodev.h>
>  #include <rte_dmadev.h>
> @@ -39,6 +40,9 @@ static struct rte_eventdev_global eventdev_globals = {
>  /* Public fastpath APIs. */
>  struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
>  
> +/* spinlock for add/remove dequeue callbacks */
> +static rte_spinlock_t event_dev_dequeue_cb_lock = RTE_SPINLOCK_INITIALIZER;
> +
>  /* Event dev north bound API implementation */
>  
>  uint8_t
> @@ -884,6 +888,109 @@ rte_event_port_attr_get(uint8_t dev_id, uint8_t port_id, uint32_t attr_id,
>  	return 0;
>  }
>  
> +const struct rte_event_dequeue_callback *
> +rte_event_add_dequeue_callback(uint8_t dev_id, uint8_t port_id,
> +		rte_dequeue_callback_fn fn, void *user_param)
> +{
> +	struct rte_eventdev *dev;
> +	struct rte_event_dequeue_callback *cb;
> +	struct rte_event_dequeue_callback *tail;
> +
> +	/* check input parameters */
> +	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, NULL);
> +	dev = &rte_eventdevs[dev_id];
> +	if (!is_valid_port(dev, port_id)) {
> +		RTE_EDEV_LOG_ERR("Invalid port_id=%" PRIu8, port_id);
> +		return NULL;
> +	}
> +
> +	cb = rte_zmalloc(NULL, sizeof(*cb), 0);
> +	if (cb == NULL) {
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +	cb->fn.dequeue = fn;
> +	cb->param = user_param;
> +
> +	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
> +	/* Add the callbacks in fifo order. */
> +	tail = rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id];
> +	if (!tail) {
> +		/* Stores to cb->fn and cb->param should complete before
> +		 * cb is visible to data plane.
> +		 */
> +		rte_atomic_store_explicit(
> +			&rte_eventdevs[dev_id].post_dequeue_burst_cbs[port_id],
> +			cb, __ATOMIC_RELEASE);
                            ^^^^^^^^^^^^^^^^ rte_memory_order_release (rte_stdatomic.h)

> +	} else {
> +		while (tail->next)
> +			tail = tail->next;
> +		/* Stores to cb->fn and cb->param should complete before
> +		 * cb is visible to data plane.
> +		 */
> +		rte_atomic_store_explicit(&tail->next, cb, __ATOMIC_RELEASE);
							   ^^^^ same here

> +	}
> +	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
> +
> +	return cb;
> +}
> +
> +int
> +rte_event_remove_dequeue_callback(uint8_t dev_id, uint8_t port_id,
> +		const struct rte_event_dequeue_callback *user_cb)
> +{
> +	struct rte_eventdev *dev;
> +	struct rte_event_dequeue_callback *cb;
> +	struct rte_event_dequeue_callback **prev_cb;
> +
> +	/* Check input parameters. */
> +	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_eventdevs[dev_id];
> +	if (user_cb == NULL || !is_valid_port(dev, port_id))
> +		return -EINVAL;
> +
> +	rte_spinlock_lock(&event_dev_dequeue_cb_lock);
> +	prev_cb = &dev->post_dequeue_burst_cbs[port_id];
> +	for (; *prev_cb != NULL; prev_cb = &cb->next) {
> +		cb = *prev_cb;
> +		if (cb == user_cb) {
> +			/* Remove the user cb from the callback list. */
> +			rte_atomic_store_explicit(prev_cb, cb->next,
> +						__ATOMIC_RELAXED);
						^^^ and here

> +			break;
> +		}
> +	}
> +	rte_spinlock_unlock(&event_dev_dequeue_cb_lock);
> +
> +	return 0;
> +}
> +
> +uint16_t rte_eventdev_pmd_dequeue_callback_process(uint8_t dev_id,
> +		uint8_t port_id, struct rte_event ev[], uint16_t nb_events)
> +{
> +	struct rte_event_dequeue_callback *cb;
> +	const struct rte_event_fp_ops *fp_ops;
> +
> +	fp_ops = &rte_event_fp_ops[dev_id];
> +
> +	/* __ATOMIC_RELEASE memory order was used when the
> +	 * call back was inserted into the list.
> +	 * Since there is a clear dependency between loading
> +	 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> +	 * not required.
> +	 */
> +	cb = rte_atomic_load_explicit((void **)&fp_ops->ev_port.clbk[port_id],
				       ^^^^^^^ needs to be __rte_atomic
qualified

> +					__ATOMIC_RELAXED);
					^^^^ rte_memory_order
				

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-16 20:57   ` [PATCH v1 5/6] power: add eventdev support for power management Sivaprasad Tummala
@ 2023-10-16 23:51     ` Tyler Retzlaff
  2023-10-17  3:03       ` Tummala, Sivaprasad
  2023-10-17  3:22     ` Jerin Jacob
  1 sibling, 1 reply; 33+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:51 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: jerinjacobk, harry.van.haaren, anatoly.burakov, dev,
	ferruh.yigit, david.hunt

On Mon, Oct 16, 2023 at 01:57:14PM -0700, Sivaprasad Tummala wrote:
> Add eventdev support to enable power saving when no events
> are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering
> an architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when events arrive.
> 
> This API mandates a core-to-single-port mapping (i.e. one core polling
> multiple ports of event device is not supported). This should be ok
> as the general use case will have one CPU core using one port to
> enqueue/dequeue events from an eventdev.
> 
> This design is using Eventdev PMD Dequeue callbacks.
> 
> 1. MWAITX/MONITORX:
> 
>    When a certain threshold of empty polls is reached, the core will go
>    into a power optimized sleep while waiting on an address of next RX
>    descriptor to be written to.
> 
> 2. Pause instruction
> 
>    This method uses the pause instruction to avoid busy polling.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  lib/power/meson.build          |   2 +-
>  lib/power/rte_power_pmd_mgmt.c | 226 +++++++++++++++++++++++++++++++++
>  lib/power/rte_power_pmd_mgmt.h |  55 ++++++++
>  lib/power/version.map          |   4 +
>  4 files changed, 286 insertions(+), 1 deletion(-)
> 

...
> +
> +	# added in 23.07
> +	rte_power_eventdev_pmgmt_port_enable;
> +	rte_power_eventdev_pmgmt_port_disable;

23.07 is released, 23.11?

>  };
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-16 23:51     ` Tyler Retzlaff
@ 2023-10-17  3:03       ` Tummala, Sivaprasad
  0 siblings, 0 replies; 33+ messages in thread
From: Tummala, Sivaprasad @ 2023-10-17  3:03 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: jerinjacobk, harry.van.haaren, anatoly.burakov, dev, Yigit,
	Ferruh, david.hunt

[AMD Official Use Only - General]

> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Tuesday, October 17, 2023 5:21 AM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> Cc: jerinjacobk@gmail.com; harry.van.haaren@intel.com;
> anatoly.burakov@intel.com; dev@dpdk.org; Yigit, Ferruh <Ferruh.Yigit@amd.com>;
> david.hunt@intel.com
> Subject: Re: [PATCH v1 5/6] power: add eventdev support for power management
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Mon, Oct 16, 2023 at 01:57:14PM -0700, Sivaprasad Tummala wrote:
> > Add eventdev support to enable power saving when no events are
> > arriving. It is based on counting the number of empty polls and, when
> > the number reaches a certain threshold, entering an
> > architecture-defined optimized power state that will either wait until
> > a TSC timestamp expires, or when events arrive.
> >
> > This API mandates a core-to-single-port mapping (i.e. one core polling
> > multiple ports of event device is not supported). This should be ok as
> > the general use case will have one CPU core using one port to
> > enqueue/dequeue events from an eventdev.
> >
> > This design is using Eventdev PMD Dequeue callbacks.
> >
> > 1. MWAITX/MONITORX:
> >
> >    When a certain threshold of empty polls is reached, the core will go
> >    into a power optimized sleep while waiting on an address of next RX
> >    descriptor to be written to.
> >
> > 2. Pause instruction
> >
> >    This method uses the pause instruction to avoid busy polling.
> >
> > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > ---
> >  lib/power/meson.build          |   2 +-
> >  lib/power/rte_power_pmd_mgmt.c | 226
> > +++++++++++++++++++++++++++++++++
> lib/power/rte_power_pmd_mgmt.h |  55 ++++++++
> >  lib/power/version.map          |   4 +
> >  4 files changed, 286 insertions(+), 1 deletion(-)
> >
>
> ...
> > +
> > +     # added in 23.07
> > +     rte_power_eventdev_pmgmt_port_enable;
> > +     rte_power_eventdev_pmgmt_port_disable;
>
> 23.07 is released, 23.11?
Hi Tyler,
Thanks for spotting this. Will fix the typo in v2.
>
> >  };
> > --
> > 2.34.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-16 20:57   ` [PATCH v1 5/6] power: add eventdev support for power management Sivaprasad Tummala
  2023-10-16 23:51     ` Tyler Retzlaff
@ 2023-10-17  3:22     ` Jerin Jacob
  2023-10-18  7:08       ` Tummala, Sivaprasad
  1 sibling, 1 reply; 33+ messages in thread
From: Jerin Jacob @ 2023-10-17  3:22 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: harry.van.haaren, anatoly.burakov, dev, ferruh.yigit, david.hunt

On Tue, Oct 17, 2023 at 2:27 AM Sivaprasad Tummala
<sivaprasad.tummala@amd.com> wrote:
>
> Add eventdev support to enable power saving when no events
> are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering
> an architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when events arrive.
>
> This API mandates a core-to-single-port mapping (i.e. one core polling
> multiple ports of event device is not supported). This should be ok
> as the general use case will have one CPU core using one port to
> enqueue/dequeue events from an eventdev.
>
> This design is using Eventdev PMD Dequeue callbacks.
>
> 1. MWAITX/MONITORX:
>
>    When a certain threshold of empty polls is reached, the core will go
>    into a power optimized sleep while waiting on an address of next RX
>    descriptor to be written to.
>
> 2. Pause instruction
>
>    This method uses the pause instruction to avoid busy polling.
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>


Hi Siva,

It does not look it is aligned with previous discussion.

I spend a couple of minutes to draft semantics. Please treat as reference.

# IMO, only following public SLOW PATH eventdev API required.(Just
share the concept)

enum rte_event_pmgmt_modes {
       /** Default power management scheme */
    RTE_EVENT_POWER_MGMT_TYPE_DEFAULT;
   /** Use power-optimized monitoring to wait for incoming traffic */
        RTE_EVENT_POWER_MGMT_TYPE_F_CPU_MONITOR = RTE_BIT(0),
        /** Use power-optimized sleep to avoid busy polling */
        RTE_EVENT_POWER_MGMT_TYPE_F_CPU_PAUSE = RTE_BIT(1),
       /** HW based power management scheme found in ARM64 machines,
where core goes to sleep state till event available on dequeue */
RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE = RTE_BIT(2),

};

int rte_event_port_pmgmt_type_supported_get(uint8_t dev_id, enum
rte_event_pmgmt_modes *mode_flags)
/** Device must be in stop state */
int rte_event_port_pmgmt_enable(uint8_t dev_id, uint8_t port_id, enum
rte_event_pmgmt_modes mode);
int rte_event_port_pmgmt_disable(uint8_t dev_id, uint8_t port_id);

# It should be self-contained, No need to add to rte_power as it is
CPU only power mgmt.(See RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE
above)

# Add: lib/eventdev/eventdev_pmd_pmgmt.c or so and have CPU based on
power management helper functions
so that all SW PMD can be reused.
example:
eventdev_pmd_pmgmt_handle_monitor(uint8_t dev_id, uint8_t port_id,
struct rte_event ev[], uint16_t nb_events);
eventdev_pmd_pmgmt_handle_pause(uint8_t dev_id, uint8_t port_id,
struct rte_event ev[], uint16_t nb_events);


# In rte_event_dev_start(), Fixup dev->dequeue_burst if CPU based
power management is applicable,and it is selected.
ie. new dev->dequeue_burst is existing PMD's  dev->dequeue_burst +
eventdev_pmd_pmgmt_handle_.. (based on power management mode selected)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-17  3:22     ` Jerin Jacob
@ 2023-10-18  7:08       ` Tummala, Sivaprasad
  2023-10-18  7:13         ` Jerin Jacob
  0 siblings, 1 reply; 33+ messages in thread
From: Tummala, Sivaprasad @ 2023-10-18  7:08 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: harry.van.haaren, anatoly.burakov, dev, Yigit, Ferruh, david.hunt

[AMD Official Use Only - General]

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, October 17, 2023 8:53 AM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> Cc: harry.van.haaren@intel.com; anatoly.burakov@intel.com; dev@dpdk.org; Yigit,
> Ferruh <Ferruh.Yigit@amd.com>; david.hunt@intel.com
> Subject: Re: [PATCH v1 5/6] power: add eventdev support for power management
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> On Tue, Oct 17, 2023 at 2:27 AM Sivaprasad Tummala
> <sivaprasad.tummala@amd.com> wrote:
> >
> > Add eventdev support to enable power saving when no events are
> > arriving. It is based on counting the number of empty polls and, when
> > the number reaches a certain threshold, entering an
> > architecture-defined optimized power state that will either wait until
> > a TSC timestamp expires, or when events arrive.
> >
> > This API mandates a core-to-single-port mapping (i.e. one core polling
> > multiple ports of event device is not supported). This should be ok as
> > the general use case will have one CPU core using one port to
> > enqueue/dequeue events from an eventdev.
> >
> > This design is using Eventdev PMD Dequeue callbacks.
> >
> > 1. MWAITX/MONITORX:
> >
> >    When a certain threshold of empty polls is reached, the core will go
> >    into a power optimized sleep while waiting on an address of next RX
> >    descriptor to be written to.
> >
> > 2. Pause instruction
> >
> >    This method uses the pause instruction to avoid busy polling.
> >
> > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>
>
> Hi Siva,
>
> It does not look it is aligned with previous discussion.
>
> I spend a couple of minutes to draft semantics. Please treat as reference.
>
> # IMO, only following public SLOW PATH eventdev API required.(Just share the
> concept)
>
> enum rte_event_pmgmt_modes {
>        /** Default power management scheme */
>     RTE_EVENT_POWER_MGMT_TYPE_DEFAULT;
>    /** Use power-optimized monitoring to wait for incoming traffic */
>         RTE_EVENT_POWER_MGMT_TYPE_F_CPU_MONITOR = RTE_BIT(0),
>         /** Use power-optimized sleep to avoid busy polling */
>         RTE_EVENT_POWER_MGMT_TYPE_F_CPU_PAUSE = RTE_BIT(1),
>        /** HW based power management scheme found in ARM64 machines, where
> core goes to sleep state till event available on dequeue */
> RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE = RTE_BIT(2),
>
> };
>
> int rte_event_port_pmgmt_type_supported_get(uint8_t dev_id, enum
> rte_event_pmgmt_modes *mode_flags)
> /** Device must be in stop state */
> int rte_event_port_pmgmt_enable(uint8_t dev_id, uint8_t port_id, enum
> rte_event_pmgmt_modes mode); int rte_event_port_pmgmt_disable(uint8_t
> dev_id, uint8_t port_id);
>
> # It should be self-contained, No need to add to rte_power as it is CPU only power
> mgmt.(See RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE
> above)
>
> # Add: lib/eventdev/eventdev_pmd_pmgmt.c or so and have CPU based on power
> management helper functions so that all SW PMD can be reused.
> example:
> eventdev_pmd_pmgmt_handle_monitor(uint8_t dev_id, uint8_t port_id, struct
> rte_event ev[], uint16_t nb_events);
> eventdev_pmd_pmgmt_handle_pause(uint8_t dev_id, uint8_t port_id, struct
> rte_event ev[], uint16_t nb_events);
>
>
> # In rte_event_dev_start(), Fixup dev->dequeue_burst if CPU based power
> management is applicable,and it is selected.
> ie. new dev->dequeue_burst is existing PMD's  dev->dequeue_burst +
> eventdev_pmd_pmgmt_handle_.. (based on power management mode selected)

Thanks for the clarification. Will incorporate the changes in next version of the patch (to support power management on event port).
With the time constraints, I will defer the power management support on event port to next release. However to avoid ABI breakage,
I will split the Patchset and push the patches to support callbacks in this release, so we don't have to wait for next stable release to
get these changes integrated.

Please let me know your thoughts.

Thanks & Regards,
Sivaprasad

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 5/6] power: add eventdev support for power management
  2023-10-18  7:08       ` Tummala, Sivaprasad
@ 2023-10-18  7:13         ` Jerin Jacob
  0 siblings, 0 replies; 33+ messages in thread
From: Jerin Jacob @ 2023-10-18  7:13 UTC (permalink / raw)
  To: Tummala, Sivaprasad
  Cc: harry.van.haaren, anatoly.burakov, dev, Yigit, Ferruh, david.hunt

On Wed, Oct 18, 2023 at 12:38 PM Tummala, Sivaprasad
<Sivaprasad.Tummala@amd.com> wrote:
>
> [AMD Official Use Only - General]
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Tuesday, October 17, 2023 8:53 AM
> > To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>
> > Cc: harry.van.haaren@intel.com; anatoly.burakov@intel.com; dev@dpdk.org; Yigit,
> > Ferruh <Ferruh.Yigit@amd.com>; david.hunt@intel.com
> > Subject: Re: [PATCH v1 5/6] power: add eventdev support for power management
> >
> > Caution: This message originated from an External Source. Use proper caution
> > when opening attachments, clicking links, or responding.
> >
> >
> > On Tue, Oct 17, 2023 at 2:27 AM Sivaprasad Tummala
> > <sivaprasad.tummala@amd.com> wrote:
> > >
> > > Add eventdev support to enable power saving when no events are
> > > arriving. It is based on counting the number of empty polls and, when
> > > the number reaches a certain threshold, entering an
> > > architecture-defined optimized power state that will either wait until
> > > a TSC timestamp expires, or when events arrive.
> > >
> > > This API mandates a core-to-single-port mapping (i.e. one core polling
> > > multiple ports of event device is not supported). This should be ok as
> > > the general use case will have one CPU core using one port to
> > > enqueue/dequeue events from an eventdev.
> > >
> > > This design is using Eventdev PMD Dequeue callbacks.
> > >
> > > 1. MWAITX/MONITORX:
> > >
> > >    When a certain threshold of empty polls is reached, the core will go
> > >    into a power optimized sleep while waiting on an address of next RX
> > >    descriptor to be written to.
> > >
> > > 2. Pause instruction
> > >
> > >    This method uses the pause instruction to avoid busy polling.
> > >
> > > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> >
> >
> > Hi Siva,
> >
> > It does not look it is aligned with previous discussion.
> >
> > I spend a couple of minutes to draft semantics. Please treat as reference.
> >
> > # IMO, only following public SLOW PATH eventdev API required.(Just share the
> > concept)
> >
> > enum rte_event_pmgmt_modes {
> >        /** Default power management scheme */
> >     RTE_EVENT_POWER_MGMT_TYPE_DEFAULT;
> >    /** Use power-optimized monitoring to wait for incoming traffic */
> >         RTE_EVENT_POWER_MGMT_TYPE_F_CPU_MONITOR = RTE_BIT(0),
> >         /** Use power-optimized sleep to avoid busy polling */
> >         RTE_EVENT_POWER_MGMT_TYPE_F_CPU_PAUSE = RTE_BIT(1),
> >        /** HW based power management scheme found in ARM64 machines, where
> > core goes to sleep state till event available on dequeue */
> > RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE = RTE_BIT(2),
> >
> > };
> >
> > int rte_event_port_pmgmt_type_supported_get(uint8_t dev_id, enum
> > rte_event_pmgmt_modes *mode_flags)
> > /** Device must be in stop state */
> > int rte_event_port_pmgmt_enable(uint8_t dev_id, uint8_t port_id, enum
> > rte_event_pmgmt_modes mode); int rte_event_port_pmgmt_disable(uint8_t
> > dev_id, uint8_t port_id);
> >
> > # It should be self-contained, No need to add to rte_power as it is CPU only power
> > mgmt.(See RTE_EVENT_POWER_MGMT_TYPE_F_HW_WFE_ON_DEQUEUE
> > above)
> >
> > # Add: lib/eventdev/eventdev_pmd_pmgmt.c or so and have CPU based on power
> > management helper functions so that all SW PMD can be reused.
> > example:
> > eventdev_pmd_pmgmt_handle_monitor(uint8_t dev_id, uint8_t port_id, struct
> > rte_event ev[], uint16_t nb_events);
> > eventdev_pmd_pmgmt_handle_pause(uint8_t dev_id, uint8_t port_id, struct
> > rte_event ev[], uint16_t nb_events);
> >
> >
> > # In rte_event_dev_start(), Fixup dev->dequeue_burst if CPU based power
> > management is applicable,and it is selected.
> > ie. new dev->dequeue_burst is existing PMD's  dev->dequeue_burst +
> > eventdev_pmd_pmgmt_handle_.. (based on power management mode selected)
>
> Thanks for the clarification. Will incorporate the changes in next version of the patch (to support power management on event port).
> With the time constraints, I will defer the power management support on event port to next release. However to avoid ABI breakage,
> I will split the Patchset and push the patches to support callbacks in this release, so we don't have to wait for next stable release to
> get these changes integrated.

if you follow this scheme, public callback API is not needed.

 # In rte_event_dev_start(), Fixup dev->dequeue_burst if CPU based power
 management is applicable,and it is selected.
ie. new dev->dequeue_burst is existing PMD's  dev->dequeue_burst +
 eventdev_pmd_pmgmt_handle_.. (based on power management mode selected)


>
> Please let me know your thoughts.
>
> Thanks & Regards,
> Sivaprasad

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-10-18  7:14 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-19  9:54 [RFC PATCH 1/5] eventdev: add power monitoring API on event port Sivaprasad Tummala
2023-04-19  9:54 ` [RFC PATCH 2/5] event/sw: support power monitor Sivaprasad Tummala
2023-04-19  9:54 ` [RFC PATCH 3/5] eventdev: support optional dequeue callbacks Sivaprasad Tummala
2023-04-24 16:06   ` Ferruh Yigit
2023-05-17 14:22   ` Burakov, Anatoly
2023-04-19  9:54 ` [RFC PATCH 4/5] power: add eventdev support for power management Sivaprasad Tummala
2023-05-17 14:43   ` Burakov, Anatoly
2023-05-24 12:34     ` Tummala, Sivaprasad
2023-04-19  9:54 ` [RFC PATCH 5/5] examples/eventdev_p: add eventdev " Sivaprasad Tummala
2023-04-19 10:15 ` [RFC PATCH 1/5] eventdev: add power monitoring API on event port Jerin Jacob
2023-04-24 16:06   ` Ferruh Yigit
2023-04-25  4:09     ` Jerin Jacob
2023-05-02 11:19       ` Ferruh Yigit
2023-05-03  7:58         ` Jerin Jacob
2023-05-03  8:13           ` Ferruh Yigit
2023-05-03  8:26             ` Jerin Jacob
2023-05-03 15:11               ` Tummala, Sivaprasad
2023-04-25  6:19     ` Mattias Rönnblom
2023-05-02 10:43       ` Ferruh Yigit
2023-05-17 14:48 ` Burakov, Anatoly
2023-10-16 20:57 ` [PATCH v1 1/6] " Sivaprasad Tummala
2023-10-16 20:57   ` [PATCH v1 2/6] event/sw: support power monitor Sivaprasad Tummala
2023-10-16 23:41     ` Tyler Retzlaff
2023-10-16 20:57   ` [PATCH v1 3/6] eventdev: support optional dequeue callbacks Sivaprasad Tummala
2023-10-16 23:47     ` Tyler Retzlaff
2023-10-16 20:57   ` [PATCH v1 4/6] event/sw: " Sivaprasad Tummala
2023-10-16 20:57   ` [PATCH v1 5/6] power: add eventdev support for power management Sivaprasad Tummala
2023-10-16 23:51     ` Tyler Retzlaff
2023-10-17  3:03       ` Tummala, Sivaprasad
2023-10-17  3:22     ` Jerin Jacob
2023-10-18  7:08       ` Tummala, Sivaprasad
2023-10-18  7:13         ` Jerin Jacob
2023-10-16 20:57   ` [PATCH v1 6/6] examples/eventdev_p: add eventdev " Sivaprasad Tummala

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).